feat(img2vid,txt2vid): Initial support for img2vid,txt2vid (#1442)

* feat(img2vid): Initial support for img2vid

* doc(SD): fix SDXL Example

* Minor fixups for img2vid

* docs(img2img): fix example curl call

* feat(txt2vid): initial support

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>

* diffusers: remain backward-compatible with CUDA settings

* docs(img2vid, txt2vid): examples

* Add notice on docs

---------

Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
Ettore Di Giacinto 2023-12-15 18:06:20 -05:00 committed by GitHub
parent fb6a5bc620
commit dd982acf2c
7 changed files with 150 additions and 27 deletions

@@ -147,7 +147,6 @@ backend: diffusers
# Force CPU usage - set to true for GPU
f16: false
diffusers:
pipeline_type: StableDiffusionXLPipeline
cuda: false # Enable for GPU usage (CUDA)
scheduler_type: euler_a
```
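For reference, this SDXL configuration can be exercised with a plain image-generation request. The sketch below is illustrative only: the model name `sdxl` is an assumption and should match whatever `name:` the config declares.

```bash
# Hypothetical request against the SDXL configuration above.
# "sdxl" is an assumed model name; use the `name:` your config declares.
curl -H "Content-Type: application/json" \
  -d '{"prompt": "a cute baby sea otter", "model": "sdxl", "size": "1024x1024"}' \
  http://localhost:8080/v1/images/generations
```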

@@ -15,7 +15,6 @@ backend: diffusers
# Force CPU usage - set to true for GPU
f16: false
diffusers:
pipeline_type: StableDiffusionXLPipeline
cuda: false # Enable for GPU usage (CUDA)
scheduler_type: dpm_2_a
```

@@ -27,12 +27,9 @@ name: animagine-xl
parameters:
model: Linaqruf/animagine-xl
backend: diffusers
# Force CPU usage - set to true for GPU
f16: false
cuda: true
f16: true
diffusers:
pipeline_type: StableDiffusionXLPipeline
cuda: false # Enable for GPU usage (CUDA)
scheduler_type: euler_a
```
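A request against the `animagine-xl` configuration above follows the same pattern as the other examples on this page; the prompt and size below are illustrative only.

```bash
# Generate an image with the animagine-xl configuration above.
# Prompt and size are illustrative only.
curl -H "Content-Type: application/json" \
  -d '{"prompt": "1girl, anime style, cherry blossoms", "model": "animagine-xl", "size": "1024x1024"}' \
  http://localhost:8080/v1/images/generations
```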
@@ -47,9 +44,9 @@ parameters:
backend: diffusers
step: 30
f16: true
cuda: true
diffusers:
pipeline_type: StableDiffusionPipeline
cuda: true
enable_parameters: "negative_prompt,num_inference_steps,clip_skip"
scheduler_type: "k_dpmpp_sde"
cfg_scale: 8
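Since this configuration enables `negative_prompt` through `enable_parameters`, a request can also carry a negative prompt. The sketch below is a hypothetical example: `my-txt2img-model` stands in for the config's `name:`, and it assumes the `prompt|negative_prompt` convention used for LocalAI's images endpoint.

```bash
# Hypothetical request: "my-txt2img-model" stands in for the `name:` of the
# configuration above. The text after "|" is assumed to be treated as the
# negative prompt (prompt|negative_prompt convention).
curl -H "Content-Type: application/json" \
  -d '{"prompt": "portrait of a cyborg, intricate detail|blurry, low quality", "model": "my-txt2img-model", "size": "512x512"}' \
  http://localhost:8080/v1/images/generations
```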
@@ -69,7 +66,7 @@ The following parameters are available in the configuration file:
| `scheduler_type` | Scheduler type | `k_dpp_sde` |
| `cfg_scale` | Configuration scale | `8` |
| `clip_skip` | Clip skip | None |
| `pipeline_type` | Pipeline type | `StableDiffusionPipeline` |
| `pipeline_type` | Pipeline type | `AutoPipelineForText2Image` |
Several scheduler types are available:
@@ -131,17 +128,16 @@ parameters:
model: nitrosocke/Ghibli-Diffusion
backend: diffusers
step: 25
cuda: true
f16: true
diffusers:
pipeline_type: StableDiffusionImg2ImgPipeline
cuda: true
enable_parameters: "negative_prompt,num_inference_steps,image"
```
```bash
IMAGE_PATH=/path/to/your/image
(echo -n '{"image": "'; base64 $IMAGE_PATH; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-edit"}') |
(echo -n '{"file": "'; base64 $IMAGE_PATH; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-edit"}') |
curl -H "Content-Type: application/json" -d @- http://localhost:8080/v1/images/generations
```
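If useful, the location of the generated image can be pulled out of the JSON response. This is a sketch assuming an OpenAI-compatible `{"data": [{"url": ...}]}` payload and that `jq` is installed.

```bash
# Hypothetical follow-up: extract the generated image URL from the response,
# assuming an OpenAI-compatible {"data": [{"url": ...}]} payload.
IMAGE_PATH=/path/to/your/image
(echo -n '{"file": "'; base64 $IMAGE_PATH; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-edit"}') |
curl -s -H "Content-Type: application/json" -d @- http://localhost:8080/v1/images/generations |
jq -r '.data[0].url'
```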
@@ -157,14 +153,67 @@ backend: diffusers
step: 50
# Half precision and CUDA enabled for GPU usage
f16: true
cuda: true
diffusers:
pipeline_type: StableDiffusionDepth2ImgPipeline
cuda: true
enable_parameters: "negative_prompt,num_inference_steps,image"
cfg_scale: 6
```
```bash
(echo -n '{"image": "'; base64 ~/path/to/image.jpeg; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-depth"}') |
(echo -n '{"file": "'; base64 ~/path/to/image.jpeg; echo '", "prompt": "a sky background","size": "512x512","model":"stablediffusion-depth"}') |
curl -H "Content-Type: application/json" -d @- http://localhost:8080/v1/images/generations
```
## img2vid
{{% notice note %}}
Experimental and available only on master builds. See: https://github.com/mudler/LocalAI/pull/1442
{{% /notice %}}
```yaml
name: img2vid
parameters:
model: stabilityai/stable-video-diffusion-img2vid
backend: diffusers
step: 25
# Half precision and CUDA enabled for GPU usage
f16: true
cuda: true
diffusers:
pipeline_type: StableVideoDiffusionPipeline
```
```bash
(echo -n '{"file": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true","size": "512x512","model":"img2vid"}') |
curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations
```
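A local input image should also be usable by sending it base64-encoded in the `file` field, mirroring the img2img example earlier on this page; this is an assumption about the new pipeline rather than documented behaviour.

```bash
# Hypothetical variant: send a local image base64-encoded in "file" instead of
# a URL, mirroring the img2img example above (untested assumption).
IMAGE_PATH=/path/to/your/image.png
(echo -n '{"file": "'; base64 $IMAGE_PATH; echo '", "size": "512x512","model":"img2vid"}') |
curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations
```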
## txt2vid
{{% notice note %}}
Experimental and available only on master builds. See: https://github.com/mudler/LocalAI/pull/1442
{{% /notice %}}
```yaml
name: txt2vid
parameters:
model: damo-vilab/text-to-video-ms-1.7b
backend: diffusers
step: 25
# Half precision and CUDA enabled for GPU usage
f16: true
cuda: true
diffusers:
pipeline_type: VideoDiffusionPipeline
cuda: true
```
```bash
(echo -n '{"prompt": "spiderman surfing","size": "512x512","model":"txt2vid"}') |
curl -H "Content-Type: application/json" -X POST -d @- http://localhost:8080/v1/images/generations
```