Category: AI Tutorials

I hacked LTX-2 into a multilingual TTS voice cloner

It took me a bit, but I figured it out. The idea is to generate a very low resolution (64×64) video with input audio and mask the audio latent space after some time using “LTXV Set Audio Video Mask By Time”. The voice identity is established in the first 10 seconds and then the prompt continues the speech.

The initial voice is preserved this way, and at the end you just cut off the first 10 seconds. It works with a 20-second audio sample of the voice and can produce 10 clean seconds. Trying to go beyond that runs into problems, but the good thing is that you can get much better emotions by prompting something like “he screams in perfect romanian language” or whatever emotion you want to add. No other open source model knows so many languages, and for my needs (Romanian) it works like a charm. Even better than ElevenLabs, I would say. Who would have known the best open source TTS model is a video model? The workflow is here.
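To make the timing concrete, here is a minimal sketch of the keep/generate mask idea as plain PyTorch outside ComfyUI. The actual masking is done by the “LTXV Set Audio Video Mask By Time” node; the latent frame rate and helper function below are illustrative assumptions, and only the 20-second sample and 10-second cut-off come from the post.

```python
# Minimal sketch of the time-based audio mask idea (not the actual node code).
import torch

def audio_keep_mask(total_seconds: float, keep_seconds: float, latent_fps: float) -> torch.Tensor:
    """1.0 = keep the reference audio latent (voice identity), 0.0 = let the model generate."""
    n_frames = int(round(total_seconds * latent_fps))      # latent_fps is an assumption
    keep_frames = int(round(keep_seconds * latent_fps))
    mask = torch.zeros(n_frames)
    mask[:keep_frames] = 1.0
    return mask

# e.g. a 20 s clip where the first 10 s carry the reference voice sample;
# after generation you trim those first 10 s and keep the cloned speech.
mask = audio_keep_mask(total_seconds=20.0, keep_seconds=10.0, latent_fps=25.0)
print(mask.shape, mask[:3], mask[-3:])
```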


Snails !!!

Made in ComfyUI using only local models (and uncensored of course).
The workflow and usage are the ones in this post, with reference actors that seem to work quite well, or use this direct link to the workflow.
Workflows and prompts are embedded in each asset (basically the whole related output folder from ComfyUI).
Models used:
LTX 2.5 for video, plus 2 shots with WAN 2.2 (the explosion ones; LTX sucks at these),
Flux Klein for images,
IndexTTS for voice,
Audio Ace Step 1.5 for music


LTX-2.3 Long Video For Low VRAM/RAM Workflow

LTX-2.3 is now out with better temporal and spatial coherence, so I gave it a spin with a hard, sure-to-fail scenario: a very long continuous action scene with actor and environment referencing. I know the characters change faces during the video, but that is my fault, as I updated the characters while the video was being created.


WAN 2.2 + external actors > LTX-2 upscaler/refiner/actor reinforcement in ComfyUI

In my previous posts I talked about how you can use LTX-2 as a WAN upscaler/refiner and how to add external actor and element references without img2vid (you need an empty scene without them, and they need to come into the scene).
But why not both? LTX-2 sucks at action sequences and human interactions, so the alternative at this point is WAN 2.2. But WAN is low-res and has the same issue as LTX: for now there is no way to add actors in latent space.
So I used the same technique as for LTX-2 to add actors to WAN, and then reinforced them in LTX-2 using the same method. Here are some results:


Idea:
Generate a very low-res WAN 2.2 video as a reference for LTX, still pre-appending the actor and element images at the beginning of the video, then the first image of the actual shot, so the characters are referenced from the very start of the video. This step at 480p is very fast and good enough for character interaction/movement coherence etc. to be used as vid2vid in LTX-2. We save it at 12 fps so we can upscale it with the temporal upscaler in LTX.
Then in the LTX step we bring in the same intro images, but at the highest resolution possible, so LTX knows what the characters actually look like in maximum detail and paints them over the low-res WAN video at 4x resolution. So the 480p video becomes 1440p in this case (but you can go lower if you don't have the resources; I have a 3090 and 64 GB of system RAM).
Both Qwen Image Edit and Flux Klein were used for generating the actors, the scene, zoom-ins on the scene, removing characters, etc.
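For readers who want the shape of the pipeline at a glance, here is the two-pass idea written out as plain data, a sketch rather than the actual node graph. The 480p/12 fps reference pass and the 1440p target come from the text; the exact frame sizes and field names are my own placeholders.

```python
# Sketch of the two-pass WAN -> LTX idea as plain data (not the ComfyUI graph).

reference_pass = {
    "model": "WAN 2.2",
    "resolution": (854, 480),   # 480p: fast, good enough for motion/interaction coherence
    "fps": 12,                  # low fps on purpose; LTX's temporal upscaler fills frames back in
    "conditioning": [
        "actor/element images pre-appended at the start of the video",
        "first image of the actual shot",
    ],
}

refine_pass = {
    "model": "LTX-2 vid2vid over the WAN output",
    "intro_images": "the same actor images, at the highest resolution available",
    "resolution": (2560, 1440),  # 1440p here; go lower if your GPU/RAM can't take it
    "temporal_upscale": True,    # brings the 12 fps reference back up to a normal frame rate
}

for name, stage in (("reference", reference_pass), ("refine", refine_pass)):
    print(f"{name}: {stage['model']} at {stage['resolution'][1]}p")
```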


LTX-2: Adding outside actors and elements to the scene (not present in the first image), img2vid workflow.

For me this was the biggest problem with LTX-2: the inability to add characters from outside the camera without training a LoRA. I finally managed to get something working (workflow).
Please check out the other article, where I expanded this to WAN 2.2 and used LTX on top. It is much better for some cases, like character interaction and action, where LTX is a mess.


AI VS My Real Photos

After I made my full photo archive available for free, some Reddit users (like NobodyButMeow, whom I thank) created a Qwen Image LoRA from my photos. What struck me was that, using the initial caption text, the results resemble the originals a lot, as you can see below.
I have to mention that I am also using a WAN 2.2 refiner, like in the workflow here.
The LoRA is available here; no trigger words needed.
Here is a sample prompt for the second image:
“A landscape at sunset, featuring a prominent, conical mountain in the foreground. The mountain is covered with snow, and its peak is illuminated by the setting sun, casting a warm, golden glow across the scene. The sky is filled with dramatic clouds, adding depth and texture to the composition. In the foreground, there is a small waterfall cascading over a rocky surface, partially covered in ice and snow. The water appears to be flowing gently, creating a sense of tranquility. The background reveals a vast, open landscape with more mountains and a body of water reflecting the sunset colors.”




Getting good results out of Chroma Radiance

A lot of people asked how they could get results like mine using Chroma Radiance.
In short, you cannot get good results out of the box. You need a good negative prompt, like the one I set up, and technical terms in the main prompt such as: point lighting, volumetric light, dof, vignette, surface shading, blue and orange colors, etc. You don't need very long prompts; the model tends to lose itself with them. It is based on Flux, so prompting is closer to Flux.
The most important thing is the WAN 2.2 refiner that is also in the workflow. Play around with the denoising; I am using between 0.15 and 0.25 but never more, usually 0.20. This also gets rid of the grid pattern that is so visible in Chroma Radiance, as well as wrong hands and fingers.
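As a quick reference, the same advice condensed into an illustrative snippet. The keyword list and the 0.15-0.25 denoise range come from the text above; the sample prompt and variable names are made up, and the real negative prompt and refiner wiring live in the workflow.

```python
# Illustrative starting point only; not the actual workflow settings file.

chroma_keywords = [
    "point lighting", "volumetric light", "dof",
    "vignette", "surface shading", "blue and orange colors",
]

wan_refiner = {
    "denoise_min": 0.15,
    "denoise_max": 0.25,   # never go above this
    "denoise_usual": 0.20, # also removes the grid pattern and fixes hands/fingers
}

prompt = "a lighthouse on a basalt cliff at dusk, " + ", ".join(chroma_keywords)
print(prompt)
print(wan_refiner)
```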
The model is very good for “fever dream” kinds of images: abstract, combining materials and elements into something new, playing around with new visual ideas, much like SD 1.5 models are.
It is also very hit and miss. While using the same seed lets you tune the prompt while keeping the rest of the composition and subjects, changing the seed radically changes the result, so you need to have patience with it. IMHO the results are worth it. Sometimes you also need to correct things in Photoshop using generative fill.
The workflow I am using is here.
Here is a small gallery:


WAN 2.2 Upscaler/Refiner

This is the refiner/upscaler I am using for most of my images. It uses the realism and detail of the WAN 2.2 video model, applied to still images, to polish outputs from Qwen/Chroma/SD1.4/SDXL/Flux etc.

The workflow is here


Dataset Generator and Auto Captioning using Qwen

Because somebody on Reddit asked how he could caption a dataset for Qwen Image and maintain consistency, I made a small ComfyUI workflow that uses Qwen 2.5 VL 7B Instruct to auto-caption the images in a folder, rename them, caption them, and save them all to another folder. It should be straightforward to use, but you will have to manage the missing nodes and models yourself.

The workflow is here.
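If you want the same behaviour outside ComfyUI, here is a rough standalone sketch, assuming a recent transformers build with Qwen2.5-VL support plus the qwen_vl_utils helper. The folder names and the caption instruction are placeholders; the linked workflow remains the reference implementation.

```python
# Standalone sketch: caption, rename and copy a folder of images with Qwen 2.5 VL 7B Instruct.
from pathlib import Path
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

MODEL_ID = "Qwen/Qwen2.5-VL-7B-Instruct"
SRC, DST = Path("dataset_in"), Path("dataset_out")   # placeholder folders
DST.mkdir(exist_ok=True)

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(MODEL_ID)

def caption(image_path: Path) -> str:
    """Ask the VLM for a single-sentence training caption of one image."""
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": str(image_path)},
            {"type": "text", "text": "Caption this image for LoRA training in one detailed sentence."},
        ],
    }]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    images, videos = process_vision_info(messages)
    inputs = processor(text=[text], images=images, videos=videos,
                       padding=True, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    trimmed = out[0][inputs.input_ids.shape[1]:]          # drop the prompt tokens
    return processor.decode(trimmed, skip_special_tokens=True).strip()

# Rename sequentially, then save the image copy and a matching .txt caption side by side.
images = sorted(p for p in SRC.iterdir() if p.suffix.lower() in {".jpg", ".jpeg", ".png"})
for i, img in enumerate(images):
    new_name = f"image_{i:04d}"
    (DST / f"{new_name}{img.suffix}").write_bytes(img.read_bytes())
    (DST / f"{new_name}.txt").write_text(caption(img), encoding="utf-8")
```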


WAN 2.2 Lightning LoRA 3-steps-total workflow

This video was created using a 3-steps-total workflow in 720p, around 25% faster than the normal 2-steps-per-model workflow. The idea was that since the first, high-noise model is high noise anyway, there may be a parameter configuration that needs only 1 step for it. It seems to work, but I have to mention that from time to time the image gets blurry, and I have not tested it with anything other than images from THESE series, which have a very particular style and very still motion.

Here is the workflow; you can try it yourself. Note that you need the Lightning LoRAs and the WAN 2.2 I2V models (I am using GGUF versions). Any missing nodes can be installed through the Manager.
https://aurelm.com/upload/ComfyWorkflows/Wan_22_IMG2VID_3_STEPS_TOTAL.json
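The arithmetic behind the speed claim is simple; here is a tiny sketch of the presumed step split, where only the single high-noise step, the 3 total steps and the roughly 25% saving come from the post, the rest is assumption:

```python
# Illustrative only: the presumed sampler split behind the "3 steps total" idea.
# Exact node names/parameters live in the linked workflow.

total_steps = 3          # whole denoise schedule shared by both models
high_noise_steps = 1     # the high-noise model only gets a single step
low_noise_steps = total_steps - high_noise_steps  # remaining 2 steps on the low-noise model

# Speedup vs. the usual 2 + 2 = 4 step Lightning setup:
baseline_steps = 4
speedup = 1 - total_steps / baseline_steps
print(f"high-noise: steps 0..{high_noise_steps}, low-noise: steps {high_noise_steps}..{total_steps}")
print(f"~{speedup:.0%} fewer sampling steps")  # ~25%, matching the post
```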
Here are 2 videos made with this trick:


Behold, the Qwen Image Deconsistencynator !!!! (Or randomizer & Midjourneyfier)

Qwen Image has been getting a lot of unjustified heat for something wonderful (consistency when updating prompts). Still, I understand why some people want that random factor, finding the perfect shot by just hitting generate, so I made this custom workflow that uses Qwen 2.5 VL 3B Instruct to generate variations of the initial prompt, improving it and simulating the “old ways” of doing things.
This uses Qwen Image Edit as the base model for generating images, but the initial prompt-tweaking nodes on the left can be copy-pasted into any workflow. Links below, plus samples and a YouTube tutorial:

Workflow for getting Midjourney-like images
Version 2 (with Borealism LORA)
Workflow for SRPO Refiner
Edit: Changed the workflow and updated it with better prompt generation. There is now a midjourneyfier boolean at the beginning of the left group, so you can either diversify the prompt like the first example with the wires below, or midjourneyfy the hell out of it like the later photos.
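As a sketch of what the prompt-tweaking group does conceptually (the exact system prompt and node wiring are in the workflow; the instruction wording below is my own illustration):

```python
# Minimal sketch of the prompt-rewriting step, not the actual node graph.

def build_instruction(user_prompt: str, midjourneyfier: bool) -> str:
    """Build the text sent to the instruct model that rewrites the image prompt."""
    if midjourneyfier:
        task = ("Rewrite this image prompt in a dramatic, highly stylised "
                "'Midjourney-like' way: add lighting, mood, lens and composition details.")
    else:
        task = ("Produce a mild variation of this image prompt: keep the subject, "
                "randomise secondary details such as framing, time of day and palette.")
    return f"{task}\n\nPrompt: {user_prompt}"

# Example: feed the returned string to any local instruct LLM, then pass its
# answer to the Qwen Image Edit sampler in place of the original prompt.
print(build_instruction("a lighthouse on a cliff at dusk", midjourneyfier=True))
```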


Proper photo AI upscaling in 2025

As a photographer, AI enthusiast, and technical artist, I experimented with everything possible long before Midjourney and the like came along (back in the days of Disco Diffusion in 2021), so I got a head start. Of course, as a photographer I was very interested in how to use it for photography, and in this case for upscaling. All the commercial tools at this point fall short for me.
But working with open-source tools like ComfyUI, I finally managed to get something incredible. Example here: (first is a 100% crop of the eye in the last image, second is the upscaled version of the eye, and then a zoom out to the original).
