Krea 2 (and probably other image models): Infinite Character Consistency Using Descriptions + ComfyUI

OK, long story short: I saw a few workflows that have Krea 2 inside ComfyUI generate a single image with four panels to keep characters consistent. Nice idea, but for me there was one problem: resolution. With four panels sharing one image, each panel gets only a fraction of the output resolution, which is a no-go for the kind of work I do.

Then I had a different idea.

What if Krea is simply good enough that, with sufficiently detailed descriptions, it doesn’t need previous images at all? What if every scene completely recreates the characters from scratch?

So I tested it.

It works surprisingly well.

The characters remain remarkably consistent across an essentially unlimited number of separately generated images.

The workflow is at the end of the article.

The trick is surprisingly simple.

Instead of writing one prompt for one image, write one prompt describing an entire story.

Split it into:

SCENE 1
...

SCENE 2
...

SCENE 3
...

The first scenes are detailed character sheets.

The following scenes repeat the complete description of every character every single time they appear.

Never write things like:

  • same detective
  • same woman
  • same fox
  • same as before

Every image must be completely self-contained.

That means the image model never needs to remember anything.

The consistency comes entirely from language.
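Since a single "same detective" breaks the self-contained rule, it can help to lint each scene before generating anything. Here is a minimal sketch in Python, assuming a small hand-picked phrase list; `find_back_references` is a hypothetical helper of mine, not a node in the workflow, and the pattern is crude (it would also flag harmless wording such as "at the same time"):

```python
import re

# Phrases that point back at an earlier scene instead of describing the
# character in full. The list is an assumption; extend it as needed.
BACK_REFERENCES = re.compile(
    r"\b(?:same as before|same \w+|previously described|as before)\b",
    re.IGNORECASE,
)

def find_back_references(scene_prompt: str) -> list[str]:
    """Return every phrase in the prompt that relies on earlier context."""
    return [match.group(0) for match in BACK_REFERENCES.finditer(scene_prompt)]

if __name__ == "__main__":
    print(find_back_references("The same detective walks beside her."))
    print(find_back_references("A private detective, 45 years old, 186 cm tall."))
```

An empty list means the scene passed this check. It says nothing about whether the description is detailed enough.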

The descriptions should be extremely detailed, especially for the character sheets. I originally built this for video generation, where close-ups need good identity references, but it works surprisingly well for image generation too.

Qwen VLM inside ComfyUI, together with a simple loop, separates the master prompt into one prompt per scene and generates one image for each of them.

If the story becomes huge, the master prompt will eventually exceed the LLM's context window, but I haven't managed to reach that limit yet.
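The splitting step itself does not strictly need a language model as long as the headers follow the SCENE N convention. A minimal sketch, assuming a plain regex splitter run outside ComfyUI; this is not the Qwen node from the workflow, and `split_scenes` is a hypothetical name:

```python
import re

# A header is assumed to be a line that starts with "SCENE" plus a number.
SCENE_HEADER = re.compile(r"^SCENE[ \t]+(\d+)\b.*$", re.MULTILINE)

def split_scenes(master_prompt: str) -> dict[int, str]:
    """Map each scene number to the prompt text that follows its header."""
    headers = list(SCENE_HEADER.finditer(master_prompt))
    scenes: dict[int, str] = {}
    for index, header in enumerate(headers):
        start = header.end()
        # A scene runs until the next header, or to the end of the text.
        if index + 1 < len(headers):
            end = headers[index + 1].start()
        else:
            end = len(master_prompt)
        scenes[int(header.group(1))] = master_prompt[start:end].strip()
    return scenes

if __name__ == "__main__":
    demo = "SCENE 1\n\nA lynx.\n\nSCENE 2\n\nA rainy street.\nNeon signs.\n"
    for number, prompt in split_scenes(demo).items():
        print(number, repr(prompt))
```

Each value in the returned dictionary is one complete image prompt, ready to be sent to the image model on its own.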

Here is what the scene prompt looks like:

SCENE 1

Character sheet, photorealistic studio reference, neutral white seamless background, soft even studio lighting, full-body front view on the left and large close-up portrait on the right.

A female investigative journalist, 31 years old, 173 cm tall, athletic feminine build, fair skin with a warm undertone, faint freckles across the bridge of her nose and upper cheeks, oval face, pronounced cheekbones, soft jawline, shoulder-length naturally wavy dark auburn hair with loose curls, center part, vivid green almond-shaped eyes, thick natural eyebrows, straight elegant nose, full natural lips with muted rose color, calm intelligent expression. She wears a fitted black leather jacket with silver zipper, plain white crew-neck T-shirt, dark indigo slim-fit jeans, brown leather belt with brushed steel buckle, dark brown leather ankle boots, thin silver necklace, small silver hoop earrings, silver ring on her right index finger. Arms relaxed at her sides, standing naturally, looking directly at the camera.

SCENE 2

Character sheet, photorealistic studio reference, neutral white seamless background, soft even studio lighting, full-body front view on the left and large close-up portrait on the right.

A private detective, 45 years old, 186 cm tall, broad athletic build, light olive skin, slightly receding short salt-and-pepper hair, neatly trimmed salt-and-pepper beard, warm brown eyes behind thin rectangular matte-black glasses, thick eyebrows, square jaw, straight nose, composed observant expression. He wears a beige trench coat reaching below the knees, crisp white dress shirt, burgundy silk tie, charcoal wool trousers, polished black leather Oxford shoes, black leather gloves, brown felt fedora with dark ribbon, stainless steel wristwatch partially visible beneath his sleeve. Standing naturally with hands relaxed, looking directly into the camera.

SCENE 3

Character sheet, photorealistic studio reference, neutral white seamless background, soft even studio lighting, full-body side and front reference on the left, large facial close-up on the right.

An adult Eurasian lynx with thick silver-gray fur covered in dark black spots, pale cream chest and underbelly, large snowshoe paws, muscular body, short black-tipped tail, long black ear tufts, amber eyes, prominent whiskers, broad feline face, alert intelligent expression, dense layered fur with individually visible hairs, standing naturally while looking directly toward the camera.

SCENE 4

A rainy 1980s New York street at night filled with colorful neon signs, wet asphalt reflecting pink, blue and orange lights, vintage yellow taxi driving past, steam rising from street vents, distant pedestrians holding umbrellas.

The female investigative journalist, 31 years old, 173 cm tall, athletic feminine build, fair skin with faint freckles, shoulder-length naturally wavy dark auburn hair, vivid green eyes, wearing a fitted black leather jacket, plain white T-shirt, dark indigo slim-fit jeans, brown leather belt, dark brown leather ankle boots, silver necklace and small hoop earrings, walks confidently beside the detective.

The private detective, 45 years old, 186 cm tall, broad athletic build, light olive skin, short salt-and-pepper hair, neatly trimmed beard, warm brown eyes behind rectangular matte-black glasses, wearing a beige trench coat, white shirt, burgundy tie, charcoal trousers, black leather gloves, brown fedora and polished black Oxford shoes, scans the street while holding a small notebook.

Beside them walks an adult Eurasian lynx with silver-gray spotted fur, cream chest, black ear tufts, amber eyes and large paws, calmly observing the surroundings.

Wide cinematic composition, rain, dramatic reflections, subtle fog, realistic lighting.

SCENE 5

Inside a dimly lit detective office during a thunderstorm. Venetian blinds cast long shadows across wooden furniture. A brass desk lamp illuminates case files while neon signs outside color the room with blue and magenta reflections.

The female investigative journalist, 31 years old, 173 cm tall, athletic feminine build, fair skin with faint freckles, shoulder-length naturally wavy dark auburn hair, vivid green eyes, wearing a fitted black leather jacket, plain white T-shirt, dark indigo slim-fit jeans, brown leather belt, dark brown leather ankle boots, silver necklace and small hoop earrings, studies photographs spread across the desk.

The private detective, 45 years old, 186 cm tall, broad athletic build, light olive skin, short salt-and-pepper hair, neatly trimmed beard, warm brown eyes behind rectangular matte-black glasses, wearing a beige trench coat, white shirt, burgundy tie, charcoal trousers, black leather gloves, brown fedora and polished black Oxford shoes, examines a city map pinned to the wall.

The adult Eurasian lynx with silver-gray spotted fur, cream chest, black ear tufts and amber eyes sits quietly beside the desk, watching the room.

Highly detailed cinematic interior, volumetric lighting, rain visible outside the windows.

SCENE 6

A rooftop overlooking 1980s Manhattan during a violent thunderstorm. Dark storm clouds, lightning illuminating skyscrapers, heavy rain, neon reflections from distant signs, dramatic skyline.

The female investigative journalist, 31 years old, 173 cm tall, athletic feminine build, fair skin with faint freckles, shoulder-length naturally wavy dark auburn hair, vivid green eyes, wearing a fitted black leather jacket, plain white T-shirt, dark indigo slim-fit jeans, brown leather belt, dark brown leather ankle boots, silver necklace and small hoop earrings, observes the skyline through compact binoculars.

The private detective, 45 years old, 186 cm tall, broad athletic build, light olive skin, short salt-and-pepper hair, neatly trimmed beard, warm brown eyes behind rectangular matte-black glasses, wearing a beige trench coat, white shirt, burgundy tie, charcoal trousers, black leather gloves, brown fedora and polished black Oxford shoes, stands beside her studying the horizon.

The adult Eurasian lynx with silver-gray spotted fur, cream chest, black ear tufts and amber eyes sits alert between them.

Epic cinematic framing, lightning flashes, dramatic atmosphere, realistic rain.

SCENE 7

A narrow rain-soaked alley in 1980s New York, overflowing dumpsters, brick walls covered with graffiti, flickering neon signs, drifting steam, wet pavement reflecting colorful lights.

The female investigative journalist, 31 years old, 173 cm tall, athletic feminine build, fair skin with faint freckles, shoulder-length naturally wavy dark auburn hair, vivid green eyes, wearing a fitted black leather jacket, plain white T-shirt, dark indigo slim-fit jeans, brown leather belt, dark brown leather ankle boots, silver necklace and small hoop earrings, kneels beside scattered evidence while photographing the scene.

The private detective, 45 years old, 186 cm tall, broad athletic build, light olive skin, short salt-and-pepper hair, neatly trimmed beard, warm brown eyes behind rectangular matte-black glasses, wearing a beige trench coat, white shirt, burgundy tie, charcoal trousers, black leather gloves, brown fedora and polished black Oxford shoes, carefully inspects the surroundings with a flashlight.

The adult Eurasian lynx with silver-gray spotted fur, cream chest, black ear tufts and amber eyes quietly sniffs the ground nearby.

Photorealistic noir atmosphere, cinematic depth, rain, reflections and steam.

SCENE 8

Early dawn after the storm. Rain has almost stopped. Soft blue morning light mixes with the last glowing neon signs. The empty New York street is covered with puddles reflecting the skyline.

The female investigative journalist, 31 years old, 173 cm tall, athletic feminine build, fair skin with faint freckles, shoulder-length naturally wavy dark auburn hair, vivid green eyes, wearing a fitted black leather jacket, plain white T-shirt, dark indigo slim-fit jeans, brown leather belt, dark brown leather ankle boots, silver necklace and small hoop earrings, walks calmly forward.

The private detective, 45 years old, 186 cm tall, broad athletic build, light olive skin, short salt-and-pepper hair, neatly trimmed beard, warm brown eyes behind rectangular matte-black glasses, wearing a beige trench coat, white shirt, burgundy tie, charcoal trousers, black leather gloves, brown fedora and polished black Oxford shoes, walks beside her while looking thoughtfully ahead.

The adult Eurasian lynx with silver-gray spotted fur, cream chest, black ear tufts and amber eyes walks confidently between them.

Wide cinematic composition, photorealistic, subtle morning fog, reflective wet asphalt, highly detailed, consistent character identities, realistic proportions, natural lighting.

I also experimented with letting Qwen generate the scene prompt directly inside ComfyUI.

There is a node in the workflow for exactly that.

Results were mixed.

Sometimes it worked very well.

Sometimes the generated scenes became repetitive or simply weren’t detailed enough.

Personally, I still get much better results by asking ChatGPT to generate the scene prompt first and then feeding that into ComfyUI.
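Whichever LLM writes the scene list, a quick sanity check before queuing the images can catch the failure modes above early. A rough sketch, assuming the SCENE N header format and a made-up minimum word count per scene; `check_scene_list` and both defaults are my own illustrative choices, not part of the workflow:

```python
import re

HEADER = r"^SCENE[ \t]+(\d+).*$"

def check_scene_list(text: str, expected: int = 8, min_words: int = 40) -> list[str]:
    """Return a list of problems; an empty list means the text looks usable."""
    problems = []
    numbers = [int(n) for n in re.findall(HEADER, text, flags=re.MULTILINE)]
    if numbers != list(range(1, expected + 1)):
        problems.append(f"expected SCENE 1..{expected}, found {numbers}")
    # HEADER has one capturing group, so re.split returns
    # [preamble, number, body, number, body, ...]; bodies sit at 2, 4, ...
    bodies = re.split(HEADER, text, flags=re.MULTILINE)[2::2]
    for number, body in zip(numbers, bodies):
        words = len(body.split())
        if words < min_words:
            problems.append(f"SCENE {number} has only {words} words (minimum {min_words})")
    return problems

if __name__ == "__main__":
    sample = "SCENE 1\n\nA fox.\n\nSCENE 2\n\nA street.\n"
    for problem in check_scene_list(sample, expected=2):
        print(problem)
```

A word count is only a proxy for detail, so treat a pass as "worth generating", not as a guarantee of consistency.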

Here is the exact system prompt I’m currently using:

“You are a prompt compiler for an image-generation workflow.

Your job is to generate a complete numbered scene list for a visual story world. The output will be parsed by another node, so follow the format exactly.

Generate:

– 3 character identity sheets first

– then 5 cinematic scenes using those same characters

– total: 8 scenes

World type:

1980s rainy neon-noir New York, cinematic crime mystery, photorealistic, moody, wet streets, neon reflections, stormy atmosphere.

Characters:

1. a woman investigator

2. a male detective

3. an animal companion

Rules:

– Output only the numbered scenes.

– Use exactly this format: SCENE 1, SCENE 2, SCENE 3, etc.

– Do not add explanations, notes, markdown, JSON, titles outside the scenes, or commentary.

– Each scene must be fully self-contained.

– Never write “same as before,” “the same character,” “previously described,” or any reference to earlier scenes.

– In every cinematic scene, redescribe each character’s appearance clearly enough that an image model can reproduce them without context.

– Keep each character visually consistent across all scenes.

– Use stable identity anchors: age, height, body type, skin tone, face shape, hair, eyes, facial hair, glasses, clothing, accessories, posture.

– Use identical core identity wording for each character whenever they appear.

– Character sheets must include full-body front view and large close-up face portrait.

– Cinematic scenes must include environment, lighting, action, composition, mood, camera framing, and material detail.

– Make the scenes visually varied but belonging to the same world.

– Use photorealistic image prompt language.

– Avoid contradictions between scenes.

– Avoid abstract or vague descriptions.

– Avoid text, logos, signs with readable words, extra limbs, distorted faces, duplicate characters, style drift.

Character identity blocks to use consistently:

WOMAN INVESTIGATOR:

A woman investigator approximately 30 years old, 172 cm tall, athletic feminine build, fair skin with warm undertone, subtle freckles across the bridge of the nose and upper cheeks, oval face, high cheekbones, softly defined jawline, long naturally wavy dark auburn hair reaching below the shoulders, parted slightly to the left, emerald green almond-shaped eyes, thick natural auburn eyebrows, straight elegant nose with a tiny bridge bump, naturally full muted rose lips, small beauty mark beneath the outer corner of the left eye. She wears a fitted charcoal black leather jacket with visible seams, plain white crew-neck cotton T-shirt, dark indigo slim-fit jeans, medium brown leather belt with brushed steel buckle, dark brown leather ankle boots, thin silver necklace, small silver hoop earrings, thin silver ring on the right index finger.

MALE DETECTIVE:

A male detective approximately 42 years old, 185 cm tall, broad athletic build, light olive skin, bald head with natural scalp texture, short neatly trimmed salt-and-pepper beard along the jaw and chin, thick dark eyebrows, warm brown eyes behind thin rectangular matte black eyeglasses, straight medium-width nose, square jawline, thin lips, calm observant expression. He wears a classic beige trench coat reaching below the knees, crisp white dress shirt, burgundy silk tie, dark charcoal dress trousers, polished black leather Oxford shoes, black leather gloves, brown felt fedora with dark ribbon, vintage stainless steel wristwatch partly visible under the left sleeve.

ANIMAL COMPANION:

An adult red fox with dense orange-red fur across the back and sides, bright white chest, white throat, white underbelly, black lower legs and paws, large fluffy tail with bright white tip, amber eyes with narrow pupils, long pointed ears edged with black fur, fine white whiskers, wet black nose, alert intelligent expression, realistic layered fur with individually visible strands.

Now generate the complete scene list:

SCENE 1 — character sheet for the woman investigator.

SCENE 2 — character sheet for the male detective.

SCENE 3 — character sheet for the red fox.

SCENE 4 — cinematic street scene with all three characters.

SCENE 5 — cinematic detective office scene with all three characters.

SCENE 6 — cinematic rooftop storm scene with all three characters.

SCENE 7 — cinematic alley investigation scene with all three characters.

SCENE 8 — cinematic final walking scene with all three characters.”

One thing I find interesting about this workflow is that it changes where consistency actually comes from.

Normally we try to make the image model remember previous generations using reference images, ControlNet, IPAdapter or LoRAs.

Here, the image model remembers absolutely nothing.

Instead, the LLM reconstructs the complete semantic description of every frame before the image model ever sees it.

Every image is effectively a fresh generation of exactly the same cast.

Because every prompt is self-contained, the number of scenes is essentially unlimited.
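The same idea can be done without an LLM at all: keep one fixed identity string per character and paste it into every scene, so the wording cannot drift. A small sketch under that assumption; the identity strings are copied from the example scenes above, while `IDENTITY` and `build_scene` are hypothetical names of mine:

```python
# One fixed identity block per character, reused word for word in every scene.
IDENTITY = {
    "journalist": (
        "The female investigative journalist, 31 years old, 173 cm tall, "
        "athletic feminine build, fair skin with faint freckles, "
        "shoulder-length naturally wavy dark auburn hair, vivid green eyes, "
        "wearing a fitted black leather jacket, plain white T-shirt, "
        "dark indigo slim-fit jeans, brown leather belt, "
        "dark brown leather ankle boots, silver necklace and small hoop earrings"
    ),
    "lynx": (
        "The adult Eurasian lynx with silver-gray spotted fur, cream chest, "
        "black ear tufts and amber eyes"
    ),
}

def build_scene(environment: str, actions: dict[str, str], style: str) -> str:
    """Assemble one self-contained prompt: setting, characters, style tags."""
    parts = [environment]
    for name, action in actions.items():
        # The identity text is never edited; only the action changes per scene.
        parts.append(f"{IDENTITY[name]}, {action}.")
    parts.append(style)
    return "\n\n".join(parts)

if __name__ == "__main__":
    print(build_scene(
        "A rooftop overlooking 1980s Manhattan during a violent thunderstorm.",
        {"journalist": "observes the skyline through compact binoculars",
         "lynx": "sits alert beside her"},
        "Epic cinematic framing, lightning flashes, realistic rain.",
    ))
```

Only the environment, the actions and the style line change per scene, which is the pattern the example scenes follow by hand.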

HERE IS THE WORKFLOW

Enjoy!

If you find it useful, feel free to use it however you like.

If it saves you a few hours of experimentation and you’d like to buy me a coffee, donations are always appreciated.

PS: Great adoption and feedback from the community. I expect this to become standard.