As of March 2026, generating character-consistent videos has become surprisingly accessible.
A technique that’s been gaining attention online recently caught my interest, so I decided to try it out myself.
By preparing a single still image, generating a character reference sheet with Nanobanana via Google Flow, and then creating a video using Veo, it’s already possible to maintain consistent characters and objects across outputs.
I’ve been exploring generative AI as a hobby for about two years now, and compared to back then, the technological progress feels nothing short of dramatic.
For even stronger consistency, and for reusing a character as a persistent ID, KLING 3.0’s Omni feature seems promising. I plan to explore that next.

AI image generation

Cyberpunk boy

First, I set up the boy as the main character.

realistic candid photo of a cyberpunk boy, wearing a black beanie, deep clear blue eyes, wearing slightly worn round orange goggles, faded olive green military jacket with visible creases and fabric texture, layered over a black hoodie, khaki cargo pants, black military boots, natural skin texture with pores, slight blemishes, subtle dark circles under eyes, relaxed neutral expression, imperfect posture, soft natural lighting, gray background, minimal simple backdrop, shot on 50mm lens, shallow depth of field, documentary photography style
a character image generated with Google ImageFX

Boy’s bike: “professional vehicle”

Next, I set up the motorcycle for the boy to ride.

realistic off-road motorcycle with a cyberpunk nomad style, rugged and heavily modified, matte black and worn metal body with subtle colorful accents, dust and scratches, utility-focused design, equipped with storage packs, side bags, and strapped gear, exposed mechanical parts, reinforced frame, thick off-road tires, minimal futuristic elements blended with practical survival design, soft natural lighting, neutral gray background, simple studio backdrop, realistic photography, 50mm lens, shallow depth of field
a motorcycle image generated with Google ImageFX

This time I used ImageFX, but it’s perfectly fine to create these source images with Nanobanana, A1111, Flux, or other tools.

Character reference sheet

Key Points for Creating a Character Reference Sheet (This Workflow)
1. Required Panels
・Full-body views: front, left side, right side, back
・Close-up portraits: front, left profile, right profile
2. Pose and Scale
・Standardize using an A-pose
・Keep head height and face size consistent across all panels
3. Appearance and Outfit Consistency
・Maintain consistent colors and textures for eyes, hair, clothing, and accessories
・Decide in advance whether to remove piercings or decorative elements during generation
4. Background and Lighting
・Use a simple background (gray or white)
・Standardize the direction, intensity, and softness of the light source
・Ensure shadows are consistent across all panels
5. Realism Adjustments
・Preserve natural skin texture (subtle asymmetry, wrinkles, etc.)
・Avoid over-perfection to prevent a doll-like appearance

Use the above as a guideline and adjust while reviewing the generated results.
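
The checklist above can also be kept as structured data and turned into prompt text, which makes it easier to reuse the same layout for another character. The following is only a minimal sketch: `SheetSpec` and `build_sheet_prompt` are hypothetical helper names, not part of Google Flow or any other tool, and the output is plain text that you would paste into the generator yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SheetSpec:
    """Hypothetical container for the checklist items (not a real tool API)."""
    subject: str
    top_row: tuple[str, ...]      # full-body views, left to right
    bottom_row: tuple[str, ...]   # close-up portraits, left to right
    pose: str = "relaxed A-pose"
    background: str = "clean, neutral plain gray background"
    lighting: str = "soft, natural lighting with controlled shadows"


def build_sheet_prompt(spec: SheetSpec) -> str:
    """Assemble a reference-sheet prompt as plain text."""
    lines = [
        f"Create a professional character reference sheet of {spec.subject} "
        "based on the provided reference image.",
        f"Use a {spec.background}.",
        "",
        "Layout:",
        "- Two horizontal rows",
        f"- Top row: {len(spec.top_row)} full-body standing views in this order: "
        + ", ".join(spec.top_row),
        f"- Bottom row: {len(spec.bottom_row)} close-up portraits in this order: "
        + ", ".join(spec.bottom_row),
        "",
        f"Keep the character in a {spec.pose}, with consistent head height "
        "and facial scale across panels.",
        f"Lighting must be consistent across all panels: {spec.lighting}.",
    ]
    return "\n".join(lines)


boy_sheet = SheetSpec(
    subject="a cyberpunk boy",
    top_row=("front view", "left profile", "right profile", "back view"),
    bottom_row=("front portrait", "left profile portrait", "right profile portrait"),
)
prompt = build_sheet_prompt(boy_sheet)
print(prompt)
```

Appearance details (outfit, eye color, realism notes) are left out of the sketch on purpose. They are specific to each character and are easier to write by hand, as in the prompts below.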

Cyberpunk boy sheet

Create a professional character reference sheet of a cyberpunk boy based on the provided reference image. Match the exact visual style, realism level, rendering approach, texture, and color treatment of the reference. The character has deep clear blue eyes and wears a black beanie, slightly worn round orange goggles, a faded olive green military jacket with visible creases and subtle wear, layered over a black hoodie, khaki cargo pants, and black military boots.

Ensure natural human realism: include subtle skin texture, pores, slight blemishes, and natural asymmetry. Avoid overly smooth or perfect surfaces. Preserve a lived-in, slightly worn look across clothing and materials.

Use a clean, neutral plain gray background. Present the sheet as a technical model turnaround.

Layout:
- Two horizontal rows
- Top row: four full-body standing views in this order: front view, left profile (facing left), right profile (facing right), back view
- Bottom row: three close-up portraits in this order: front portrait, left profile portrait (facing left), right profile portrait (facing right)

Maintain perfect identity consistency across all views. Keep the character in a relaxed A-pose, with accurate anatomy, consistent proportions, and clear silhouette. Ensure uniform scale and alignment, consistent head height across full-body views, and consistent facial scale across portraits.

Lighting must be consistent across all panels: soft, natural lighting with controlled shadows, no dramatic contrast shifts. Keep a documentary-style realism rather than cinematic stylization.

Ensure even spacing, clean panel separation, sharp focus, and a crisp, print-ready reference sheet appearance.
no stylized 3d rendering, no cgi look, avoid mannequin-like appearance.

Professional vehicle sheet

Create a professional vehicle reference sheet of a cyberpunk nomad-style off-road motorcycle based on the provided reference image. Match the exact visual style, realism level, rendering approach, texture, and color treatment of the reference.

The motorcycle is rugged and heavily modified, with a matte black and worn metal body, faded olive green accents, and subtle orange details. It has a utility-focused design with visible wear such as dust, scratches, and slightly faded surfaces. Include storage packs, side bags, strapped gear, exposed mechanical components, reinforced frame, and thick off-road tires. The design should feel practical, survival-oriented, and grounded rather than overly futuristic.

Use a clean, neutral gray background. Present the sheet as a technical vehicle turnaround.

Layout:
- Two horizontal rows
- Top row: four full vehicle views in this order: front view, left side view, right side view, rear view
- Bottom row: three detailed close-ups in this order: front section (headlight/handle area), mid section (engine/frame/mechanics), rear section (storage/exhaust/wheel)

Maintain perfect structural consistency across all views. Ensure accurate proportions, mechanical coherence, and clear silhouette. Keep consistent scale, alignment, and spacing across panels.

Lighting must be consistent across all panels: soft, natural lighting with controlled shadows, no dramatic contrast shifts. Preserve realistic material response (metal, rubber, fabric) without stylization.

Ensure sharp focus, clean panel separation, and a crisp, print-ready technical reference sheet appearance. Avoid cinematic staging and avoid stylized or exaggerated sci-fi elements.

no 3d render look, no cgi, avoid overly clean showroom appearance, no futuristic hoverbike design.

Video generation using combined assets

Veo Prompt (English / Cinematic)

A cinematic desert sequence featuring a cyberpunk boy riding a rugged nomad-style off-road motorcycle across a vast, sun-scorched desert. The boy wears a black beanie, round orange goggles, a faded olive green military jacket over a black hoodie, khaki cargo pants, and black military boots. His posture is slightly leaned forward, focused, steady but not perfect.

The environment is a harsh, endless desert with fine sand, scattered rocks, and heat distortion. In the far distance, a massive futuristic megatown appears through a shimmering mirage, partially obscured by heat haze and atmospheric distortion.

Camera:
Start with a low tracking shot beside the rear wheel, capturing sand kicking up in slow motion. Transition into a wide side tracking shot showing the rider moving across the landscape. Then cut to a frontal long shot with the distant megatown visible ahead, distorted by mirage. End with a slightly elevated trailing shot as the rider moves toward the horizon.

Motion:
Natural, slightly rough off-road movement. Suspension reacts to terrain, subtle body shifts, realistic acceleration. Sand particles and dust trails respond dynamically to the bike’s motion.

Lighting:
Strong natural sunlight, slightly warm tone, high noon desert lighting. Subtle lens glare, soft atmospheric haze. No dramatic color grading.

Style:
Documentary-style realism, grounded and tactile. Avoid overly cinematic stylization or artificial CGI look. Maintain natural imperfections and physical realism.

Additional details:
Wind interacting with clothing, minor fabric movement, dust accumulation on surfaces, heat shimmer distortion in the distance, realistic scale of environment.

Negative:
cgi, 3d render, overly smooth motion, unrealistic physics, sci-fi hoverbike, neon cyberpunk city glow dominating the scene, cartoon style

A cinematic scene set at an abandoned roadside gas station in a desert environment. A cyberpunk boy arrives on a rugged nomad-style off-road motorcycle and stops near an old fuel pump. The location feels worn and quiet, with faded signage, dust, and subtle wind.

The boy wears a black beanie, round orange goggles (slightly lifted or resting on his forehead), a faded olive green military jacket over a black hoodie, khaki cargo pants, and black military boots. His clothing shows light dust and wear.

Action:
He turns off the engine, gets off the bike, and walks a few steps toward the camera. He looks directly into the lens and speaks naturally, as if addressing the viewer. His tone is casual, slightly curious, and friendly.

Dialogue (spoken naturally, synced with lip movement):
"Hey! It's my first time around here... do you know if there's a diner nearby?"

Performance:
Natural facial movement, subtle blinking, small head tilts, relaxed posture. Slight dryness in lips and skin due to desert environment. No exaggerated acting.

Camera:
Medium shot transitioning into a closer framing as he approaches. Eye-level perspective, slight handheld movement for realism.

Environment:
Desert wind, faint dust movement, distant heat haze. Old gas station structure with worn textures, muted colors.

Lighting:
Natural sunlight, slightly warm tone, soft shadows. No dramatic cinematic lighting.

Style:
Documentary-style realism. Grounded, natural, human presence. Avoid artificial or overly cinematic visuals.

Negative:
cgi, 3d render, cartoon, anime, exaggerated acting, lip sync mismatch, robotic speech, overly clean environment
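
Both video prompts restate nearly the same outfit description and share the same labeled sections (Camera, Lighting, Style, Negative). One way to keep that wording identical between scenes is to store the character description once and assemble each prompt around it. This is a rough sketch under that assumption: `CHARACTER_BLOCK` and `build_video_prompt` are hypothetical names, the scene texts are shortened, and nothing here calls Veo or Flow. The result is just text to paste in.

```python
# Single source of truth for the character's outfit (wording taken from the prompts).
CHARACTER_BLOCK = (
    "The boy wears a black beanie, round orange goggles, a faded olive green "
    "military jacket over a black hoodie, khaki cargo pants, and black military boots."
)


def build_video_prompt(scene: str, sections: dict[str, str], negative: list[str]) -> str:
    """Join scene text, the shared character block, labeled sections, and negatives."""
    parts = [scene, "", CHARACTER_BLOCK]
    for title, body in sections.items():  # dicts keep insertion order
        parts += ["", f"{title}:", body]
    parts += ["", "Negative:", ", ".join(negative)]
    return "\n".join(parts)


desert = build_video_prompt(
    scene="A cinematic desert sequence featuring a cyberpunk boy riding a rugged "
          "nomad-style off-road motorcycle.",
    sections={
        "Lighting": "Strong natural sunlight, slightly warm tone.",
        "Style": "Documentary-style realism, grounded and tactile.",
    },
    negative=["cgi", "3d render", "cartoon style"],
)

gas_station = build_video_prompt(
    scene="A cinematic scene set at an abandoned roadside gas station in a desert environment.",
    sections={
        "Camera": "Medium shot transitioning into a closer framing as he approaches.",
        "Style": "Documentary-style realism.",
    },
    negative=["cgi", "3d render", "exaggerated acting"],
)

print(desert)
print("-" * 40)
print(gas_station)
```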

Even with Google Flow alone, I found it surprisingly easy to create videos in which the characters stay consistent and the scenes convey a strong sense of atmosphere.
During video generation, some small details, such as the character’s accessories or parts of the motorcycle, occasionally changed slightly. I realized that the key to minimizing this drift is to refine the video-generation prompts, remove elements from the source images that might confuse the AI, and avoid unusual details that the AI is unlikely to have learned during training.
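
As one small aid for refining the video prompts, a quick check can confirm that a prompt still names every attribute that was fixed in the reference sheet. The sketch below is only an illustration: `missing_descriptors` is a hypothetical helper, the descriptor list is copied from the prompts in this post, and a plain substring match will not catch paraphrases.

```python
def missing_descriptors(prompt: str, descriptors: list[str]) -> list[str]:
    """Return the descriptors that do not appear in the prompt (case-insensitive)."""
    text = prompt.lower()
    return [d for d in descriptors if d.lower() not in text]


# Attributes fixed in the character reference sheet.
BOY_DESCRIPTORS = [
    "black beanie",
    "round orange goggles",
    "faded olive green military jacket",
    "black hoodie",
    "khaki cargo pants",
    "black military boots",
]

video_prompt = (
    "The boy wears a black beanie, round orange goggles, a faded olive green "
    "military jacket over a black hoodie, khaki cargo pants, and black military boots."
)
drifted_prompt = "The boy wears a beanie, goggles, and a green jacket."

print(missing_descriptors(video_prompt, BOY_DESCRIPTORS))    # prints []
print(missing_descriptors(drifted_prompt, BOY_DESCRIPTORS))  # prints all six descriptors
```

A passing check does not guarantee that the generated video keeps those details. It only rules out one avoidable cause of drift: the prompt itself leaving an attribute vague or unstated.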