
Maintaining Scene Continuity with Sora-2

We're working on a promo video for our smartphone-based CW practice app. We had great luck last week using the sora-2 app to create B-roll footage for the Gladych Files. This week, I'm hoping to make an entire scripted trailer for the CW app using sora-2.

There are issues, though. The first is that while the Sora app has a storyboard feature (at least the version I can access this week), the API does not. It does, however, let you pass in reference images to bridge scenes. That's pretty cool, and it seems to work.

I'm working on a Python script to wait for bridging images between clips. That's worked out OK.
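A minimal sketch of the bridging idea, not the actual script: render the scenes in order, wait for each clip to finish, and hand its last frame to the next request as the reference image. The three callables (`submit`, `get_status`, `fetch_last_frame`) are hypothetical stand-ins for whatever API calls and frame extraction you use, and the status names `"completed"` and `"failed"` are assumptions.

```python
import time
from typing import Callable, List, Optional, Sequence

TERMINAL_STATUSES = ("completed", "failed")  # assumed status names


def wait_for_clip(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a terminal status, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"clip not finished after {max_polls} polls")


def render_chain(
    prompts: Sequence[str],
    submit: Callable[[str, Optional[str]], str],
    get_status: Callable[[str], str],
    fetch_last_frame: Callable[[str], str],
    **wait_options,
) -> List[str]:
    """Render scenes in order, handing each clip's last frame to the next one."""
    reference: Optional[str] = None  # the first scene has no bridging image
    clip_ids: List[str] = []
    for prompt in prompts:
        clip_id = submit(prompt, reference)  # hypothetical: starts a render job
        status = wait_for_clip(lambda: get_status(clip_id), **wait_options)
        if status != "completed":
            # Assumption: a rejected or broken job surfaces as a non-completed status.
            raise RuntimeError(f"clip {clip_id} ended with status {status!r}")
        clip_ids.append(clip_id)
        reference = fetch_last_frame(clip_id)  # hypothetical: path to a still image
    return clip_ids
```

Injecting the callables keeps the loop testable without network access; the real API calls only need to match those three small signatures.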

Table comparing TL;DR features of Sora web versus Sora API, showing storyboards and scene linkage built into the web app but requiring prompt engineering and referenced video IDs in the API.

The real issue, so far, has been sora-2 moderation.

Profanity and Real-Person Filters

You cannot pass the image of a real person (even one sora-2 invented) between clips. Moderation stops it every time. (Moderation is what sora-2 calls the engine that decides whether it can make your video at all.) This is what set off a cascade of moderation issues I've yet to overcome.

Close-up of a man in blue over-ear headphones and an orange jacket, looking into the camera in a studio setting, as if recording or reacting to something on screen.

Geesh. So far, I've tried using a cartoon motif instead and moving to scenes where faces aren't visible. Neither helped. Apparently sora-2 is also rather prudish about profanity and violence. Consequently, this scene prompt mostly isn't getting created.


Same operator, now imagined as a sci-fi pilot. Semi-stylized digital painting realism,
strong lighting, soft brushwork, still grounded and realistic.

IMPORTANT FACE RULES:
- Pilot always wears a blast helmet.
- Helmet visor and design must completely obscure the eyes and upper face.
- Only the pilots chin and sometimes the mouth are visible.
- No reflections revealing facial features.

No readable on-screen text.

[SCENE 2 IMAGINED X-WING / TIE-STYLE COCKPIT, SIDETONE DELAY]

Vertical 6:19 / 9:16.

We transition from the shimmer at the end of Scene 1 directly into the cockpit of a
starfighter, inspired by X-Wing / TIE Fighter designs but generic: angular windows,
metallic ribs, analog switches, retro-futuristic indicator lights, and a flight stick.
Deep space and distant stars are visible outside; no visible enemies, no active combat.

The pilot is the same person as the operator, now imagined in this cockpit.
They wear a bulky blast helmet with a tinted visor and side panels that fully hide the
upper face. Only their chin and mouth area are sometimes visible when they speak.

A reference still from Scene 1 is provided. Use it to match:
- body build,
- basic posture,
- overall lighting mood,
so the operator and pilot feel like the same person.

The pilot keys Morse on a small console paddle. The delayed sidetone problem is the
same as in Scene 1: the audio beeps are slightly late relative to the hand motion.
The pilot clearly knows whats wrong and reacts with authentic frustration.

The pilot yells, visible only from helmet and chin:
"Darn it, I can’t transmit with this sidetone delay!"

They pull off a removable blast visor attachment on the front of the helmet, or tilt
the chin slightly upward in exasperation, but the upper face remains entirely hidden
behind the helmet structure.

Cockpit lighting is dramatic but controlled, emphasizing the polished metal surfaces
and the pilots gloved hands on the controls.
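A prompt this long is easier to tweak between moderation attempts if the shared face rules live in one place and each scene only supplies its own text. Here is a rough sketch of that idea; the function name `build_scene_prompt` and the block layout are illustrative assumptions, not anything sora-2 requires.

```python
from typing import Sequence

# Face rules shared by every scene, taken from the prompt above.
FACE_RULES = (
    "Pilot always wears a blast helmet.",
    "Helmet visor and design must completely obscure the eyes and upper face.",
    "Only the pilot's chin and sometimes the mouth are visible.",
    "No reflections revealing facial features.",
)


def build_scene_prompt(
    style: str,
    scene_title: str,
    paragraphs: Sequence[str],
    face_rules: Sequence[str] = FACE_RULES,
) -> str:
    """Join the style line, face rules, scene title, and scene paragraphs.

    Blocks are separated by a blank line; the rules become a dash list.
    """
    rules = "\n".join(f"- {rule}" for rule in face_rules)
    blocks = [
        style,
        f"IMPORTANT FACE RULES:\n{rules}",
        f"[{scene_title}]",
        *paragraphs,
    ]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(
        build_scene_prompt(
            "Same operator, now imagined as a sci-fi pilot.",
            "SCENE 2 IMAGINED COCKPIT, SIDETONE DELAY",
            ["The pilot keys Morse on a small console paddle."],
        )
    )
```

Changing one rule then updates every scene prompt built from it, which helps when testing which phrase trips moderation.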



Sora-2 Crosstalk

Here's the part that's kind of fun. I asked GPT-5 if sora-2 had access to my chat context. The chatbot assured me that the videobot did not... Yeah, it might though!

Here's an image the kid and I asked GPT-5 to mock up for us a few days ago while planning out how to add math lessons to the Gladych Files. Notice that there are two kids and a dog. sora-2 has never seen this image.
Retro cartoon poster with bold text reading Boys! Girls! Everyone!, showing a smiling red-haired boy, a blonde girl in a blue dress and mortarboard saying Puppies! in a speech bubble, and a happy brown puppy between them.

I clipped this portion of the image and passed it to sora-2 as a reference to start from.
Cropped retro cartoon of a cheerful red-haired boy in yellow overalls and a green-striped shirt raising one hand in greeting against a warm textured background

Here's what sora-2 output. Notice that the blonde kid even has the graduation cap! Kinda cool, I think, and it might point to some interesting usage modes as time progresses.

Vintage-style cartoon scene of two kids and a puppy around a Morse code key on a wooden table, with an old-fashioned radio and meter, echoing the earlier math-class poster style.

The prompt for the blonde kid was:

  * Blonde kid: long ponytail tied with a big ribbon,
    blue dress and tiny academic cap, white socks and shoes.

So I guess it could have guessed the look. Or perhaps both GPT-5 image generation and sora-2 use the same engine deep down, which would also be interesting to know.

In the meantime, I'm starting to see improved results with the sora-2-pro model vs. sora-2 with respect to image moderation. sora-2-pro costs 3X as much (30 cents per second as opposed to 10 cents for sora-2). I'll keep you posted.
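To see what that 3X difference means for a whole trailer, here is a quick back-of-the-envelope calculator using the per-second prices quoted above. The clip lengths in the example are made up for illustration, and prices are kept in whole cents to avoid floating-point rounding.

```python
from typing import Sequence

# Per-second prices in cents, as quoted in this post.
PRICE_CENTS_PER_SECOND = {"sora-2": 10, "sora-2-pro": 30}


def clip_cost_cents(model: str, seconds: int) -> int:
    """Cost of a single clip, in cents."""
    return PRICE_CENTS_PER_SECOND[model] * seconds


def trailer_cost_cents(model: str, clip_seconds: Sequence[int]) -> int:
    """Total cost of all clips in a trailer, in cents."""
    return sum(clip_cost_cents(model, s) for s in clip_seconds)


if __name__ == "__main__":
    clips = [8, 8, 8]  # hypothetical: three 8-second scenes
    for model in PRICE_CENTS_PER_SECOND:
        cents = trailer_cost_cents(model, clips)
        print(f"{model}: ${cents // 100}.{cents % 100:02d}")
```

Every moderation retry is billed again only if the clip actually renders or not depending on the service's rules, which this sketch does not model; it counts each listed clip once.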



