Lessons learned from creating AI videos

- June 21, 2025

About three weeks ago I subscribed to a free month of Google's Gemini Pro. With it, I could make three high-quality, eight-second videos a day, with sound. Three is not many, but it served my needs. And while eight seconds seems short, the next time you watch TV or a film, notice how often and how fast the shot changes. According to a Wired Magazine article, "The average shot length of English language films has declined from about 12 seconds in 1930 to about 2.5 seconds today."

Note: This blog is aimed at two audiences: those less enamored of tech than I am, and my fellow enthusiasts. For the first group, my Substack newsletter reproduces the relevant posts. The more techy posts live only here on the blog, as they have for the last 17 years. This is one of those posts.

My project was a series of about seven unique shots. Each one showed an employee modeling the proper phrasing when speaking to a customer. I was adding a new element to an old, boring training PowerPoint.

I made my three videos a day, and probably ended up making more than 20 in all. At first I struggled with two things:

Keeping the same character from one shot to the next, or at least similar-looking characters. I'd end up with two different people, or, even as I started to figure out how to make them consistent, one video would show the character in a different shirt color than in the previous video.

The other issue was keeping Google's Veo 3 video creation tool from putting words on the screen that I did not ask for. It is an irritating bug. It would be bad enough if it simply put the dialog I had specified in the prompt on screen, but instead it made the words look as if someone who couldn't spell had typed them.

I spent a long time reading online and experimenting to find solutions. It all comes down to the prompt.

===== Update =====

I realized I forgot to mention how I got the same characters. The trick is to write a prompt describing in great detail how you want the character to look, run it through ChatGPT, and ask it to expand on it for you. Then use the resulting prompt every time, adding whatever you want the character to say or do in that particular clip. Here is the one I used:

A help desk agent in his mid-20s, male, with short brown hair and blue eyes, is wearing a crisp, logo-free blue shirt. He sits alone at a desk in a modern call center, framed in a medium close-up (chest up). The background features gray fabric call center partitions and soft, neutral white overhead lighting. The camera remains steady and always focused solely on the agent. No other people are ever visible on screen.

He maintains a calm, professional expression throughout. The style should remain realistic and consistent in every video—same setting, lighting, clothing, and appearance. No text, captions, or subtitles should appear on screen at any time.

Only after those two paragraphs do I add the dialog specific to that clip. Both paragraphs went into every clip I wanted that person in.
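For fellow enthusiasts, here is a minimal Python sketch of that reuse pattern: the character description is stored once, and each clip only supplies its own dialog, placed after a colon. This is not part of Gemini or Veo 3; you would still paste the printed prompt in yourself. The function name `build_clip_prompt`, the "The agent says:" wording, and the sample line of dialog are hypothetical.

```python
# Shared character block: the two paragraphs above, stored once and reused
# verbatim so every clip describes the same person, clothing, and setting.
CHARACTER = (
    "A help desk agent in his mid-20s, male, with short brown hair and blue eyes, "
    "is wearing a crisp, logo-free blue shirt. He sits alone at a desk in a modern "
    "call center, framed in a medium close-up (chest up). The background features "
    "gray fabric call center partitions and soft, neutral white overhead lighting. "
    "The camera remains steady and always focused solely on the agent. No other "
    "people are ever visible on screen.\n\n"
    "He maintains a calm, professional expression throughout. The style should "
    "remain realistic and consistent in every video\u2014same setting, lighting, "
    "clothing, and appearance. No text, captions, or subtitles should appear on "
    "screen at any time."
)


def build_clip_prompt(dialog: str) -> str:
    """Append one clip's dialog (hypothetical wording) to the shared character block.

    The dialog follows a colon instead of being wrapped in quotation marks.
    """
    return f"{CHARACTER}\n\nThe agent says: {dialog.strip()}"


if __name__ == "__main__":
    # Hypothetical example line of dialog for one clip.
    print(build_clip_prompt("Thank you for calling, how can I help you today?"))
```

Only the last line changes from clip to clip, which is the whole point: the model sees the same description of the agent every time.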

===== End of Update =====

For a clip about a classic aircraft landing, the prompt might look like this:

A Boeing 314 Flying Boat lands gently in a south sea island lagoon with the island scene behind it. The only sound is the accurate sound of the engines. There should be no titles, subtitles, captions or words on the screen.

A suggestion I saw online was to run my prompt through ChatGPT, another AI. I did, and it changed the prompt to this:

Generate a realistic video of a Boeing 314 Flying Boat landing gently in a calm South Sea island lagoon. The aircraft should approach from the sky and descend smoothly onto the water. In the background, show a lush tropical island with palm trees, soft hills, and a few native huts or structures that evoke the 1940s era. The lighting should suggest late afternoon or golden hour, with warm highlights on the plane and water. The camera should follow the descent in a slow, cinematic style, possibly from a vantage point near the shore, capturing the moment the plane touches down and skims to a stop.

This is the resulting video:

You likely noticed this line in the prompt: there should be no text, titles, captions, or subtitles visible on screen at any time. Since I started putting that into my prompts, I have not had an errant title on screen. Also, when you include dialog in a prompt, it is best to put it after a colon, never in quotes.
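Those two rules are mechanical enough to check before pasting a prompt in. Below is a rough Python sketch, assuming dialog is written as double-quoted text (straight or curly quotes): it rewrites `says, "..."` as `says: ...` and appends a no-text sentence if the prompt does not already contain one. The helper name `tidy_prompt` and its matching heuristics are hypothetical, not anything Google provides, and a quote at the very start of a prompt will not come out cleanly.

```python
import re

# The "no on-screen text" sentence to append when a prompt lacks one.
NO_TEXT_LINE = (
    "There should be no text, titles, captions, or subtitles "
    "visible on screen at any time."
)

# An optional comma, whitespace, then a double-quoted span (straight or curly quotes).
DIALOG_QUOTE = re.compile(r',?\s*["\u201c]([^"\u201c\u201d]*)["\u201d]')


def tidy_prompt(prompt: str) -> str:
    """Move quoted dialog after a colon and make sure a no-text line is present."""
    # 'He says, "Hello."' becomes 'He says: Hello.'
    prompt = DIALOG_QUOTE.sub(r": \1", prompt)
    # Crude check for an existing no-text instruction.
    lowered = prompt.lower()
    if not any(phrase in lowered for phrase in ("no text", "no titles")):
        prompt = prompt.rstrip() + " " + NO_TEXT_LINE
    return prompt


if __name__ == "__main__":
    print(tidy_prompt('The agent smiles and says, "Thanks for calling."'))
```

Running it a second time on its own output changes nothing, so it is safe to apply to a prompt you have already cleaned up by hand.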

I'm having way too much fun creating movies and still images with these tools. I've always had visual creative urges, but I lacked the dexterity and talent to draw and paint, and I didn't have the time, energy, or money to pursue photography or videography seriously. For a while I shot two-camera videos of friends' weddings and of plays put on by a senior center. I enjoyed the editing that came after the shoots more than the shoots themselves. Now, by describing what I see in my mind's eye, I can create interesting visuals.

I'm still learning. And since this blog is essentially a journal of my tech pursuits, there will be more here soon. Google also has an advanced tool called Flow that I want to spend some time with. And I've used ChatGPT's Sora product to create some great images for free.

If you want to see what others are producing with Veo 3, check YouTube. Be aware that many of the videos are tutorials, but there are some interesting ones as well. Here is one that plays more like a clip from a TV program (warning: rough language).

Meanwhile, you can see my images and videos as I create them.

And later the same night, I made a better version.


Ideas from Mark Stout
