One of our favorite long-time clients, Confluent, called us to help them brainstorm creative for the video that opens 'Current', the annual conference they host for the Apache Kafka community. The end result is a 75-second, mostly AI-generated video:

Let's unpack it:

It was August, and 'Current 2023' was only a few weeks away, launching in late September.

As is usually the case with keynote opener videos, team Confluent wanted the video to be BIG with lots of sparkle and eye-candy to get the audience juiced.

Team Confluent shared a rough outline of a script, a few collected visuals, and a loose idea of how the video would begin: an origami bird - a crane - that would shape-shift into different birds and objects. This construct would support the key message and tenet of the event — reimagine.

[Origami crane - attribution: midjourney]

Upon seeing these images, which the Confluent Art Director had generated using Midjourney, we immediately knew the approach we wanted to take — an opening video created entirely with AI-generated media.

It was either that or a stock-video montage with a little motion graphics sprinkled on top - about the only other option given the time remaining before the event.

We gave ourselves a couple of days to research the available tools for generating AI media. Did the reality match the hype? We put our best minds and talent on the task. Our conclusion was a measured yes, with caveats: the tools for image creation are much further along than the tools for video, and both are constrained to lower resolutions than we typically work with - in some cases, a lot lower. This meant we would have to upscale final outputs. Also, and potentially more problematic, was the very short duration of generated video output from Runway and Pika (the two best AI video generators at the time of this project). This meant having to stitch together a lot of short clips, each only a few seconds in duration.
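Our editorial pipeline isn't shown here, but as a sketch of the stitching problem: one common way to join many short clips in a single lossless pass is ffmpeg's concat demuxer, which reads a plain-text list of files. The helper and clip names below are hypothetical, purely for illustration.

```python
from pathlib import Path

def write_concat_list(clip_paths, list_path):
    """Write an ffmpeg concat-demuxer list file for a set of short clips.

    ffmpeg can then stitch them in one pass without re-encoding:
      ffmpeg -f concat -safe 0 -i clips.txt -c copy stitched.mp4
    """
    lines = [f"file '{Path(p).as_posix()}'" for p in clip_paths]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return lines

# Hypothetical few-second generated clips, in storyboard order.
clips = ["shot01_runway.mp4", "shot02_pika.mp4", "shot03_runway.mp4"]
entries = write_concat_list(clips, "clips.txt")
```

With dozens of clips per shot group, generating the list programmatically beats hand-editing it every time a clip is regenerated.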

Initial research taught us that AI-generated visuals, while often impressive, are never exactly what you ask for. Sometimes they come close; most often they don't. Ask AI to create an output from the same prompt twice and you'll get two very different results. This was going to be a problem for continuity. Completing this project would mean embracing the creative AI mind and finding other ways to establish a visual link across all the clips so they would appear seamless within the narrative.

We talked this over with Team Confluent, sharing our findings, assessing the risks and possible scenarios, and discussing how to mitigate anticipated problems. Together we agreed that, despite the risks, an event-opening video composed mostly of AI-generated media would be worth pursuing.

We then set about creating a storyboard and defining our method and workflow. With the storyboard approved, we divided into teams to tackle each of the seven 'shot groups' that make up the video. When complete, we expected the total video duration to be 75 seconds.

And so the fun began...

The opening sequence comprises the animation of an origami bird folding itself from a flat piece of paper. This is an important shot that supports the narrative VO in establishing the overall theme and message. We burned through many, many hours attempting to have AI generate this sequence, but we just could not get what we needed. Not even close. This is a good example of how AI most often "imagines" and "creates" something visually very different from what is in your head. While that is often useful and an excellent resource for expanding your own creativity, it is practically useless when you have a very specific visual requirement. Not only that, but the animation of an assembling origami bird is actually quite complex. The AI video generators that exist today are a long, long way from being able to produce such prescribed, complex animations.

[A still from an early attempt at an AI-generated origami bird animation. While impressive, these were a long way from what was required]

The next tricky problem was the 'eye blink' that occurs at 23 seconds. The eye image was an output from Midjourney. Feeding that image into Runway and prompting it to blink was about as successful as asking your pet dog to prepare a grilled ribeye for dinner. Grand efforts were made, but the result was nothing resembling the ask. Prompting with text only (and no seed image) didn't yield any better results.

This meant we had to create the origami animation and the eye blink using traditional CGI methods. Fun fact: in both cases, creating the shot in CGI took only a fraction of the time we had spent attempting to do so with AI. Uh oh - this was not a good precedent.

That said, those two shots (the origami fold and the 'eye blink') were the only two specifically prescribed shots. All remaining shots and sequences, while storyboarded, had some creative wiggle room in how they might be visually portrayed. In other words, our AI creative partners would have some creative license over what, and how, things might be visually expressed — so long as we didn't go too far off the rails.

Our general workflow for every shot was to first create a set of images using Midjourney. These became either seed images for generated videos (using Runway, Pika, or both) or base images from which we created animations ourselves. For the latter, we used traditional methods such as camera mapping (giving the illusion of depth and parallax motion across multiple planes) or frame-interpolation morphing to create short animated sequences from a set of similar images.
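Our morphs were built in compositing software, not code, but the core idea of frame-interpolation morphing can be sketched in a few lines: synthesize in-between frames by blending two similar source frames. The tiny "frames" and function names below are illustrative only, not our production pipeline.

```python
def blend_frames(frame_a, frame_b, t):
    """Linearly blend two frames; t=0 gives frame_a, t=1 gives frame_b."""
    return [(1 - t) * a + t * b for a, b in zip(frame_a, frame_b)]

def interpolate_sequence(frame_a, frame_b, n_inbetweens):
    """Build a short morph: frame_a, n in-betweens, then frame_b."""
    steps = n_inbetweens + 1
    return [blend_frames(frame_a, frame_b, i / steps) for i in range(steps + 1)]

# Two 4-pixel stand-ins for a pair of similar Midjourney outputs.
a, b = [0, 0, 100, 100], [100, 100, 0, 0]
seq = interpolate_sequence(a, b, 3)  # 5 frames total
```

Real morphing tools also warp geometry between keyframes rather than just cross-dissolving pixels, which is why they need visually similar source images to look convincing.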

Our method for solving the continuity issue was to create a CGI overlay of an animated bird (constructed from abstract representations of data particles) that would fly across discrete scenes, shedding data particles as it went (a metaphor for real-time data in motion). This would allow us to weave otherwise disparate scenes together.

[Image of an early version of our data-particle bird in flight through a camera-mapped Midjourney scene]

Even though a 75-second video is short, we were conscious of over-using the same technique for adding motion to still frames (generated outputs from Midjourney) and diluting the overall effect with perceived repetition. Here, too, our CGI-overlaid data-particle bird helped, but we were in search of another 'trick' to maintain visual interest. Playing with Midjourney's 'outpainting' feature (where AI generates an in-context fill around an already-created image) presented us with the possibility of creating infinite zooms. We created an After Effects script to handle motion blur and easing on these massively long 'virtual rail' shots. A good example is the moment we zoom from orbit down to a terrestrial scene, at about 42 seconds in.
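Our actual script lives in After Effects and isn't reproduced here, but the math it rests on can be sketched in Python (function names are ours, for illustration): interpolating the scale factor in log space keeps the perceived zoom rate constant across an enormous range, and a smoothstep curve supplies the ease-in/ease-out at each end of the rail.

```python
import math

def smoothstep(t):
    """Classic ease-in/ease-out curve on [0, 1]."""
    return t * t * (3 - 2 * t)

def zoom_scale(t, total_zoom):
    """Scale factor at normalized time t for a long 'rail' zoom.

    Interpolating in log space means each second of the shot multiplies
    the scale by the same factor, so the zoom *feels* constant-speed;
    smoothstep eases the start and end.
    """
    return math.exp(smoothstep(t) * math.log(total_zoom))

# e.g. a hypothetical 10,000x zoom from orbit down to a terrestrial scene
for t in (0.0, 0.5, 1.0):
    print(round(zoom_scale(t, 10_000), 1))
```

Animating scale linearly instead would make the first half of such a shot feel glacial and the last half feel like a crash zoom, which is exactly the problem a scripted approach avoids.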

[Outpainting to create an oversized canvas – allows long rail-zoom effects]

Partway through this project, Midjourney updated their bag of tricks with an 'inpainting' feature (where you marquee-select a portion of a generated image and AI regenerates an in-context fill), allowing us to retain a portion of an image across the generation of an entire set. You'll see us use this technique during the quantum computing sequence (at 49-54 seconds, immediately before another infinite zoom). We used a similar method to create the 'development community' sequence (57-65 seconds) and to retain the scene divisions so that they all lined up correctly.

[Inpainting to create sections of an image that do not change from generation to generation]

Conclusions and learnings.

We were very happy with the end result - and so too was team Confluent. However, creating such a piece wasn't as easy as the hype would lead you to believe. We set out to create a video comprising as close to 100% AI-generated media as possible. Where did we land? Our finished video comprised no more than 70-75% AI-generated content. The remainder, including the origami crane, the particle bird, and the intro/outro, were all CGI sequences created via traditional means. Moreover, the time it took us to create the AI visuals (and be satisfied with the outputs) was far longer than expected. The time might have been better spent creating the entire video with the traditional tools we know and love.

We are in the early days of AI image and video generation. The tools are primitive. Obtaining great results takes considerable time and talent. Our jobs remain secure. However, it is clear that AI-generated visual content is on a path toward extraordinary results — where each of us mere mortals could potentially create stunning work with only a series of simple prompts. The time horizon for getting there? Hard to say. But years, not months.

Primary Tools used:

Topaz AI