The AI Kitchen Sink | An Exploration of Image Generators for a 'Dead End Thrills' Music Video
As an artist and, basically, a self-proclaimed AI inquisitor, I’ve always been fascinated by the intersection of creativity and technology. Over the last few days, I went down a personal rabbit hole of uncharted AI image-generation territory. Inspired by Cubicolor's classic emotive track "Dead End Thrills" and its Patrice Bäumel Remix, I put everything I could find, literally the whole kitchen sink, into one experimental piece.
The piece isn't intended to be artistically beautiful so much as exploratory: I wanted to see what I could do with the “data” I created and make something reasonably cohesive. I don't think I succeeded in that regard, but the result is still intriguing.
The music's atmospheric and introspective nature led me to reflect on themes of inquiry, creativity, the blending of human and machine elements, and the delicate balance between control and unpredictability in AI. And the beats worked quite well with the frame rate that resulted from the experiments. It's strange.
The song's title, "Dead End Thrills," carries its own significance for AI development, representing the balance between risk and reward in the field. While pushing the boundaries of AI can lead to groundbreaking advancements and benefits, it can also result in unforeseen consequences or ethical concerns. The "thrills" in the title might suggest the excitement of progress, but the "dead end" part reminds us of the potential pitfalls and the need for caution and responsibility in AI development. Although there is much to say about how these AI platforms were trained and what artwork they were trained on, I won't go into my stance here, but I acknowledge that it is an issue.
With these concepts in mind, I created a rather "wonky" short music video using a comprehensive set of AI image generation tools, including text-to-image, image-to-image, outpainting, and video-to-video transformations with additional text prompts. I utilised various platforms for this project: Midjourney for text-to-image generation, DALL-E 2 for image-to-image and outpainting, and Runway for video-to-video transformations.
The creative process began with a series of text prompts inspired by the song's title and lyrics, which were then fed into Midjourney's text-to-image AI tool. The generated images encapsulated the sense of reaching the limits of a specific path and finding new ways to overcome obstacles or limitations, resonating with the concept of "Dead End Thrills."
Prompt for below images: “People watching music television channels like MTV and VH1 in the 1990s outside a train window, music videos with on-screen lyrics, engaged viewers, emphasising the window, the train window is important, the people's faces are important, ad, ultra-realistic, photorealistic, ultra-detailed, depth of field, white lightning, 8k, 75mm wide angle 75mm Nikon lens, Stephen Shore photo style”
These initial images were then transformed using DALL-E 2’s outpainting processes, which allowed me to experiment with different styles and visual elements to extend the image's original borders. The resulting photos somewhat captured the essence of Cubicolor's emotive soundscapes, blending the familiar with the unknown and revealing surprising patterns and textures.
Prompt for below images: “🌸, People watching music television channels like MTV and VH1 in the 1980s, music videos with on-screen lyrics, engaged viewers, emphasising the screen with the written lyrics, the tv is important, the screen must have lyrics on it, the people's faces are important, ad, ultra-realistic, photorealistic, ultra-detailed, depth of field, white lightning, 8k.”
Although the outpainting above may look like a small addition, it is by no means a small endeavour. It is also far from perfect.
The second part of the process evolved from an in-class prompt in a Parsons class I’m currently taking. The prompt was about morphing in animation: develop an object that morphs into another object. I chose a television morphing into an airplane, and since my drawing skills aren't the best, I recreated the scene in stop motion first and then used Rough Animator to animate it. I overlaid the drawn portion on the original footage, and then the exploration using AI truly began. I used the resulting sequence in both DALL-E 2 and Runway.
Encouraged by that “success”, I then fed some of the hand-drawn animations I had already created in class into DALL-E 2 and Runway to see the results. I used the video-to-video-with-text-prompt tool in Runway and the image-to-image generator in DALL-E 2 with the text prompt “claymation”.
So let's ask ChatGPT to explain the science behind AI image and video generation:
If you didn’t get that, I asked ChatGPT to explain it to a 12-year-old:
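For readers who want the gist without the chat transcript: DALL-E 2 and many similar generators belong to the family of diffusion models, which are trained by gradually adding noise to images and then learning to reverse that process. The forward (noising) half has a simple closed form, sketched below on a toy 1-D "image" in plain Python. The function name and schedule values are my own illustrative assumptions, not any platform's actual code.

```python
import math
import random

def forward_noise(pixels, t, betas, rng):
    """Toy forward-diffusion step: blend the signal toward Gaussian noise.

    After t steps, DDPM-style models scale the signal by sqrt(alpha_bar)
    and mix in noise scaled by sqrt(1 - alpha_bar). Here the "image" is
    just a list of floats instead of a real pixel grid.
    """
    alpha_bar = 1.0
    for beta in betas[:t]:
        alpha_bar *= 1.0 - beta
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0)
            for p in pixels]

# A tiny linear noise schedule (illustrative values only).
betas = [0.1] * 10
rng = random.Random(0)

image = [1.0, 0.5, -0.5, -1.0]                 # a 4-"pixel" image
slightly_noisy = forward_noise(image, 2, betas, rng)
very_noisy = forward_noise(image, 10, betas, rng)
```

Generation runs this in reverse: a neural network is trained to predict and subtract the noise, step by step, until a clean image emerges from pure static, optionally steered by a text prompt.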
Finally, I used After Effects to weave everything into a cohesive narrative to reflect the ebb and flow of the music. Throughout this process, I found myself navigating the balance between control and unpredictability, much like the AI-generated images themselves.
As the music video took shape, I was struck by how AI image generation tools allowed me to explore new creative possibilities, combining the expressive power of "Dead End Thrills" with the computational power of artificial intelligence. The final product is a visual journey that encapsulates the spirit of exploration, creativity, and the fusion of human and machine elements while reminding us to balance ambition and responsibility in AI development. That might go too far, but it shows the potential of AI. (Damnit, ChatGPT, why’d you write that last sentence? It’s overly emotional writing) But overall, what to do with the “data” is up to the artist.
In the music video, the image of the MQ-1 Predator pops up at the very end, symbolising the fine line between the exciting potential of AI advancements and the need for responsible and ethical development; it was the first drone with AI capabilities. As usual in my videos, I take images from the Smithsonian Open Access collection. The MQ-1 Predator, an unmanned aerial vehicle (UAV) or drone, was first introduced in the mid-1990s and has since played a significant role in military operations, serving both reconnaissance and targeted-strike purposes. Its historical significance underscores the complex relationship between technological advancements and the ethical considerations that come with them.
As we push the boundaries of what AI can achieve in image generation and other fields, it's crucial to remain aware of the possible consequences and to strive for a balance between innovation and responsibility. By concluding the video with the MQ-1 Predator, I aimed to provoke thought and conversation about the broader implications of AI development, encouraging viewers to reflect on the ethical dimensions of these powerful technologies and their impact on our society while also acknowledging the historical context of the technology represented.
This project has not only deepened my interest in the capabilities of AI in the realm of art but also highlighted the importance of embracing the unknown and pushing the boundaries of what is possible, especially, for me, in something as basic as video frame rates: 12 frames per second versus 24, versus whatever choppiness the AI generators gave me.
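The frame-rate mismatch above has a mechanical fix worth spelling out: to retime a 12 fps hand-drawn sequence onto a 24 fps timeline without changing its speed, each drawing is simply held for two frames ("animating on twos"), which is what After Effects does under the hood when you conform footage. A minimal sketch of that duplication logic (the function name is my own):

```python
def retime(frames, src_fps, dst_fps):
    """Repeat each frame so a src_fps sequence plays correctly at dst_fps.

    Assumes dst_fps is an integer multiple of src_fps (e.g. 12 -> 24),
    the common case for hand-drawn animation shot "on twos".
    """
    if dst_fps % src_fps != 0:
        raise ValueError("dst_fps must be a multiple of src_fps")
    hold = dst_fps // src_fps
    return [frame for frame in frames for _ in range(hold)]

# Three drawings become six frames on a 24 fps timeline.
drawings = ["f1", "f2", "f3"]
timeline = retime(drawings, 12, 24)  # ["f1", "f1", "f2", "f2", "f3", "f3"]
```

The AI generators, by contrast, returned sequences at inconsistent effective rates, which is exactly the "choppiness" that, by accident, ended up syncing with the track's beats.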
This "Dead End Thrills" short music video is not a testament to the limitless potential at the intersection of art, technology, and human emotion, but it's a start. Through this experiment, I have discovered that while there are many challenges and obstacles to overcome in the world of AI-assisted art, there is also an infinite source of inspiration and creativity waiting to be tapped into.