This presentation argues that while text is currently ubiquitous, humanity is on the cusp of a new era of AI-enabled communication that will ultimately replace text with more intuitive forms of media like audio, video, and immersive technologies. The speaker, driven by a lifelong passion for media and technology, believes that future generations will view reading and writing as historical artifacts, much like papyrus scrolls or hieroglyphs.
The speaker’s own experiences shaped this perspective. He recounts his love for reading and the transformative impact of the internet, witnessing firsthand how technology changed not just content distribution, but the content itself. He observed how forums differed from books, and how software instruments in music led to entirely new genres and democratized music production. His early online experiences, including running a business within World of Warcraft at age 13, solidified his interest in the evolving landscape of media and communication.
A pivotal moment came with the discovery of a research paper demonstrating the capability of neural networks to generate photorealistic video. This sparked the idea that creating Hollywood-quality films from one's bedroom, with only imagination as the limit, would become a reality within a decade. This idea led to the co-founding of an AI video company, with the initial vision of empowering everyone to become a Hollywood director. While this vision remains exciting, the speaker argues that the broader potential lies in the ability of AI to bring every form of content, from text messages to corporate training materials, to life through video and audio.
The speaker delves into the history of text, recognizing its importance as a method for compressing human communication and conveying meaning across time and space. He notes that the invention of the alphabet and the printing press were significant milestones, eventually leading to widespread literacy. However, he argues that text, while efficient and scalable, is inherently "lossy" as a means of information transfer. It lacks the nuances of tone, body language, and context that are present in face-to-face communication, leading to potential misinterpretations. Even the addition of emojis fails to fully compensate for this lack of nuanced expression.
Visual communication, on the other hand, is presented as a more intuitive and immediate way of consuming information. Describing an image with text takes longer, requires more cognitive effort, and ultimately results in a different mental image than the original visual. Adding the time dimension, as in video, further amplifies these differences.
The speaker contends that human innovation has consistently moved towards richer and more intuitive ways of exchanging information, citing radio, TV, the internet, VR, social media, and now AI as evidence. He points to the explosive growth of video-centric platforms like TikTok as proof that people inherently prefer watching and listening. The prevalence of video and audio in various apps reinforces this trend. His core thesis is that the more we consume video, the more bored we become with text. He personally gravitates towards YouTube, TikTok, and podcasts for learning and entertainment, only resorting to reading books when deeply invested in a subject.
He acknowledges a sense of guilt associated with preferring video and audio over traditional reading, and addresses common criticisms about younger generations' reduced attention spans. He offers an alternative perspective, suggesting that people may simply be tired of overly dense and slow information, and are becoming more discerning consumers of quality and concise content due to the abundance of choices available. He questions whether the problem lies with people or with the limitations of text itself, acknowledging that reading remains prevalent, though spread across a wider array of sources. He also reflects on the psychological value society places on the written word, even in an increasingly visual world.
The speaker attributes the continued reliance on text to the cost, both in time and money, of producing high-quality video content. He believes AI will revolutionize this by enabling speed, scale, accuracy, and engagement in content creation. AI can generate highly photorealistic digital content, opening the doors to a new wave of creativity driven not by Hollywood, but by YouTubers and other individuals with great ideas.
The speaker highlights his company's work on AI avatars, digital humans that can communicate in multiple languages and are becoming increasingly indistinguishable from reality. These technologies will eliminate the need for cameras and remove traditional barriers to content creation. With AI, everyone will be able to be a director, producing Hollywood-grade videos without training.
AI-generated content is currently in a "bridge genre" phase, mimicking traditional media formats. However, combining AI video with reasoning systems like language models will unlock entirely new types of interactive and personalized media. Education will be hyper-personalized, entertainment will be shaped by viewers and the world around them, and new formats like interactive films and never-ending TV series will emerge. Social media feeds, with their endless streams of personalized content, offer a glimpse into this future. Integrating these technologies with AR, VR, and brain-computer interfaces will blur the lines between media and reality.
He concludes by acknowledging the ethical, political, design, and commercial questions raised by these advancements, posing provocative ones about the value of AI-generated content, the nature of trust, and the future of human-computer interaction. He urges the audience to consider what kind of future they want to build, expressing confidence that the technology and the people can create an awesome future.