TED - Will AI Make Us the Last Generation to Read and Write? | Victor Riparbelli | TED

Published: 2025-02-26 12:00:12

This talk argues that while text is currently ubiquitous, humanity is on the cusp of a new era of AI-enabled communication that will ultimately replace text with more intuitive forms of media like audio, video, and immersive technologies. The speaker, driven by a lifelong passion for media and technology, believes that future generations will view reading and writing as historical artifacts, much like papyrus scrolls or hieroglyphs.

The speaker’s own experiences shaped this perspective. He recounts his love for reading and the transformative impact of the internet, having witnessed firsthand how technology changed not just content distribution, but the content itself. He observed how forums differed from books, and how software instruments in music led to entirely new genres and democratized music production. His early online experiences, including running a business within World of Warcraft at age 13, solidified his interest in the evolving landscape of media and communication.

A pivotal moment came when he discovered a research paper demonstrating that neural networks could generate photorealistic video. This sparked the idea that creating Hollywood-quality films from one's bedroom, with only imagination as the limit, would become a reality within a decade. The idea led him to co-found an AI video company, with the initial vision of empowering everyone to become a Hollywood director. While this vision remains exciting, the speaker argues that the broader potential lies in the ability of AI to bring every form of content, from text messages to corporate training materials, to life through video and audio.

The speaker delves into the history of text, recognizing its importance as a method for compressing human communication and conveying meaning across time and space. He notes that the invention of the alphabet and the printing press were significant milestones, eventually leading to widespread literacy. However, he argues that text, while efficient and scalable, is inherently "lossy" as a means of information transfer. It lacks the nuances of tone, body language, and context that are present in face-to-face communication, which can lead to misinterpretation. Even the addition of emojis fails to fully compensate for this lack of nuanced expression.

Visual communication, on the other hand, is presented as a more intuitive and immediate way of consuming information. Describing an image with text takes longer, requires more cognitive effort, and ultimately results in a different mental image than the original visual. Adding the time dimension, as in video, further amplifies these differences. The speaker contends that human innovation has consistently moved towards richer and more intuitive ways of exchanging information, citing radio, TV, the internet, VR, social media, and now AI as evidence. He points to the explosive growth of video-centric platforms like TikTok as proof that people inherently prefer watching and listening. The prevalence of video and audio in various apps reinforces this trend.

His core thesis is that the more we consume video, the more bored we become with text. He personally gravitates towards YouTube, TikTok, and podcasts for learning and entertainment, only resorting to reading books when deeply invested in a subject. He acknowledges a sense of guilt associated with preferring video and audio over traditional reading, and addresses common criticisms about younger generations' reduced attention spans. He offers an alternative perspective, suggesting that people may simply be tired of overly dense and slow information, and are becoming more discerning consumers of quality, concise content because of the abundance of choices available. He questions whether the problem lies with people or with the limitations of text itself, acknowledging that reading remains prevalent, though across a wider array of sources. He also reflects on the psychological value society places on the written word, even in an increasingly visual world.

The speaker attributes the continued reliance on text to the cost, in both time and money, of producing high-quality video content. He believes AI will revolutionize this by enabling speed, scale, accuracy, and engagement in content creation. AI can generate highly photorealistic digital content, opening the doors to a new wave of creativity driven not by Hollywood, but by YouTubers and other individuals with great ideas. The speaker highlights his company's work on AI avatars, digital humans that can communicate in multiple languages and are becoming increasingly indistinguishable from reality. These technologies will eliminate the need for cameras and remove traditional barriers to content creation. With AI, everyone will be able to be a director, producing Hollywood-grade videos without training.

AI-generated content is currently in a "bridge genre" phase, mimicking traditional media formats. However, combining AI video with reasoning systems like language models will unlock entirely new types of interactive and personalized media. Education will be hyper-personalized, entertainment will be shaped by viewers and the world around them, and new formats like interactive films and never-ending TV series will emerge. Social media feeds, with their endless streams of personalized content, offer a glimpse into this future. Integrating these technologies with AR, VR, and brain-computer interfaces will blur the lines between media and reality.

He concludes by acknowledging the ethical, political, design, and commercial questions raised by these advancements, posing provocative questions about the value of AI-generated content, the nature of trust, and the future of human-computer interaction. He urges the audience to consider what kind of future they want to build, expressing confidence in the technology and in people's ability to create an awesome future.