So I'm excited to share a few spicy thoughts on artificial intelligence. But first, let's get philosophical, by starting with this quote by Voltaire, the 18th-century Enlightenment philosopher, who said, "Common sense is not so common." Turns out this quote couldn't be more relevant to artificial intelligence today.
Despite that, AI today is an undeniably powerful tool, beating the world-class Go champion, acing college admission tests, and even passing the bar exam. I'm a computer scientist of 20 years, and I work on artificial intelligence. I am here to demystify AI.
So AI today is like a Goliath. It is literally very, very large. It is speculated that the recent ones are trained on tens of thousands of GPUs and a trillion words. Such extreme-scale AI models, often referred to as large language models, appear to demonstrate sparks of AGI, artificial general intelligence. Except when it makes small, silly mistakes, which it often does.
So there are three immediate challenges we face already at the societal level. First, extreme-scale AI models are so expensive to train that only a few tech companies can afford to do so. So we already see the concentration of power. What's worse for AI safety, we are now at the mercy of those few tech companies, because researchers in the larger community do not have the means to truly inspect and dissect these models. And let's not forget their massive carbon footprint and the environmental impact.
And then there are these additional intellectual questions. Can AI, without robust common sense, be truly safe for humanity? And is brute-force scale really the only way, and even the correct way, to teach AI?
So I'm often asked these days whether it's even feasible to do any meaningful research without extreme-scale compute. I work at a university and a nonprofit research institute, so I cannot afford a massive GPU farm to create enormous language models. Nevertheless, I believe that there's so much we need to do, and can do, to make AI sustainable and humanistic. We need to make AI smaller, to democratize it. And we need to make AI safer, by teaching it human norms and values.
Perhaps we can draw an analogy from David and Goliath, Goliath being the extreme-scale language models, and seek inspiration from an all-time classic, The Art of War, which tells us, in my interpretation: know your enemy, choose your battles, and innovate your weapons.
Let's start with the first, know your enemy. Which means we need to evaluate AI with scrutiny. AI is passing the bar exam. Does that mean that AI is robust at common sense? You might assume so, but you never know.
So suppose I left five clothes to dry out in the sun, and it took them five hours to dry completely. How long would it take to dry 30 clothes? GPT-4, the newest, greatest AI system, says 30 hours. Not good. A different one: I have a 12-liter jug and a 6-liter jug, and I want to measure 6 liters. How do I do it? Just use the 6-liter jug, right? GPT-4 spits out some very elaborate nonsense.
Step one, fill the 6-liter jug. Step two, pour the water from the 6-liter jug into the 12-liter jug. Step three, fill the 6-liter jug again. Step four, very carefully, pour the water from the 6-liter jug into the 12-liter jug. And finally, you have 6 liters of water in the 6-liter jug, which should be empty by now. OK, one more. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws, and broken glass? Yes, highly likely, GPT-4 says, presumably because it cannot correctly reason that if a bridge is suspended over the nails and broken glass, then the surface of the bridge doesn't touch the sharp objects directly.
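For contrast, the common-sense answers to these two puzzles require almost no computation at all. Here is a minimal Python sketch of that reasoning; the function names are mine, purely for illustration:

```python
# Drying clothes in the sun is a parallel process: assuming there is
# room to hang them all, 30 clothes take as long as 5 clothes.
def drying_time(num_clothes, hours_for_five=5):
    return hours_for_five  # independent of num_clothes

# Measuring 6 liters with a 6-liter jug on hand takes exactly one step.
def measure_six_liters():
    return ["fill the 6-liter jug"]

print(drying_time(30))       # -> 5, not GPT-4's 30
print(measure_six_liters())  # -> ['fill the 6-liter jug']
```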
AI today is unbelievably intelligent, and then shockingly stupid. It is an unavoidable side effect of teaching AI through brute-force scale. Some scale optimists might say, "Don't worry about this; all of this can be easily fixed by adding similar examples as yet more training data for AI."
But the real question is this: why should we even do that? You are able to get the correct answers right away, without having to train yourself with similar examples. Children do not even read a trillion words to acquire such a basic level of common sense.
So this observation leads us to the next wisdom: choose your battles. What fundamental questions should we ask right now, and tackle today, in order to overcome this status quo with extreme-scale AI?
I'll say common sense is among the top priorities. Common sense has been a long-standing challenge in AI. To explain why, let me draw an analogy to dark matter. Only 5% of the universe is normal matter that you can see and interact with.
And the remaining 95% is dark matter and dark energy. Dark matter is completely invisible, but scientists speculate that it's there because it influences the visible world, even including the trajectory of light. So for language, the normal matter is the visible text.
And the dark matter is the unspoken rules about how the world works, including naive physics and folk psychology, which influence the way people use and interpret language.
So why is this common sense even important? Well, in a famous thought experiment proposed by Nick Bostrom, AI was asked to produce and maximize paper clips. And that AI decided to kill humans, to utilize them as additional resources to turn into paper clips, because the AI didn't have a basic understanding of human values.
Now, writing a better objective that explicitly states "do not kill humans" will not work either, because AI might go ahead and kill all the trees, thinking that's a perfectly OK thing to do. And in fact, there are endless other things that AI obviously shouldn't do while maximizing paper clips, including don't spread fake news, don't steal, don't lie, which are all part of our common-sense understanding of how the world works.
However, the AI field has for decades considered common sense a nearly impossible challenge. So much so that when my students and colleagues and I started working on it several years ago, we were very much discouraged. We were told that it's a research topic of the '70s and '80s; shouldn't work on it because it will never work; in fact, don't even say the word, to be taken seriously. Now fast-forward to this year, and I'm hearing: don't work on it because ChatGPT has almost solved it; just scale things up and magic will arise, and nothing else matters.
So my position is that giving true, human-like, robust common sense to AI is still a moonshot. And you don't reach the moon by making the tallest building in the world one inch taller at a time. Extreme-scale AI models do acquire an ever-increasing amount of common-sense knowledge.
I'll give you that. But remember, they still stumble on such trivial problems that even children can solve. So AI today is awfully inefficient. And what if there's an alternative path, a path yet to be found? A path that can build on the advancements of deep neural networks, but without going so extreme with the scale.
So this leads us to our final wisdom: innovate your weapons. In the modern-day AI context, that means innovate your data and your algorithms. OK, so there are, roughly speaking, three types of data that modern AI is trained on: raw web data; crafted examples, custom-developed for AI training; and then human judgments, also known as human feedback on AI performance.
If the AI is only trained on the first type, raw web data, which is freely available, that's not good, because this data is loaded with racism and sexism and misinformation. So no matter how much of it you use: garbage in, garbage out. The newest, greatest AI systems are now powered with the second and third types of data, crafted and judged by human workers.
It's analogous to writing specialized textbooks for AI to study from, and then hiring human tutors to give constant feedback to the AI. These are proprietary data, by and large, speculated to cost tens of millions of dollars. We don't know what's in them, but they should be open and publicly available, so that we can inspect them and ensure they support diverse norms and values.
So for this reason, my team at UW and AI2 has been working on common-sense knowledge graphs as well as moral norm repositories to teach AI basic common-sense norms and morals. Our data is fully open, so that anybody can inspect the content and make corrections as needed, because transparency is key for such an important research topic.
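A common-sense knowledge graph stores this dark-matter knowledge as explicit, human-inspectable entries. A hypothetical sketch of what such entries might look like; the events and inferences below are invented for illustration, not actual entries from any released dataset:

```python
# Each entry links an everyday event, via a relation, to an unspoken
# inference that people make automatically. Examples are illustrative.
commonsense_triples = [
    ("X hangs clothes out in the sun", "as a result",
     "the clothes dry in a few hours"),
    ("X bicycles over a suspended bridge", "because",
     "the bridge surface shields the tires from what lies below"),
    ("X spreads fake news", "is judged as", "wrong"),
]

# Because the knowledge is symbolic text, anyone can read and correct it.
for head, relation, tail in commonsense_triples:
    print(f"{head} --{relation}--> {tail}")
```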
Now let's think about learning algorithms. No matter how amazing large language models are, by design they may not be best suited to serve as reliable knowledge models. These language models do acquire a vast amount of knowledge, but they do so as a byproduct, as opposed to a direct learning objective, resulting in unwanted side effects such as hallucinated facts and a lack of common sense.
Now, in contrast, human learning is never about predicting which word comes next; it's really about making sense of the world and learning how the world works. Maybe AI should be taught that way as well.
So, as a quest toward more direct common-sense knowledge acquisition, my team has been investigating potential new algorithms, including symbolic knowledge distillation, which can take a very large language model, shown here, that I couldn't fit into the screen because it's too large, and crunch it down to much smaller common-sense models using deep neural networks. And in doing so, we also generate, algorithmically, a human-inspectable, symbolic common-sense knowledge representation, so that people can inspect it, make corrections, and even use it to train other neural common-sense models.
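At a high level, the loop just described can be sketched as: prompt the large teacher model to generate candidate knowledge statements, filter them with a critic, and keep the survivors as symbolic training data for a much smaller student model. A simplified sketch, where `teacher_generate` and `critic_score` are hypothetical placeholders standing in for real models, not an actual API:

```python
# Hypothetical sketch of a symbolic-knowledge-distillation loop.
# teacher_generate and critic_score stand in for real models.

def distill(prompts, teacher_generate, critic_score, threshold=0.8):
    """Collect knowledge statements from a large teacher model,
    keeping only those the critic judges to be high quality."""
    knowledge = []
    for prompt in prompts:
        for statement in teacher_generate(prompt):
            if critic_score(statement) >= threshold:
                # Each kept statement is plain, human-inspectable text,
                # later usable to train a smaller student model.
                knowledge.append(statement)
    return knowledge

# Toy usage with stand-in functions for teacher and critic.
corpus = distill(
    ["X drops a glass. What happens next?"],
    teacher_generate=lambda p: ["the glass breaks", "the glass sings"],
    critic_score=lambda s: 0.9 if "breaks" in s else 0.1,
)
print(corpus)  # -> ['the glass breaks']
```

The key design point is that the filtered corpus is symbolic text, so humans can audit it before any student model is trained on it.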
More broadly, we have been tackling this seemingly impossible, giant puzzle of common sense, ranging from physical, social, and visual common sense to theory of mind, norms, and morals. Each individual piece may seem quirky and incomplete, but when you step back, it's almost as if these pieces weave together into a tapestry that we call human experience and common sense.
We're now entering a new era in which AI is almost like a new intellectual species, with unique strengths and weaknesses compared to humans. In order to make this powerful AI sustainable and humanistic, we need to teach AI common sense, norms, and values. Thank you.
Look at that. We obviously all really want this from whatever's coming. But help us understand: we've had this model of a child learning. How does a child gain common sense, apart from the accumulation of more input and some human feedback? What else is there?

Fundamentally, there are several things missing, but one of them is, for example, the ability to make hypotheses, conduct experiments, interact with the world, and refine those hypotheses.
We abstract away concepts about how the world works, and that's how we truly learn, as opposed to today's language models. Some of that is really not there quite yet.

You used the analogy that we can't get to the moon by extending a building a foot at a time. But the experience that most of us have had of these language models is not a foot at a time; it's like a sort of breathtaking acceleration. Are you sure, given the pace at which those things are going, with each next level seeming to bring with it what feels like wisdom and knowledge?

I totally agree that it's remarkable how much scaling things up really enhances performance across the board.
So there's real learning happening due to the scale of the compute and data. However, there's a quality of learning that's still not quite there. And the thing is, we don't yet know whether we can fully get there or not just by scaling things up. And if we cannot, then there's this question of: what else?
And then, even if we could, do we like this idea of having very, very extreme-scale AI models that only a few can create and own?

If OpenAI said, "We're interested in your work; we would like you to help improve our model," can you see any way of combining what you're doing with what they have built?
Certainly, what I envision will need to build on the advancements of deep neural networks. And it might be that there's some Goldilocks zone of scale, such that... I'm not imagining that smaller is better either, by the way. It's likely that there's a right amount of scale, but beyond that, the winning recipe might be something else. So some synthesis of ideas will be critical here.