The Future of Software Development

Published: 2017-07-15 17:59:41

Summary

Against the backdrop of compute moving increasingly into all kinds of things we never imagined could be considered "computers" -- your phone, your thermostat -- Chris Granger, Co-Founder of Eve, describes how software development is evolving to keep pace with the change. Granger's talk was part of the 2014 a16z Academic Roundtable.

Transcript

All right. So I'm going to talk a little bit more about the software side of things, right? And it's kind of interesting the way this played out, in that I think my talk is actually going to bring a lot of the sort of discussion we've had so far together into something interesting. And so we talk about these major trends in computing, things like big data and deep learning and mobile and so on and so forth. But sort of under the covers, there's a set of things that are starting to change that we don't seem to talk about.

One of these, and it's sort of been brought up earlier, is that over the past 15 or so years, the 1970s computer disappeared. We have a new notion of the machine. The machine is this. We keep seeing this picture over and over again. It's a gigantic computer with thousands of cores, with terabytes of memory, with petabytes of disk. But it is also this, right? And even now this, our thermostats are now part of this giant machine that we're working with.

And to an ever-lessening degree, it is our laptops. And in a world where that's the case, all of a sudden the game starts to change, right? Performance starts to matter again, right? We used to rely on Moore's law. It'll be fine. We'll just wait a couple, you know, six months, and it'll be fast enough and we'll do well. But the truth is, we're trying to squeeze, you know, every ounce of power out of these little guys in our pockets, right?

And on top of that, we care about battery life. So we just need general efficiency in computation. And then on top of performance, now everything is highly, highly networked. And that means our systems are now full of latency, full of disorder. And we have to deal with failure at any one point in time. And the truth is, our models for computation never really included those eventualities, right?

We sort of assumed that, hey, that 1970s computer with just a single CPU would be fine for us. And when I start talking about this stuff with other people, they're like, ah, performance. I know what I do. I'm going to pull out C, and I'm going to write some code. And I know that's fast, because it's always been fast and it will be fine.

But honestly, in industry, we are struggling to write even simple programs against this new kind of machine just because our models don't fit it, right? And so that's made something very important start to happen. And we sort of touched on this before, which is that industry and research are starting to come back together again, right? You go back to the 70s, there kind of was no distinction between the two. They were basically the same.

But over the years, big corporations started sort of pushing the agenda, right? And then as a result of some failures in the 80s, like fifth generation computing, we kind of lost faith in sort of computer research. And so what's interesting about that is we kind of did this like depth first traversal of computation, right? We sort of ran with whatever the industry had at the time.

And so the result, of course, is that every mainstream language kind of is just some form of object oriented programming. And every database eventually starts to look like SQL. And we deal with concurrency through this horrid mess of locks and sadness. And we kind of got to the bottom of this tree. And we're looking around and starting to realize, none of these are real great options.

And I would argue the reason that this happened is that pure engineering makes this very pragmatic trade off, which is that you tame complexity by adding a layer of abstraction. And this is like stacking teacups on top of teacups, right? You can do this for a little while. But eventually, those teacups, that stack, starts to lean. And it threatens to fall over. And I would argue that it's thanks to big data that we hit this big wall.

We're like, great, we have Hadoop, but it's not getting us as far as we want. We have to sort of rethink just doing this pure engineering stuff. Maybe we need to go back to first principles, right? And of course, this is where industry and research start to come back together again. And so the most interesting advances happening right now are coming from people like Matei Zaharia, right, out of Berkeley, who built Spark.

We're looking back towards research to find a path forward. And the reason I think that that is so important is because what we're really trying to get back to is simplicity, right? Instead of adding more teacups on top of this thing, wipe all the teacups away and find the foundational principles that we need in order to embrace this new machine, right? Gain power through simplicity, not more complexity.

And it is the folks in this room, right, who are doing exactly this. Who are looking for sort of the foundation upon which we could actually start to build this new way of thinking about software and computation in general. And so I thought it'd be interesting to go through a couple of examples and distributed computing is like all the rage these days, so I'll start there, right? And I think one of the more interesting things happening there is what are called commutative replicated data types. So if you have a big distributed system and you have lots of information constantly changing, you want every node in that system to have the same information or something like it, you have this problem with coordination, right? You need to coordinate all of these changes so that everyone has them. And sort of the traditional way of thinking about this is, oh, great, we've done transactions as a coordination mechanism. We'll do distributed transactions. And so we created a bunch of really complicated mechanisms, a bunch of really complicated coordination protocols like Paxos to try and handle this.

But there's, of course, a couple of problems with that. One, like I said, they're really complicated. They're very hard to get right. But two, and even more importantly, they're really hard to reason about, right? And the reason why CRDTs are really interesting is not the exact mechanism, which isn't really important here. What they manage to do is just sidestep the problem entirely. They say, no, no, no. Here's a data type that can update regardless of order. OK, that's infinitely simpler than what we were doing before because it just sidesteps the issue of coordination entirely. And I like this example because this is a great example of the pure academic approach towards this problem, which is find out the mathematical principles that prove that this is a wonderfully simple idea and find out how you put that into practice, right?
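To make "a data type that can update regardless of order" concrete, here is a minimal sketch of a grow-only counter, one of the simplest CRDTs. It is not from the talk: the `GCounter` class and its method names are illustrative assumptions, and this is the state-based flavor, where replicas exchange their whole state and merge it.

```python
class GCounter:
    """Grow-only counter: one slot per node, merged by element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node id -> total increments seen from that node

    def increment(self, amount=1):
        # A node only ever bumps its own slot, so no coordination is needed.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge however merges are ordered or repeated.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # a duplicated delivery changes nothing
```

Both replicas end up with the same value without any locking or consensus round, which is the sense in which the coordination problem is sidestepped rather than solved.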

Another great example of this is also wonderfully simple. We're starting to see a lot more systems built on what are basically append-only event logs, immutable event logs. Because we, again, we have this problem in a distributed system. We learned very quickly that shared, mutable state is a disaster in that world. And then we have to do something else. And there's nothing simpler than just keeping a log of every single thing that ever came into the system, right? And then there's sort of interesting engineering challenges of then taking that log and turning that into the state of the system at any one point in time. And we're seeing really neat stuff, like great, great research coming out of, for example, Saarland University doing a project called OctopusDB, where they reimagine the database based on this thing. And other projects like Apache Kafka and Apache Samza are all based purely on, let's make really efficient distributed system software based on just keeping a list of everything that's ever happened.
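As a rough illustration of turning a log into "the state of the system at any one point in time," the sketch below folds an append-only list of events into a state. The event shape `(kind, account, amount)` and the function names are assumptions made up for this example; real systems such as Kafka add partitioning, persistence, and snapshots on top of the same idea.

```python
from functools import reduce


def apply_event(state, event):
    """Return a new state; neither the log nor earlier states are mutated."""
    kind, account, amount = event
    balance = state.get(account, 0)
    if kind == "deposit":
        balance += amount
    elif kind == "withdraw":
        balance -= amount
    return {**state, account: balance}


def state_at(log, n):
    """Rebuild the state as of the first n events by folding over the log."""
    return reduce(apply_event, log[:n], {})


# The log is only ever appended to, never edited in place.
log = []
log.append(("deposit", "alice", 100))
log.append(("deposit", "bob", 50))
log.append(("withdraw", "alice", 30))

current = state_at(log, len(log))
earlier = state_at(log, 1)
```

Because the state is derived rather than stored, any past state can be reconstructed by replaying a prefix of the log.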

Now, I started with distributed systems, but this applies to far more than just distributed systems, right? In a completely different vein, you have folks like Facebook building these incredibly complicated user interfaces, you know, meant to be used by lots of people. And they're sort of running against this wall where they can't make them fast enough, they can't make them work on our phones. And on top of that, they just can't reason about them anymore, right? There's so many things on the screen that they're having a really hard time. And so they went back to a really old idea, what's called immediate mode UI. So traditionally, when you build some user interface, you have a button, right? And that button has a bunch of state underneath it. It has a background and it has a hover state and blah, blah, blah. And you mutate that state in order to sort of change the UI.

Immediate mode is a much simpler idea. You just redraw the UI every single frame, right? And it turns your sort of interface code into a pure function of the state of the application. And all of a sudden, when you're in a world like that, it's stupidly easy to understand exactly what you're going to get. As long as you know what's on the left side, you know what you're going to get out on the right side. It's much easier to reason about and, interestingly enough, it's a lot easier to make fast.
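A small sketch of "interface code as a pure function of the state," assuming a text-only to-do list. The `render` function and the state layout are hypothetical, chosen only to show the shape of the idea; they do not reflect how any particular UI library is implemented.

```python
def render(state):
    """Pure function: application state in, complete UI description out."""
    remaining = sum(1 for todo in state["todos"] if not todo["done"])
    lines = [f"Todo ({remaining} left)"]
    for todo in state["todos"]:
        mark = "x" if todo["done"] else " "
        lines.append(f"[{mark}] {todo['title']}")
    return "\n".join(lines)


state = {"todos": [{"title": "write talk", "done": True},
                   {"title": "give talk", "done": False}]}
frame1 = render(state)

# To change the UI, build a new state and redraw everything from scratch;
# no widget holds hidden state that has to be mutated in place.
new_state = {"todos": [dict(todo, done=True) for todo in state["todos"]]}
frame2 = render(new_state)
```

The same state always produces the same frame, which is what makes the output easy to predict and easy to cache or diff.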

And so we have immediate mode UI abstractions over the DOM in HTML. Another fun example, right? Constraint programming is sort of making a comeback from the old days. SAT solvers and SMT solvers are getting ridiculously fast, like unbelievably fast. And again, what could possibly be simpler than just writing out a set of constraints and having a system solve it for you, right? And it's neat because this is getting applied in some interesting ways.

So for example, Apple's Auto Layout in iOS is based on a linear inequality constraint solver called Cassowary, right? We're starting to see people realize, hey, we learned pretty early on that, you know, query optimizers as an example can write much better queries than human beings can. Why not apply that to other kinds of searches? And so we're seeing constraint programming sort of make a comeback.
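To show what "writing out a set of constraints and having a system solve it" looks like, here is a deliberately naive sketch. It is a brute-force search over small integer domains, not Cassowary and not a SAT or SMT solver; the `solve` function and the layout variables are assumptions made up for this example.

```python
from itertools import product


def solve(domains, constraints):
    """Return the first assignment that satisfies every constraint, or None."""
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        candidate = dict(zip(names, values))
        if all(check(candidate) for check in constraints):
            return candidate
    return None


# Two buttons side by side in a 100-unit-wide row, widths in steps of 10.
domains = {"a_width": range(0, 101, 10), "b_width": range(0, 101, 10)}
constraints = [
    lambda v: v["a_width"] + v["b_width"] + 10 == 100,  # 10-unit gap, fill the row
    lambda v: v["a_width"] == 2 * v["b_width"],         # a is twice as wide as b
    lambda v: v["b_width"] >= 20,                       # minimum width (an inequality)
]
layout = solve(domains, constraints)
```

The caller only states what must hold. How the answer is found is the solver's business, which is why a faster solver can be swapped in without touching the constraints.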

And the last one, and I'm particularly interested in this one, is that there's a group of people, you know, quietly saying, let's get rid of SQL and let's bring back the relational database part. And they're starting to look at this because they're wondering whether or not Stonebraker is really right, right? Whether or not you can create a general purpose database that is as fast as the specific ones or could at least, you know, compete with them in some interesting way. And we're seeing this in sort of the re-emergence of Datalog, a language that has been dead for 40 years and that, you know, came about at the same time Prolog did, right? But it's based on this idea of, you know, having a relational database without the SQL. And we're starting to see this, you know, get into industry.
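For a feel of what Datalog-style querying means, the sketch below evaluates the classic two-rule "ancestor" program bottom-up over a set of tuples. It is an illustration only: the function name and the sample facts are assumptions, and real Datalog engines use far more efficient evaluation strategies than this naive loop.

```python
def ancestors(parent):
    """Naive bottom-up evaluation of the Datalog program:

        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    ancestor = set(parent)  # first rule: every parent fact is an ancestor fact
    while True:
        # Second rule: join parent with the ancestor facts derived so far.
        derived = {(x, z)
                   for (x, y) in parent
                   for (y2, z) in ancestor
                   if y == y2}
        if derived <= ancestor:  # fixed point: nothing new can be derived
            return ancestor
        ancestor |= derived


parent = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}
result = ancestors(parent)
```

The rules are plain relations and joins with recursion, with no SQL syntax involved, which is the "relational database without the SQL" idea in miniature.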

People like Rich Hickey built an entire database called Datomic built on Datalog, right? And I could keep going on and on. There's, you know, tons more examples here. But the interesting thing is all of these things sort of relate to those trends I was talking about, sort of these currents that I think are going to be really important as we move forward. The first, like I said, is that industry and research are coming back together again, right? And maybe that means that we can go back and look at all of those branches of the depth first search that we skipped, right? The branches of that tree that we just neglected.

And that's being driven by the fact that we have this new machine and we don't know what to do with it yet. You know, we can't program like we did in the 1970s, you know, C is not going to save us. You can't just stick the actor model on top of it and hope it's going to work. We need fundamentally different ways of thinking about computation. And the way we're going to find those is by making a trip back to simplicity, right? Trying to find sort of the fundamental truths that we can use to build the system this way.

And there's one more that I haven't really talked about yet, but sort of stems from all of these, which is that if we actually do manage to come up with a simpler version of programming, it offers us the opportunity to maybe finally have a shot at democratizing computation, right? You know, Peter was talking about, hopefully at some point everyone has access to these really powerful machines and machine learning and predictive capabilities. Let's take a first step and get everyone just the ability to do really simple computation, which is not possible right now because, hey, programming is hard, right? But if we do manage to simplify the system and we actually get it to the point where it works the way we need it to, maybe we do have a shot at that.

And the implications for the people in this room are pretty tremendous because that means the computer turns into a tool again. And then we start thinking about things a little bit differently. And so there you go. So these are the things that I believe are sort of shaping the way things are moving forward. And like I said, you know, some of these efforts are pretty quiet, but things like CRDTs are actually starting to make their way into industry, right? So the latest implementation of Riak, Riak 2.0, actually has a set of CRDT data types inside of it.

And we're going to see these start to eat more and more and more into the system as we try and find ways to work against that new machine. And it's those trends that are ultimately going to, you know, reshape the way we think about computation as a whole, and then how we actually wield it to solve our problems. And that's all I got.