
NEW Hugging Face Agents — First Look - YouTube

Published: 2023-05-12 16:00:00
Hugging Face have just announced something that I think is probably going to be a very major thing in the future of large language models and NLP: their spin on agents for large language models, and for transformers in general.

There are quite a few reasons why I think Hugging Face are in a very good position to offer possibly one of the best agent and tool frameworks out there, and I'm going to discuss them. First, though, some of you may not have heard of these things before, or may not be sure what they are.

Let me quickly explain what an agent is and what a tool is. We know what large language models are: they are big transformer models that can basically answer questions in natural language for us, based on some natural language input.

An agent takes this a little bit further and expands these LLMs out, basically allowing them to have multiple steps of reasoning and thought, so they can think to themselves. This is ideal for when we want to integrate what are called tools.

What we can do is tell an LLM: hey, we want you to answer a question, and if you can't answer it by yourself, you can refer to some other tools that we have given you. You might say something like: if you don't know about a particular topic, you can perform a Google search to find out about it.

You would also explain how it can do that. Because the LLM has this multi-step thought process, it can say: okay, I've got this question, I need to use the Google search tool. Then it will say: how do I use that Google search tool? It's going to provide some input to Google search. We would then go and do the Google search for it, return some answers, and pass them back into the LLM. Now all of a sudden it can answer the question. You can do this for a ton of tools: SQL databases, knowledge bases, vector databases, Python interpreters, and so on. Basically, anything you can program, you can create a tool out of.
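The loop described above can be sketched in a few lines of Python. This is only a toy illustration, not any library's API: `fake_llm`, `fake_search`, and `run_agent` are hypothetical names, the "LLM" is a hard-coded function, and the "search" returns canned text.

```python
from typing import Callable, Dict, Optional

# Hypothetical tool: a stand-in for a real web search call.
def fake_search(query: str) -> str:
    canned = {"capital of france": "Paris is the capital of France."}
    return canned.get(query.lower(), "No results found.")

# Registry of tools the agent is allowed to call.
TOOLS: Dict[str, Callable[[str], str]] = {"search": fake_search}

# Hypothetical stand-in for the LLM: with no observation yet it asks for
# the search tool; once it has an observation it produces a final answer.
def fake_llm(question: str, observation: Optional[str]) -> dict:
    if observation is None:
        return {"action": "search", "input": question}
    return {"action": "final", "input": f"Based on the search: {observation}"}

def run_agent(question: str, max_steps: int = 3) -> str:
    observation = None
    for _ in range(max_steps):
        decision = fake_llm(question, observation)
        if decision["action"] == "final":
            return decision["input"]
        # Run the requested tool and feed its output back to the "LLM".
        observation = TOOLS[decision["action"]](decision["input"])
    return "Gave up after too many steps."

print(run_agent("capital of France"))
```

In a real agent the two hard-coded branches in `fake_llm` would be replaced by a model call, and the tool registry would hold real functions.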

Obviously this is a very powerful thing to be able to do. At the moment, by far the biggest library for using agents is LangChain. There's also Haystack, who have introduced agents recently. And there are ChatGPT plugins, which are a form of agents as well, or at least a form of tools added to an agent, the agent being ChatGPT itself.

Now Hugging Face have also jumped on the bandwagon, and I think their implementation is actually very interesting and particularly powerful for a lot of different reasons. A big component of that is what Hugging Face actually is. Hugging Face is essentially a huge community and a hub for all of these different transformer models, diffusion models for generating images, datasets, and a ton of other things. Just about anything you can think of in machine learning, Hugging Face cover a lot of it.

Their version of agents and tools is very interesting, and I haven't been all the way through it yet. This is kind of my first look. But the agent itself is very simple to use. It can also be used as a conversational agent, so as a chatbot where you have multiple steps in the process. And it gives us access to all of these models on Hugging Face, which I think is one of the coolest things about it.

But there are other things, and let me actually jump into it and show you those rather than just talking through them. We'll have a quick look at their example here. They don't really show you much, and I'm going to go through the code in a moment. You run agent.run with "Caption the following image", and they pass in the image through the image argument here. The output is "a beaver is swimming in the water".

Then they also have this: agent.run with "Read the following text out loud", and it will use a text-to-speech model to actually do that. It can also do this: here we're reading a document, or rather an image of a document. We say "In the following document," and then ask the question, and the output is "ballroom foyer", which is down here at the bottom right.

So the point here is that these agents are using a ton of different models from the Transformers library. I think this takes pretty clear inspiration from a paper called HuggingGPT, which essentially used ChatGPT with a ton of Hugging Face models to do a load of cool things. It uses the same sort of approach, where you have an agent, which was the ChatGPT model.

If we come down to the first image here, this basically shows us how it works. We have a large language model, ChatGPT in this case, or GPT-3.5 Turbo I assume, and this is like your controller. Then we have all these, what we would call specialist models that can do particular things that ChatGPT is not able to do, like understand what is within an image, or caption an image. ChatGPT, or your large language model, is able to figure out, given a question, which models it needs to use and what input it needs to pass them. It then uses the output from those models to inform the next step of figuring out what it needs to do, and finally provides a very cool answer that a normal large language model would not be able to produce by itself.
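The controller-plus-specialists idea can be sketched as follows. Everything here is a made-up stand-in: `controller_plan` plays the role of the large language model with simple keyword matching, and the two "specialist models" are plain string functions, so this shows only the shape of the pipeline, not how HuggingGPT or Hugging Face actually implement it.

```python
from typing import Callable, Dict, List

# Hypothetical specialist "models"; real ones would be neural networks.
def caption_image(image: str) -> str:
    return f"a photo of {image}"

def to_speech(text: str) -> str:
    return f"<audio:{text}>"

SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "captioner": caption_image,
    "speaker": to_speech,
}

# Stand-in for the controller LLM: picks which specialists to run, in order.
def controller_plan(request: str) -> List[str]:
    plan = []
    if "describe" in request or "caption" in request:
        plan.append("captioner")
    if "read" in request or "aloud" in request:
        plan.append("speaker")
    return plan

def handle(request: str, data: str) -> str:
    # The output of each specialist becomes the input of the next one.
    for name in controller_plan(request):
        data = SPECIALISTS[name](data)
    return data

print(handle("describe this image and read it aloud", "a boat"))
```

The design point is that the controller never does the specialist work itself; it only decides which model to call and what to pass along.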

Now let's take a look at a clear example of how we can use this. Initially I'm just using one of the examples from Hugging Face; we'll go through a few cells and then do something a little more interesting. The first thing we need to do here is install a few things: transformers, which is the library that contains the agents; Hugging Face diffusers, because we're going to be using image generation models here, so diffusion models; and accelerate, which I believe allows us to run things faster.

I'm not sure if this is strictly necessary, but I think we should be running on a GPU here, so I've gone into my runtime settings and changed my hardware accelerator to GPU. Then we're just going to use OpenAI for the large language model. Hugging Face makes it very easy to use other open source options, so you can do that. I'm just using this because I know it's going to work and it's quick.

I'm going to use text-davinci-003. What I've found, generally speaking, is that text-davinci-003 is usually better at following instructions for tools within agents than GPT-3.5 Turbo, and that's also the model that Hugging Face use in their examples. I'm not 100% sure if they support GPT-3.5 Turbo; I'll need to try at some point.

So all we do is: from transformers.tools import OpenAiAgent. There are other agents as well; I think the Hugging Face agent is called HfAgent, maybe, but we're just going to use OpenAI here. You will need to add in your OpenAI API key, which you can get from platform.openai.com.

Okay, so we're going to run this. This initializes our agent, and actually that's all we need to do. It works, which is very easy, and it's a quick cell.

I think what we're doing here is downloading the tool configuration, and this is a very interesting component of Hugging Face's tools implementation: we can download community-contributed tools as well as Hugging Face's own tools. So very soon, probably in the next few weeks, we're likely to see some pretty insane tools appear from the community, which will be fascinating to see. That's one of the big reasons why I think this is going to be pretty major.

Another reason I think this is going to be pretty major is that we can do multimodal agents super easily. I haven't done anything special here. I just initialized my agent and said: okay, I want to generate an image of a boat in the water. Hugging Face have a big diffusers library, which contains loads of text-to-image diffusion models, and they obviously have all the transformer models as well.

They can integrate at least a few of those into the default agent. So if I just say "generate an image of a boat in the water", what you see here isn't how long the processing takes. This is actually downloading the model, so the image generation model. This will only happen once, and I'll prove that by running it again in a moment. Once it has downloaded, here we go: we've got an explanation from the agent, "I'm going to use the following tool: image_generator to generate an image according to the prompt". It generates some code here, and here is our image of a boat in the water.
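The "download once, reuse afterwards" behaviour can be illustrated with a small caching sketch. This is an assumption about the general pattern, not the library's actual loading code: `load_tool` and `download_log` are hypothetical, and the "download" is simulated by appending to a list.

```python
from functools import lru_cache

# Records every simulated download so we can see how often it happens.
download_log = []

@lru_cache(maxsize=None)
def load_tool(name: str):
    # Simulates the slow, one-off model download.
    download_log.append(name)
    # Return a toy "tool" that just formats its input.
    return lambda prompt: f"[{name}] output for: {prompt}"

first = load_tool("image_generator")("a boat in the water")
second = load_tool("image_generator")("a boat in the water")

# The tool ran twice, but the download was only logged once.
print(first, second, download_log)
```

The second call skips the download entirely, which is why the second run in the notebook is so much faster than the first.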

So that was super easy to do, and let me just run it again. You'll see that it doesn't take quite as long this time. It's generating the image, and there we go: that was eight seconds, which, considering it also takes around three seconds to generate the image, is pretty good for an agent. So that is really cool.

What we can also do is this: I have a boat image here, and we can come down and do agent.run again. Now I can pass in this variable. We use backticks here in the prompt to refer to a variable, and then we pass the actual variable in as an argument here. We could also just call it image, which just means we'd need to replace the name here with image as well. Okay, so let's run that.
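One way to picture how a keyword argument ends up available to the agent's generated code is the sketch below. It is a simplified assumption, not the library's implementation: the `run` helper, the `boat_image` name, and the hard-coded "generated code" string are all hypothetical, and the captioner is a toy function.

```python
from typing import Any, Dict

# Hypothetical tool registry; a real captioner would be a vision model.
TOOLS: Dict[str, Any] = {
    "image_captioner": lambda image: f"caption of {image}",
}

def run(generated_code: str, **variables: Any) -> Any:
    # Tools and user-supplied variables share one namespace, so code
    # written by the model can refer to the variable by its name.
    namespace: Dict[str, Any] = dict(TOOLS)
    namespace.update(variables)
    exec(generated_code, namespace)
    return namespace.get("result")

# Pretend the LLM produced this line for the prompt
# "Write a caption for `boat_image`".
code_from_llm = "result = image_captioner(boat_image)"

print(run(code_from_llm, boat_image="boat.png"))
```

Renaming the keyword argument means the name in the prompt has to change too, which matches the point made above about replacing the name in both places.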

I'm just going to ask it to write a caption for this image. Again, it needs to download the captioning model, as you can see here, and then we get "a boat floating in the water with clouds in the background". Let's run it again so you can see how long that actually takes: four seconds, so again very quick.

Here I was just looking at what the prompt template looks like when we call the run method. You can see a little bit of the logic that is going on: it says "I will ask you to perform a task", it includes all the tools here, then it gives a few examples, and then it asks the model to figure out what it needs to do next. This next part we don't need; it just contains all the code for the agent. That's another nice thing I like about the Hugging Face implementation: the code is pretty readable, so if something isn't working the way you'd expect, you can go into the code and figure out why almost straight away, which is not as easy to do with other libraries at the moment. So that's very nice, and it's super cool.
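The structure of such a prompt (instructions, a tool list, a few examples, then the new task) can be sketched like this. The wording, tool names, and example are invented for illustration and are not the actual template shipped with the library.

```python
# Hypothetical tool descriptions and few-shot examples.
TOOL_DESCRIPTIONS = {
    "image_generator": "creates an image from a text prompt",
    "image_captioner": "writes a caption for an image",
}

EXAMPLES = [
    ("Generate an image of a boat", "image = image_generator('a boat')"),
]

def build_prompt(task: str) -> str:
    # 1. Instructions plus the list of available tools.
    lines = ["I will ask you to perform a task. You can use these tools:"]
    for name, description in TOOL_DESCRIPTIONS.items():
        lines.append(f"- {name}: {description}")
    lines.append("")
    # 2. A few worked examples.
    for example_task, example_code in EXAMPLES:
        lines.append(f"Task: {example_task}")
        lines.append(f"Answer: {example_code}")
        lines.append("")
    # 3. The new task, left open for the model to complete.
    lines.append(f"Task: {task}")
    lines.append("Answer:")
    return "\n".join(lines)

print(build_prompt("Caption the image"))
```

Because the prompt is just assembled text, printing it is an easy way to debug why an agent picked a particular tool.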

Now let's have a look at a conversational agent, so basically a chatbot. I'm going to say "Hey, how are you?" and we just get this back: "I'm doing well, thank you for asking." Cool.

I'm going to ask it to create an image of a giraffe riding a skateboard. I just made this up very quickly before running this, and the results are not perfect. They're interesting, they're funny. We're not running any special diffusion models here, so we get something like this. We sort of have a giraffe, but let's stick with that. The results are entertaining; they're not particularly impressive from an image generation point of view, but it's just interesting to see.

Here it's using an image generator model. Then you come down here, and it's not going to use an image generator model; it's going to use an image transform model to modify the existing image. This is something that is really cool as well. First it needs to download that model, so let me explain what is so cool here.

You can see that it's generating some code, and this code actually refers to the image, and the image was generated by the earlier code. So the Python interpreter that all of this is using is maintained between chat interactions. It writes some code, and then you can say "oh, actually, can you do something else?" and it can still interact with that earlier code. It can still see it, and it writes some more code based on what has already been done. That is not something I have seen done by default in other libraries that use agents and tools. It's a really cool thing that I like, and it's just insane how easy it is to get that working.

And now we get this, so it's an elephant, from this image transform model.
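The idea of an interpreter state that persists between chat turns can be shown with a small sketch. This is an assumption about the general mechanism rather than the library's code: `ChatSession` is a hypothetical class, and the "generated code" for each turn is written by hand, using strings in place of real images.

```python
class ChatSession:
    """Keeps one shared namespace alive across chat turns."""

    def __init__(self) -> None:
        self.state = {}

    def step(self, generated_code: str) -> None:
        # Each turn's code runs in the same namespace, so later turns
        # can use variables that earlier turns created.
        exec(generated_code, self.state)

session = ChatSession()

# Turn 1: code the model might write for "create an image of a giraffe
# riding a skateboard".
session.step("image = 'giraffe on a skateboard'")

# Turn 2: follow-up code that modifies the existing `image` variable
# instead of starting from scratch.
session.step("image = image.replace('giraffe', 'elephant')")

print(session.state["image"])
```

A one-shot run would start with a fresh namespace each time, so the second step would fail because `image` would not exist yet.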

I haven't used an image transform model before; I didn't actually know they were a thing. But I think what it does is identify where in the image the giraffe is, which it has done, and then just try to modify that part of the image. So we get this kind of weird result. I can see what it's trying to do, and it's interesting.

This next one didn't work for me before, and I want to try again: "Could you give the elephant shiny laser eyes?" Last time I tried this, it made the elephant look like it was made of gold. Let's see what it does this time. Maybe it just read "could you make the elephant shiny", I'm not sure. Okay, so it did that again, and now we have, I don't know what it is, it's like a gold giraffe, I think.

Then we can caption the image, and I'm very curious as to what it says about this one. The caption is "a giraffe standing on a skateboard". Before, I'm pretty sure it gave me a very similar output, so I wonder what's going on.

It's a modified image, so can it caption the modified image? What I'm going to do is copy this and say "Sorry, I meant the modified image". Okay, a giraffe again. The code is right, so the caption is image_captioner on the modified image. And then this is weird, I'm not sure why, maybe there's some weird stuff going on in the tokenizer here, but we get a garbled "giraffe" on a skateboard. Okay, fair enough.

Then I wanted to test this a little more: "Can you search the internet for some more of these types of images?" A search tool is a pretty typical tool to include with agents, and I just wanted to see if they include one by default. So let's try, and we'll see. Unfortunately, no, they don't seem to. It refers to a text downloader tool, and that is apparently a thing. So it downloads the text downloader model, or tool, I'm not sure what it is exactly, and it just downloads some text. So it doesn't work for everything yet.

But I think this is already pretty cool: the fact that we're just referring to all these models, that we have this Python interpreter just built in, and that it's so easy to use, is really interesting. For sure we're going to do a lot more on transformers agents in the future, but for now I just want to introduce the library, or the new features, to you. I'm also just exploring it myself.

Again, like I said, there's a massive community aspect to this, and that is probably one of the biggest things that I think Hugging Face agents have going for them. I haven't checked whether we can actually find the tools on the Hugging Face website, but let me show you what it looks like with just models. We can go over here to Models, and there are just tons of models on Hugging Face. Now imagine that they're planning to do, or are doing, the same thing with tools. It's not here yet; I don't see any tools. But clearly the code or the interfaces are already there, because I believe we were downloading tools here. So that is super interesting, and I'm sure people are going to build some insane tools very quickly. That will be pretty huge, in my opinion.

I haven't yet seen how customizable these agents are; that's something I'll be exploring very soon. But I would imagine Hugging Face will make things pretty simple, so my expectation is that it will be pretty easy to work through and figure all of that out. Overall, I'm very excited to see what they do with this. I think this will be a really cool feature. For now I'm going to leave it there. I hope this has been interesting and useful. Thank you very much for watching, and I will see you again in the next one. Bye.


