Hugging Face have just announced something that I think is probably going to be a very major thing in the future of large language models and NLP. And that is their spin on agents for large language models and transformers in general.
Now there's quite a few reasons as to why I think Hugging Face are in a very good position to offer possibly one of the best agent and tool frameworks out there, and I'm going to discuss those. But first, for those of you that haven't heard of these things before, or are just not sure what they are, let me quickly explain.
So what is an agent and what is a tool? We know what large language models are. They are big transformer models that can basically answer questions in natural language for us based on some natural language input.
An agent takes this a little bit further and expands these LLMs out to basically allow them to have multiple steps of reasoning and thought, so they can think to themselves. And this is ideal for when we want to integrate what are called tools.
So what we can do is we can tell an LLM, hey, we want you to answer a question. If you can't answer it by yourself, you can actually refer to some other tools that we have given to you. And you might say something like, if you don't know about a particular topic, you can perform a Google search in order to find out about that particular topic.
And you would also explain how it can do that. And because the LLM has this multi-step thought process, it can say, okay, I've got this question, I need to use the Google search tool. And then it will say, how do I use that Google search tool? It's going to provide some input to Google search. We would then go and do a Google search for it, return some answers and pass those back into the LLM. And now all of a sudden it can answer the question. And you can do this for a ton of tools: SQL databases, knowledge bases, vector databases, Python interpreters and so on. Basically anything you can program, you can create a tool out of it.
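To make that loop a bit more concrete, here is a minimal sketch of it in plain Python. Nothing in it comes from a real library: fake_search, scripted_llm and run_agent are made-up names, the "LLM" is a hard-coded stand-in, and the "search" is just a dictionary lookup. It only shows the shape of the idea: the model asks for a tool, we run the tool, and the result goes back to the model.

```python
from typing import Callable, Dict, List, Tuple


def fake_search(query: str) -> str:
    """Hypothetical tool: a stand-in for a real web search, backed by a tiny dict."""
    knowledge = {"capital of australia": "Canberra is the capital of Australia."}
    return knowledge.get(query.lower(), "No results found.")


# Registry of tools the agent may call, keyed by the name the model uses.
TOOLS: Dict[str, Callable[[str], str]] = {"search": fake_search}


def scripted_llm(question: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM. Returns (action, argument).

    With no observations yet it asks for the search tool; once it has an
    observation it returns that observation as the final answer.
    """
    if not observations:
        return "search", question
    return "final", observations[-1]


def run_agent(question: str, max_steps: int = 3) -> str:
    observations: List[str] = []
    for _ in range(max_steps):
        action, argument = scripted_llm(question, observations)
        if action == "final":
            return argument
        # Look up the requested tool, run it, and feed the result back in.
        observations.append(TOOLS[action](argument))
    return "I could not find an answer."


print(run_agent("capital of Australia"))
```

In a real agent the scripted_llm function would be a call to an actual language model, and the tool registry would hold things like a search API or a database client.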
Now obviously this is a very powerful thing to be able to do. At the moment, by far the biggest library for using agents is LangChain. There's also Haystack, who have introduced agents recently. And there's also actually ChatGPT plugins, which are a form of agents as well, or at least a form of tools added to the agent, which is actually ChatGPT itself.
Now Hugging Face have also jumped on the bandwagon, and I think their implementation is actually very interesting and particularly powerful for a lot of different reasons. A big component of that is what Hugging Face actually is. So Hugging Face is essentially almost like a huge community and a hub of all of these different transformer models, diffusion models for generating images, datasets, and just a ton of anything you can think of in machine learning. Hugging Face actually cover a lot of it.
And their version of agents and tools is very interesting, and I haven't been all the way through it yet. This is kind of my first look. But the agent itself is very simple to use. It can also be used as a conversational agent, so as a chatbot where you have multiple steps in the process. And it also gives us access to all of these models on Hugging Face, which I think is one of the coolest things about it.
But there are other things. Actually, let me jump into it and show you those rather than just talking through them. We'll just have a quick look at their example here. They don't really show you much, and I'm going to go through some code in a moment. But you run agent.run with "Caption the following image", and then they pass in this image through the image argument here. And the output is "a beaver is swimming in the water".
Then they also have this. So agent.run with "Read the following text out loud", and it will use a text-to-speech model to actually do that. We can also do this. So here it's reading this document, or this image of a document, and we start with "In the following document" and then ask the question, and the output is "ballroom foyer", which is down here at the bottom right.
So the point here is that these agents are using a ton of different models from the Transformers library. And I think this takes pretty clear inspiration from this paper called HuggingGPT, which essentially used ChatGPT with a ton of Hugging Face models to do a load of cool things. So it uses the same sort of approach, where you have an agent, which was the ChatGPT model.
And if we come down to the first image here, this basically shows us how it works. We have a large language model, ChatGPT in this case, or GPT-3.5 Turbo I assume. And this is like your controller. And then we have all these what we would call specialist models that can do particular things that ChatGPT is not able to do, like understand what is within an image or caption an image. And ChatGPT, or your large language model, is able to basically figure out, okay, given a question, which models do I need to use and what input do I need to pass them. It then uses the output from those models to inform the next step of figuring out what it needs to do, and then provides a very cool answer that a normal large language model would not be able to give by itself.
Now let's take a look at a quick example of how we can use this. Initially I'm just using one of the examples from Hugging Face, and we'll go through a few cells, and then we'll do something a little more interesting. So the first thing we need to do here is install a few things. So transformers, which is the library that contains the agents. Because we're going to be using image generation models here, so the diffusion models, we need Hugging Face diffusers. And we're also going to use accelerate, which I believe allows us to run things faster.
Now in reality, and I'm not sure if this is the case, I think we should hopefully be running on a GPU here. Okay, so here I've gone into my runtime settings and just changed my hardware accelerator to GPU. And then what we do is we're just going to use OpenAI here for our large language model. Hugging Face makes it very easy to use other open source options, so you can do that. I'm just using this because I know it's going to work and it's quick. So here we are.
And I'm going to use text-davinci-003. Basically what I've found, generally speaking, is that text-davinci-003 is usually better at following instructions for tools within agents than GPT-3.5 Turbo, and that's also the model that Hugging Face are using in their examples. So I'm not 100% sure if they support GPT-3.5 Turbo. Yeah, I'll need to try that at some point.
So yeah, all we do is from transformers.tools import OpenAiAgent. There are other agents as well, I think the Hugging Face agent is like HfAgent maybe, but we're just going to use OpenAI here. You will need to add in your OpenAI API key, which you can get from platform.openai.com.
Okay, so we're going to run this. This is going to initialize our agent, and actually that's all we need to do. And that works, so it's a very easy and quick setup.
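Roughly, that cell looks like the sketch below. A few things here are assumptions on my part rather than something shown on screen: reading the key from an OPENAI_API_KEY environment variable, and the helper names read_openai_key and make_agent. It also assumes a transformers release from around the time of this video that still ships transformers.tools with OpenAiAgent, and an OpenAI account where text-davinci-003 is available, so treat it as a sketch of the idea rather than something guaranteed to run on a current install.

```python
import os


def read_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Fetch the API key from the environment instead of pasting it into a cell."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set {env_var} before creating the agent.")
    return key


def make_agent(model: str = "text-davinci-003"):
    """Build the OpenAI-backed agent used in this walkthrough."""
    # Imported lazily so this file still loads without transformers installed.
    # Assumption: a transformers version that still provides transformers.tools.
    from transformers.tools import OpenAiAgent

    return OpenAiAgent(model=model, api_key=read_openai_key())
```

Calling make_agent() once is then the whole setup; everything after that goes through the returned agent object.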
So I think all we're doing here is, okay, we're downloading the tool configuration. And this is a very interesting component of Hugging Face's tools implementation, which is that we can download community-contributed tools and obviously Hugging Face's own tools. So very soon, probably in the next few weeks, we're going to see some pretty insane tools appear from the community, which will be fascinating to see. That's one of the big reasons why I think this is going to be pretty major.
Okay, and another reason I think this is going to be pretty major is that we can do multimodal agents super easily. Look, I haven't done anything here. I just initialized my agent and I said, okay, I want to generate an image of a boat in the water. Right, and because Hugging Face have a big diffusers library, which contains loads of text-to-image diffusion models, and they obviously have all the transformer models as well, they can integrate at least a few of those into the default agent.
So if I just say "generate an image of a boat in the water", what you see here isn't how long it takes to process. This is actually downloading the model, so the image generation model. This will only happen once, and I'll prove that by running it again in a moment. So that's going to download, and okay, here we go, we've got an explanation from the agent: "I'm going to use the following tool, image generator, to generate an image according to the prompt." And it generates some code here, and here we go, here is our image of a boat in the water.
Okay, so yeah, that was super easy to do, and let me just run that again. You'll see that it doesn't take quite as long this time. Okay, so it's generating the image, and there we go. That was eight seconds, which, considering it takes maybe three seconds just to generate the image, is pretty good for an agent. So that is really cool.
And what we can also do is, okay, I have a boat image here, and we can come down and do agent.run again. Now I can pass in this variable. So we use backticks here in the prompt to refer to a variable by name, and then we pass in the actual variable under that name as an argument here. We could also just call it image, and then that just means we need to replace the name here as well. Okay, so let's run that.
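Here is a small sketch of that pattern: the name in backticks inside the prompt lines up with the keyword argument passed to run. The EchoAgent class is a made-up stand-in so the snippet runs without any model; with the real agent you would pass the agent object from earlier and an actual image instead.

```python
def caption(agent, picture):
    """Ask the agent to caption a picture, exposing it under the name `image`."""
    # The backticked name in the prompt matches the keyword argument name,
    # so the model can tell which variable the request refers to.
    return agent.run("Write a caption for the `image`", image=picture)


class EchoAgent:
    """Hypothetical stand-in that only reports what it was given."""

    def run(self, task, **variables):
        return task, sorted(variables)


print(caption(EchoAgent(), picture=object()))
```

If you rename the keyword argument, rename it inside the backticks too, otherwise the prompt and the variables the agent can see no longer line up.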
I'm just going to ask it to write a caption for this image. Again, it's going to need to download the captioning model, as you can see here. And then we get "a boat floating in the water with clouds in the background". Right, let's run it again so you can see how long that actually takes. Okay, so four seconds, again very quick.
Okay, and then here I was just looking at what the prompt template looks like when we're using that run method. You can see a little bit of the logic that is going on in here. So it says, I'm going to ask you to perform a task, it includes all the tools here, then it gives a few examples, and then it asks the model to figure out what it needs to do next.
Right, and this, yeah, we don't need that. It just contains all the code for the agent, which is another thing that I like about the Hugging Face implementation: the code is pretty readable. So if something isn't quite working the way you'd expect it to, you can go into the code and figure out why almost straight away, which is not as easy to do with other libraries at the moment. So that's very nice, and yeah, I mean it's super cool.
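As a rough illustration of how a prompt like that can be put together, here is a small sketch. This is not the library's actual template: build_prompt is a made-up helper, and the wording, tool descriptions and example are placeholders. It only shows the structure described above, which is instructions, then the tool list, then a few examples, then the new task.

```python
from typing import Dict, List


def build_prompt(tools: Dict[str, str], examples: List[str], task: str) -> str:
    """Assemble a run-style prompt: instructions, tool list, examples, new task."""
    tool_lines = [f"- {name}: {description}" for name, description in tools.items()]
    parts = [
        "I will ask you to perform a task. You can use the following tools:",
        "\n".join(tool_lines),
        "\n\n".join(examples),
        f"Task: {task}",
    ]
    return "\n\n".join(parts)


prompt = build_prompt(
    tools={
        "image_generator": "creates an image from a text description",
        "image_captioner": "describes an image in text",
    },
    examples=['Task: "Draw a cat."\nAnswer: image = image_generator("a cat")'],
    task="Generate an image of a boat in the water",
)
print(prompt)
```

The model then only has to continue the text after the last task line, which is why adding a tool mostly comes down to adding one more described entry to the list.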
Now let's have a look at a conversational agent, so basically a chatbot. I'm going to say, "Hey, how are you?" Okay, and we just get this: "Hi there, I'm doing well, thank you for asking." Cool.
I'm going to ask it to create an image of a giraffe riding a skateboard. I just made this up very quickly before running this, and I mean, the results are not perfect. They're interesting, they're funny. We're not running any special diffusion models here, so we get something like this, where we've only kind of half got a giraffe, but let's stick with that. As I said, the results are entertaining, and they're not particularly impressive from an image generation point of view, but it's just interesting to see.
Here it's using an image generator model, and then you come down here and it's not going to use an image generator model, it's going to use an image transformation model to modify the existing image. And this is something that is really cool as well. Okay, first it needs to download that model, so let me explain what is so cool here.
You can see that it's generating some code, and this code is actually referring to the image, and the image was generated by this code beforehand. So the Python interpreter that all of this is using is maintained between chat interactions. It's going to write some code, and then you can say, oh, actually, can you do something else, and it can still interact with that code. It can still see that code, and it's going to write some more code based on what's already been done, which is not something that I have seen done by default in other libraries that use agents and tools. So that's just a really cool thing that I like, and it's just insane how easy it is to get that working. So cool, yes, and now we get this. So it's an elephant, from this image transformation model.
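To show what it means for the interpreter state to survive between chat turns, here is a toy sketch. It is not how the library does it: ChatInterpreter is a made-up class, the two tools are stand-ins that return strings instead of images, and it uses plain exec on a shared dictionary, which you would not want to do with untrusted generated code.

```python
class ChatInterpreter:
    """Runs generated snippets against one shared namespace."""

    def __init__(self, tools):
        # Tools are plain callables exposed to the generated code by name.
        self.state = dict(tools)

    def execute(self, code: str):
        # exec is used only for illustration; never run untrusted code this way.
        exec(code, self.state)
        return self.state


def image_generator(prompt):
    """Hypothetical stand-in: returns a label instead of a real image."""
    return f"<image: {prompt}>"


def image_transformer(image, prompt):
    """Hypothetical stand-in: records the edit instead of modifying pixels."""
    return f"{image} -> <image: {prompt}>"


session = ChatInterpreter(
    {"image_generator": image_generator, "image_transformer": image_transformer}
)

# First turn: the generated code creates `image`.
session.execute('image = image_generator("a giraffe riding a skateboard")')
# Second turn: the new snippet can still see `image` from the first turn.
session.execute('modified = image_transformer(image, "an elephant")')
print(session.state["modified"])
```

The point is just that both snippets run against the same dictionary, so the second turn can build on variables the first turn created.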
I haven't used an image transformation model before, I didn't actually know that they were a thing, but I think what it does is identify where in the image the giraffe is, which it's done, and then just try to modify that part of the image. So we get this kind of weird, I mean, yeah, I can see what it's trying to do, but it's interesting. So okay, cool.
And then this didn't work for me before, so I want to try again: "Could you give the elephant shiny laser eyes?" Last time I tried this it made the elephant like it was made of gold. Let's see what it does this time. Maybe it just read "could you make the elephant shiny", I'm not sure. Okay, so it went with that again, so now we have like, I don't know what it is, it's like a gold giraffe, I think.
And then we can caption the image, so I'm very curious as to what it says about this image. Okay, the caption is "a giraffe standing on a skateboard". Before, I'm pretty sure it gave me a very similar output, so I wonder if it is, okay.
So it's a modified image, right, so can I get it to caption the modified image? What I'm going to do is copy this and say, "Sorry, I meant the modified image." Okay, so the code is right: the caption comes from running the image captioner on the modified image. And then this is weird, I'm not sure why, maybe there's some weird stuff going on in the tokenizer here, but the word giraffe comes out split up, so we get something like "a gi raffe on a skateboard". Okay, fair enough.
And then I wanted to test this a little more: "Can you search the internet for some more of these types of images?" A search tool is a pretty typical tool that is included with agents, and I just wanted to see if they include that by default, so let's try and we'll see. Okay, so unfortunately no, they don't seem to. It refers to a text downloader tool, and that is apparently a thing. So it downloads the text downloader model or tool, I'm not sure what it is exactly, and yeah, it just downloads some text.
So it doesn't work for everything yet, but I think it's already pretty cool, the fact that we're just referring to all these models, that we have this Python interpreter just built in, and that it's so easy to use. I think that is really interesting, and for sure we're definitely going to do a lot more on transformers agents in the future. But for now I just want to introduce the library to you, or the new features. I'm also just exploring it myself.
Again, like I said, there's a massive community aspect to this, so that is probably one of the biggest things that I think Hugging Face agents have going for them. I haven't seen if we can actually find tools on the Hugging Face website, but let me show you what it looks like with just models. So we can go over here, we have models, and there's just tons of models on Hugging Face right now. Imagine that they're planning to do, or are doing, the same thing with tools. It's not here yet, I don't see any tools, but clearly the code or the interfaces are already there, because I believe we were downloading tools here. So that is super interesting, and I'm sure people are going to build some insane tools very quickly. So yeah, that will be pretty huge in my opinion.
Now I haven't seen how customizable these agents are yet, that's something I'll be exploring very soon. But knowing Hugging Face, I would imagine they make things pretty simple, so my expectation is that it will be pretty easy to work through and figure all of that out. So yeah, overall I'm very excited to see what they do with this. I think this will be a really cool feature, but for now I'm going to leave it there. So I hope this has been interesting and useful. Thank you very much for watching, and I will see you again in the next one. Bye.