Building Dota Bots That Beat Pros - OpenAI's Greg Brockman, Szymon Sidor, and Sam Altman | Transcription

Transcription for the video titled "Building Dota Bots That Beat Pros - OpenAI's Greg Brockman, Szymon Sidor, and Sam Altman".


Note: This transcription is split and grouped by topics and subtopics. All paragraphs are timed to the original video.


Intro (00:00)

Now, if you look forward to what's going to happen over the upcoming years, the hardware for these applications, for running neural nets really, really quickly, is going to get fast, faster than people expect. And I think what that's going to unlock is that you're going to be able to scale these models, and you're going to see qualitatively different behaviors from what you've seen so far. At OpenAI, we see this sometimes. For example, we had a paper on unsupervised learning where you train a model to predict the next character in Amazon reviews. Just by learning to predict the next character in Amazon reviews, somehow it learned a state-of-the-art sentiment analysis classifier. And so it's kind of crazy if you think about it, right? You were just told, "Hey, predict the next character." If you were told to do this, well, the first thing you'd do is learn the spelling of words and learn punctuation. The next thing you do, if you have extra capacity, is start to learn semantics, right? And this effect goes away if you use a slightly smaller model. And what happens if you have a slightly larger model? Well, we don't know, because we can't run those models yet. But in upcoming years, we'll be able to.

What do you guys think are the most promising under-explored areas in AI? If we're trying to make it come faster, what should people be working on that they're not?

Yeah, so there are many areas of AI that we have already developed quite a bit. There's some basic research in just classification, deep learning, and reinforcement learning. And what people do is they kind of try to invent problems, such as solving some complicated games with hierarchical structure. And they try to add extra features to their models to combat those problems. But I think there's very little research happening on actually understanding the existing methods and their limits.
So for example, it was a long-held belief in deep learning that to parallelize your computation, you need to cram as small batches as possible onto every device. And in fact, Baidu did this impressive engineering feat where they took recurrent neural networks and implemented GPU assembly code to make sure that you can feed batch-size-one RNNs on every GPU. And despite all those smart people working on those problems, it was only very recently that Facebook took a cold, hard look at just the very basic problem of classification. And in their great paper called ImageNet in One Hour, they showed that if you actually take code that does image classification, and if you fix all the bugs, you actually can get away with a much larger batch size and therefore finish the classification problem much faster. And it's not the kind of sexy research that people want to see, where you have some hierarchy of RNNs. But actually, this kind of research, I think, at this point will advance the field the most.

Greg, you mentioned hardware in your initial answer. In the near term, what are the actual innovations that you foresee happening?

The big change is that the kinds of computers that we've been trying to make really fast are general-purpose computers built on the von Neumann architecture. You basically have a processor, you have a big memory, and you have some bottleneck between the two. With the applications that we're starting to do now, suddenly you can start making use of massively parallel compute. The architectures that these models can run on the fastest are going to look kind of like the brain, where basically you have a bunch of neurons that all have their own memory right near them, and they all talk to their neighbors, and maybe there are some kind of longer-range skip connections.

Discussion On AI Potential And Technical Issues

What do you guys think are the most promising under-explored areas in AI? (03:47)

And just no one's really had the incentive to develop hardware like this. And so what we've seen is that you move your neural networks from running on a CPU to a GPU, and now suddenly you have a thousand CUDA cores running in parallel, and you can get a massive performance boost there. Now, if you move to specialized hardware that is much more brain-like and that runs in parallel with a bunch of tiny little cores, you're going to be able to run these models insanely faster.

Okay. So I think one of the most common questions, or threads of questions, asked on Twitter and Facebook was generally how to get into AI. Could you guys give us a primer on where someone should start if they're just a CS major in college?

Yeah, absolutely. So it really depends on the nature of the project that you would like to do. I can tell you a bit about our project, which is essentially developing large-scale reinforcement learning for Dota 2. And the majority of the work is actually engineering. Essentially, taking the algorithms that we have already implemented and trying to scale them up is usually the fastest way to get improvement in our experiments. So becoming a good engineer is much more valuable for our team than, for example, spending months and months implementing exotic models in TensorFlow.

So just to echo this, because I hear this come up all the time: people say, "It's my dream to work at OpenAI, but I've got to go get an AI PhD, so I'll see you in like five or seven years." If someone is a really solid engineer but has no experience at all with AI, how long does it take someone like that to become productive for the kind of work that we're looking for at OpenAI?

So someone like that can actually become productive from day one. And with the different engineers who show up at OpenAI, there's a spectrum of where they end up specializing.
There are some people who focus on building out infrastructure, and this infrastructure can range from, well, we have a big Kubernetes deployment that we run on top of a cloud platform, to building tooling and monitoring and managing this underlying layer. And it actually looks quite a bit like running a startup, in that a lot of the people who are most successful at it have quite a bit of experience running at large scale in a startup or production environment. There's a next level of getting closer to the actual machine learning, where if you think of how machine learning systems look, there tends to be this magical black box of machine learning at the core, and you actually try to make that core as small as possible, because machine learning is really hard, it's a lot of compute, and it's really hard to tell what's going on in there, so you want it to be as simple as possible. But then you surround it with as much engineering as you possibly can.

So what percent of the work on the Dota 2 project would you guys say was what people would really think of as machine learning science versus engineering?

Essentially, as far as day-to-day work goes, this kind of work was almost non-existent. There were maybe a few person-weeks spent on that, compared to person-years of engineering.

Good bets (07:23)

I think maybe placing some good bets was part of it.

Good bets on the machine learning side?

On the machine learning side, yeah. And they're often more about what not to do rather than what to do. So at the very beginning of the project, we knew we wanted to solve a game, a hard game. We didn't know exactly which one we wanted to do. These are great test beds for pushing the limits of our algorithms, and one of the great things about it too...

And just to be clear, you guys are two of the key people. The entire team was like 10 people?

About 10 people. And these things are good test beds for our algorithms, to see what the limits are and to really push the limit of what's possible. And you know for sure that when you've done it, you've done it. It's very binary, testable. And so actually the way that we selected the game was we went on Twitch and just looked down the list of the most popular games in the world, and starting at number one is League of Legends. The thing about League of Legends is that it doesn't run on Linux and it doesn't have a game API. And little things like that are actually the biggest barrier to making AI progress in a lot of ways. And so, looking down the list, Dota actually was the first one that had all the right properties: it runs on Linux, it has a big community around replay parsing, and there's a built-in Lua API.

What is the large-scale reinforcement learning and how it works (08:46)

This API was actually meant for scripting, for building mods, rather than for building bots. And we were like, "But we could probably use it to build bots." And one of the great things about Valve as a company is that they're very into having these open, hackable games where people can go and do a bunch of custom things. And so, philosophically, it was very much the right kind of company to be working with. So we actually did this initial selection back in November, and we were working on some other projects at the time, so we didn't really get started until late in December.

Scripted bot API in C++ & Docker (09:21)

And one of the funny things is that, by total coincidence, in mid-December Valve released a new bot-focused API. They were saying, "Hey, our bots are famously bad. Maybe the community can solve this problem, so we'll actually build an API specifically for people to do this." And that was just one of those coincidences of the universe that worked out extremely well. So we were in close contact with the developer of this API all throughout.

So at the very beginning of the project, well, what are you going to do? The first thing was that we had to become very familiar with this game API, to make sure we understood all the little semantics and all of the different corner cases, and also to make sure that we could run this thing at large scale and turn it into a pleasant development environment. At the time, it was just two of us. One person was working with the bot API, building a scripted bot. And so basically, this is: learn all the game rules, think really hard about how it works. This particular person who wrote it, Rafał, has played about three or four games of Dota in his life, but he's watched over a thousand hours of Dota gameplay and has now written the best Dota scripted bot in the world. And so that was a lot of just writing this thing in Lua and getting very intimately familiar with all those details.

In the meanwhile, what I was working on was trying to figure out: how do you turn this thing into a Docker container? They had this whole build process. It turns out that Steam can only be in offline mode for two weeks at a time, and that they push new patches all the time. So you needed to go from this sort of "manually download the game and whatever" to actually having an automated, repeatable process. It turns out that the full game files are about 17 gigabytes and that our Docker registry can only support five-gigabyte layers.
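To make the size constraint concrete: a payload of about 17 GB against a 5 GB limit per piece needs at least four pieces. The following is only a minimal sketch of that kind of splitting, under stated assumptions. It splits a single file into raw byte chunks; it does not build tar archives, talk to S3, or touch Docker, and the function names (`split_file`, `join_files`, `parts_needed`) are made up for illustration rather than taken from the talk.

```python
import math
from pathlib import Path
from typing import List


def parts_needed(total_bytes: int, max_bytes: int) -> int:
    """Smallest number of pieces of at most max_bytes that cover total_bytes."""
    return math.ceil(total_bytes / max_bytes)


def split_file(src: Path, out_dir: Path, max_bytes: int,
               buf_bytes: int = 1 << 20) -> List[Path]:
    """Split src into numbered part files, each at most max_bytes long."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    with src.open("rb") as reader:
        index = 0
        while True:
            part_path = out_dir / f"{src.name}.part{index:03d}"
            written = 0
            with part_path.open("wb") as writer:
                # Copy in small buffers so a multi-gigabyte file never has
                # to fit in memory at once.
                while written < max_bytes:
                    block = reader.read(min(buf_bytes, max_bytes - written))
                    if not block:
                        break
                    writer.write(block)
                    written += len(block)
            if written == 0:
                # Source exhausted: drop the empty trailing part and stop.
                part_path.unlink()
                break
            parts.append(part_path)
            index += 1
    return parts


def join_files(parts: List[Path], dst: Path) -> None:
    """Reassemble the parts, in order, into dst."""
    with dst.open("wb") as writer:
        for part in parts:
            with part.open("rb") as reader:
                while True:
                    block = reader.read(1 << 20)
                    if not block:
                        break
                    writer.write(block)
```

With decimal gigabytes, `parts_needed(17 * 10**9, 5 * 10**9)` gives 4, which matches the arithmetic above.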
And so I had to write a thing to chunk things up into five-gigabyte tarballs, put those in S3, and pull them back down. So a bunch of things there were really just about figuring out what the right workflow is, what the right abstractions are. And then the next step was, well, we know we want to be writing our bots in TensorFlow and Python. How do you get that?

Why was that?

Well, with machine learning, it's actually quite interesting that a lot of the highest-order bit on progress, just like having the game API is a high-order bit, is also: can you use tools that are familiar and easy to iterate with? Before the world of modern machine learning frameworks, everyone would write their code in Matlab. If you had a new idea, it would take you two months to do it. Like, good luck making progress. And so it was really all about iteration speed. If you can get into the Python world, well, we have these large code bases that we've built up of high-quality algorithms, and there's just so much tooling built around it that that's the optimal experience.

And so the next step was to port the scripted bot into Python. The way I did that was I literally just renamed all of the .lua files to .py, commented out the code, and then started uncommenting function by function. And then, you know, you run the function, you get an exception, you then go and uncomment whatever code it depends on. I tried to be, as mechanically as possible, like a human transpiler. And, you know, Lua is one-indexed and Python is zero-indexed, so you have to handle that. Lua doesn't distinguish between an array type and a dictionary type, so you kind of have to disambiguate those two. But for the most part, it's something that could have been done totally mechanically.
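The two porting hazards just mentioned (one-based versus zero-based indexing, and Lua's single table type standing in for both arrays and dictionaries) can be illustrated with a small sketch. This is not the code from the project; it assumes a Lua table has been mirrored as a Python `dict`, and the helper names `lua_index` and `lua_table_to_python` are hypothetical.

```python
from typing import Any, Dict, List, Union


def lua_index(i: int) -> int:
    """Translate a one-based Lua index into a zero-based Python index."""
    if i < 1:
        raise IndexError("Lua sequence indices start at 1")
    return i - 1


def lua_table_to_python(table: Dict[Any, Any]) -> Union[List[Any], Dict[Any, Any]]:
    """Guess whether a Lua-style table was meant as an array or a dictionary.

    If the keys are exactly the integers 1..n, treat it as an array and
    return a Python list; otherwise keep it as a dict. An empty table is
    ambiguous, so this sketch leaves it as an empty dict.
    """
    n = len(table)
    keys_are_ints = all(
        isinstance(k, int) and not isinstance(k, bool) for k in table
    )
    if n > 0 and keys_are_ints and set(table) == set(range(1, n + 1)):
        return [table[i] for i in range(1, n + 1)]
    return dict(table)


# Usage: an array-like table becomes a list, and Lua's t[1] is Python's t[0].
abilities = lua_table_to_python({1: "shadowraze_near", 2: "shadowraze_mid"})
first = abilities[lua_index(1)]
```

The heuristic is deliberately conservative: a table with a gap in its integer keys stays a dictionary, which is the kind of case a human porting the code would have to look at by hand.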
And it's great, because I didn't have to understand any of the game logic, I didn't have to understand anything that was going on under the hood. I could basically just port it over, and it just kind of came together. But then you end up with a small set of functions that you do not have implementations of, which are all the actual API calls. And so I ended up with a file with a bunch of dummy calls. I knew exactly which calls I needed, and then implemented them on top of gRPC, a protobuf-based protocol, where on every tick the game would dump the full game state, send the diff over the wire, and reassemble that into an in-memory state object in Python, and then all of these API methods would be implemented in Python. And so at the end of this, you know, it sounds like a bit of a Frankenstein process, but it actually worked really well.
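The per-tick flow described above (dump the full state, send only the diff, rebuild an in-memory state on the Python side) can be sketched with plain dictionaries. This is an illustration only: it uses no gRPC or protobuf, it handles flat key/value state rather than nested messages, and the field names such as `hero_hp` are invented for the example.

```python
from typing import Any, Dict


def make_diff(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Describe how to get from old to new: changed or added keys, plus removals."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return {"changed": changed, "removed": removed}


def apply_diff(state: Dict[str, Any], diff: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the receiver's copy of the state from its previous copy and a diff."""
    result = dict(state)
    for key in diff["removed"]:
        result.pop(key, None)
    result.update(diff["changed"])
    return result


# Usage: the sender keeps the previous tick's dump and sends only the diff.
tick_1 = {"hero_hp": 500, "hero_x": 10, "creeps_alive": 4}
tick_2 = {"hero_hp": 470, "hero_x": 10, "creeps_alive": 3, "enemy_visible": True}

wire_message = make_diff(tick_1, tick_2)      # small: only what changed
receiver_state = apply_diff(tick_1, wire_message)
```

The point of the design is that the message on the wire stays small when most of the game state is unchanged between ticks, while the Python side still sees a complete state object to answer API calls from.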

Porting Lua to Python (13:22)

And in the end, we had something that looked just like a typical OpenAI Gym environment. All you have to do is say gym.make with this, you know, Dota environment ID. And suddenly you're playing Dota, and your Python code just has to call into some object that implements the Lua API. And suddenly these characters are running around the screen doing what you want. And so this was a lot of the kind of thing that I was working on, on the pure engineering side. And as we went on, Szymon and Jakub and Jay and others joined the project. Most people were building on top of this API and really didn't have to dig into any of the underlying implementation details.

You know, personally, my one machine learning contribution to the project, I'll tell you about that. Because my background is primarily startup engineering, building large infrastructure, not machine learning. Definitely not a machine learning PhD; I didn't even finish college. So I reached a point where I got the infrastructure to a pretty stable place, where I felt like, all right, I don't have to be fighting fires there constantly, I have some time to actually focus on digging into some of the machine learning. One particular piece that we were interested in doing was behavioral cloning. One of the systems that we had built was to go and download all of the replays that are published each day. The way this game works is that there are about 1.5 million replays available for public download. Valve clears them out after two weeks, so you have to have some discovery process, and you have to stick them in S3 somewhere. Originally, we were downloading all of them every day, and realized that was about two terabytes' worth of data a day. That adds up quite quickly.
So we ended up filtering down to the most expert players, but we wanted to actually take this data, parse it, and use it to clone the behavior for a bot. And so I spent a lot of time on this. You basically need this whole pipeline to download the replays, to parse them, to iterate on that, and then to take it, train a model, and try to predict what the behavior would be. And one thing I find very interesting is the different workflow that you end up with when doing machine learning. There are a bunch of things that are just very surprising when software engineers join OpenAI. For example, if you look at a typical research workflow, you'll see a lot of files named like, you know, experiment, whatever the name of the experiment is, one, two, three, four, and you look at them and they're just slight forks of the same thing. And you're like, "Isn't this what version control is for? Why are you doing that?" And after doing this cloning project, I learned exactly why.
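The cloning step mentioned above (train a model to predict what an expert did in a given situation) is ordinary supervised learning on state and action pairs. The sketch below is a deliberately tiny stand-in, not the project's model: it uses a lookup table that picks the most frequent expert action per state instead of a neural network, and the state and action labels are invented for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def fit_clone(demonstrations: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, str]:
    """Learn a policy that copies the most common expert action in each state."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


def act(policy: Dict[Hashable, str], state: Hashable, default: str = "idle") -> str:
    """Pick the cloned action, falling back to a default in unseen states."""
    return policy.get(state, default)


# Usage with hypothetical (state, action) pairs parsed from replays.
demos = [
    ("creep_in_range", "attack"),
    ("creep_in_range", "attack"),
    ("creep_in_range", "move"),
    ("enemy_hero_near", "retreat"),
]
policy = fit_clone(demos)
chosen = act(policy, "creep_in_range")
unseen = act(policy, "tower_under_attack")
```

A cloned policy like this only reproduces what it has seen: in a state with no demonstrations it has nothing to imitate, which is one reason a separate reward-driven training stage can still be useful afterwards.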

Deciding who does which tasks (16:06)

Because the thing is, say you have a new idea: okay, I've kind of got this thing working, and now I'm going to try something slightly different. As you're doing the new thing, well, machine learning is, to some extent, very binary. At the start, it just doesn't work at all, and you don't know why. Or it kind of works, but it has some weird performance and you're not sure exactly: is it a bug? Is it just how this data set works? You just don't know. And so, if you've gotten it working at all, and then you make a change, you're always going to want to go back and compare to the previous thing you had running. So you actually do want the new thing running side by side with the old thing. And if you're constantly stashing and unstashing and checking out and whatever, then you're just going to be sad. And there are a lot of workflow issues like that. You've just got to bang your head against the wall, and then you see: "I've been enlightened."

But before we progress further on the story, can you just explain the basics of training a bot in a game? Like, how are you actually giving it the feedback?

Oh, yeah. So, on a high level, we are using reinforcement learning with self-play. What that means, I mean, it's no rocket science, even though reinforcement learning sounds so fancy. Essentially, what's happening is we have a bot which observes some state in the environment and performs some actions based on that state. And based on those actions that it executes, it continues playing and eventually, you know, either does well or poorly. That's something that we can quantify in a number. And that's more of an engineering problem than a research problem, how to quantify how well the bot is doing, right? You need to come up with some metric.
And then, you know, the bot gets feedback on whether it's doing well or not, and then tries to select the actions that yield that positive feedback, that high reward.

And to give us a sense for how well that works: so, the bot plays against itself to get better. Once you had everything working, how good would a bot from day N do against a bot from day N minus one?
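The feedback loop described above (observe a state, take an action, receive a number, prefer the actions that led to higher numbers) can be sketched in a few lines. To be clear about what this is: it is a simple epsilon-greedy value-averaging learner on a made-up two-state problem, chosen because it is short. It is not the algorithm used for the Dota bot, it has no self-play, and the states, actions, and reward table are invented.

```python
import random
from typing import Dict, Tuple

STATES = ["creep_low_hp", "under_enemy_tower"]
ACTIONS = ["attack", "wait", "retreat"]

# Hypothetical reward table: the "metric" someone has to engineer.
REWARD: Dict[Tuple[str, str], float] = {
    ("creep_low_hp", "attack"): 1.0,
    ("creep_low_hp", "wait"): 0.0,
    ("creep_low_hp", "retreat"): -0.2,
    ("under_enemy_tower", "attack"): -1.0,
    ("under_enemy_tower", "wait"): -0.5,
    ("under_enemy_tower", "retreat"): 0.5,
}


def train(steps: int, epsilon: float, seed: int) -> Dict[Tuple[str, str], float]:
    """Estimate the average reward of each (state, action) pair from experience."""
    rng = random.Random(seed)
    value = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    count = {(s, a): 0 for s in STATES for a in ACTIONS}
    for _ in range(steps):
        state = rng.choice(STATES)                      # observe
        if rng.random() < epsilon:                      # explore occasionally
            action = rng.choice(ACTIONS)
        else:                                           # otherwise exploit
            action = max(ACTIONS, key=lambda a: value[(state, a)])
        reward = REWARD[(state, action)] + rng.gauss(0.0, 0.05)  # noisy feedback
        key = (state, action)
        count[key] += 1
        value[key] += (reward - value[key]) / count[key]  # running average
    return value


def greedy_policy(value: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """For each state, pick the action with the highest estimated reward."""
    return {s: max(ACTIONS, key=lambda a: value[(s, a)]) for s in STATES}


learned = greedy_policy(train(steps=2000, epsilon=0.1, seed=0))
```

Nothing in the learner knows what "attack" or "tower" means; the behavior comes entirely from the reward numbers, which is why choosing that metric is described above as an engineering problem.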

PANicking and Stepdown Approaches (18:05)

So, I guess we have a story that kind of illustrates what to expect from those techniques. So, when we started this project, our goal wasn't really to do research. I mean, at some high level, it was, but we are very goal oriented. All we wanted to do is solve the problem, right? We wanted to solve 5v5, and we had our milestone, 1v1. And the way it started, in the early days when there was just Greg and Rafał, Rafał was implementing a scripted bot. So he would just write the logic: "I think this is what the bot should do. When it sees a creep, it needs to attack it," yada yada. And he spent like three months of his time, and Rafał is actually a really good engineer, so we had a really good scripted bot. So, what happened then? He got to the point where he didn't think he could improve it much more. So we said, "Okay, let's try some reinforcement learning." And, you know, I was actually on vacation at the time, but there was another engineer, Jakub, working on it throughout my vacation. And what I found super surprising was: I leave, there is nothing. I come back, there is this reinforcement learning bot. And it's actually beating our scripted bot after like a week's worth of engineering effort. Possibly it was two weeks, but it was something very small compared to the development of the scripted bot. So actually our bot, which didn't have any assumptions about the game baked in, figured out the underlying game structure well enough to beat anything that we could code by hand, which was pretty amazing to see.

And at what point do you decide to compete in the tournament?

Well, so, maybe I should finish up my story. Sorry if it's running a bit long, but it'll get good shortly. So, just to finish up my machine learning contribution.
So, I basically spent about a month really learning the workflow, and got something that was able to show some signs of life, where it would run to the middle and you're like, "Oh, it knows what it's doing. It's so good." And it's very clear, when you're just doing cloning, that these algorithms learn to imitate what they see rather than the actual intent. And so it would get kind of confused and try to do some creep blocking or something, but the creeps wouldn't be around, and so it would just be zigzagging back and forth. Anyway, I got this to the point where it was actually creep blocking pretty reliably, pretty well. And then at that point I turned it over to Jay, who's also working on the project, and he used reinforcement learning to fine-tune that. And so suddenly it went from only understanding the actions rather than the intent to really knowing what it was doing, and it kind of has the best creep block that anyone has seen. And so that was my one machine learning contribution to the project.

So, time went on, and one of the most important parts of the project was having a scoreboard.

The TrueSkill Metric and Its Signaling Value (20:45)

So, we had a metric on the wall, which was the TrueSkill of our best bot. TrueSkill is basically an Elo-like rating that measures the win rate of your bot versus others. And you put that on the wall, and each week people just try all the ideas, and some of them work, some of them improve the performance, and we actually ended up with this very smooth, almost linear curve. We posted it in a blog post, and that really means kind of an exponential increase in the strength of this bot over time. Part of that is that sometimes these data points are where you just train the same experiment for longer. I mean, typically our experiments would last for maybe up to two weeks. But also, a lot of those were, well, we had a new idea. We tried something else. We made this tweak. We added this feature, removed this other component that wasn't necessary.

And so we chose the goal of 1v1. I don't recall exactly when, but it must have been in the spring or maybe even early summer. But we really didn't know: are we actually going to be able to make it? Normally, when you're building an engineering system, you think really hard about all the components. You decompose it into this subsystem, that subsystem, that subsystem, and you can measure your progress as what percent of the components are built. Here, by contrast, you really have ideas that you need to try out, and it's unpredictable in some sense. And actually, one of the most important changes to the project in terms of making progress was this. Initially, the way that the project management was happening was that we had written down our milestones: let's beat this person by this date, let's beat this other person by this date. These were outcome-based milestones on a weekly or biweekly basis. Those dates would come and go, and you wouldn't have hit them. And then what are you supposed to do? It's completely unactionable.
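On the remark earlier in this passage that a roughly linear rating curve means roughly exponential growth in strength: this follows from how Elo-style ratings map rating differences to win probabilities. The sketch below uses the standard Elo expected-score formula as a stand-in. TrueSkill itself is a different, Bayesian rating system, so the numbers here illustrate the "Elo-like" intuition only, not the team's actual scoreboard.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score (win probability) of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def win_odds(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Odds p / (1 - p) of A beating B; equals 10 ** ((rating_a - rating_b) / scale)."""
    p = expected_score(rating_a, rating_b, scale)
    return p / (1.0 - p)


# Against a fixed 1500-rated opponent, every extra 400 points multiplies
# the odds of winning by 10: a straight line in rating is exponential in odds.
baseline = 1500.0
for rating in (1500.0, 1900.0, 2300.0):
    print(rating, round(win_odds(rating, baseline), 3))
```

Under this model, a bot whose rating climbs by a constant amount each week is multiplying its odds against any fixed opponent by a constant factor each week.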
It's not like there was anything else you could have done. It's just that you have more ideas you need to try. And so instead, we shifted it to: what are all the things we're going to try by next week?

That's a good insight.

Attempting to Pothole-Bot (22:42)

And then you do that. And then, yeah, if you didn't actually do everything you said you were going to do, then you should feel bad and you should do more of it. And if you did all of those and it didn't work, then fair enough, but you achieved what you wanted to achieve. And so, even going into The International: two weeks before The International was kind of our cutoff, where at this point there's not much more we can do. We're going to do our biggest experiment ever, put all of our compute into one basket, and see where it goes.

And at that point, two weeks out, how good was the bot?

Oh, it was barely, sometimes, winning against the professionals that we had testing it, but not even always.

No, no, no, it sometimes happened. So, to be specific, I'm just pulling this back up. July 8th is when we had our first win against our semi-pro tester. And then a sequence of losses, a sequence of losses. And then we were kind of more consistent with it. And then he went on vacation, and so he was on some laptop somewhere that was not very good. And then we were consistently beating him, but that was not very reliable data. This was the week before The International. And so we didn't really know how good we were getting. We knew that this TrueSkill was going up.

When was the last time that an OpenAI employee beat the bot? How far out was that?

I think like a month or two before TI, although we're not very good at Dota.

Okay. So a month or two out, it could beat all the OpenAI people; two weeks out, it could at one time beat a semi-pro.

I'd say four weeks out was the first time that it beat the semi-pro.

Okay.

And then, you know, two weeks out, we don't know. We still can't really find out. I mean, I guess we could rerun that bot, but we really didn't know how good it was at that time. We just knew, hey, we're able to beat our semi-pro occasionally.
And going into The International, we figured that, hey, there's a 50/50 shot. And I think we were telling Sam the whole way, you know, the probability... With these things, you never really trust the probabilities. You just trust the trend of the probabilities.

Even that was just swinging wildly. You guys would text me every night. It would be, "Oh, no chance. We're not going out," or, "We're definitely going to win every game."

Yep. Yep. And so it was very clear that our own estimates of what was going to happen were miscalibrated. And throughout the week of TI, actually, we still didn't know.

And what was happening is, you guys all went to Seattle for this week?

Bot teaming. AI meditates the red hot match (25:12)

Most of the team went there.

Okay. So you're holed up in a hotel or a conference center or something in Seattle?

Well, actually the reality of it was that we were holed up near the stadium.

Okay.

Let me describe how we were holed up. So we were given a locker room in the basement of Key Arena. We all had production badges, so you feel very special as you walk in. You're like, "Oh yeah, I just get to skip the line and go to the backstage area." But it was literally a locker room that they had converted into a filming area. We all had our laptops in there, and they would also bring in pro players every so often. We had a whole filming setup, and then we'd play against the pros. And we had a partition that we set up, which was basically just a black cloth, between the whole team sitting there being like, "Are we going to be able to beat this pro? Maybe," and trying to be as quiet as possible, and these pros who were playing. And on Monday they brought by two pros, I think, and one very high-ranked analyst, and we had our first game, and we really didn't know what was going to happen, and we beat this person three-zero. And this was actually a very exciting thing for everyone at OpenAI. That was about the time when I was kind of live-Slacking the updates as the game went: this person just said this, and now it's this many last hits.

And were you winning by a large margin?

So, yeah. Do you remember the details of that one?

Which games?

This is Blitz.

Oh, Blitz. I think we won every game.

Yeah, we did, three-zero. I don't know exactly what the margin was.
And we have all the data. But Valve brought in the second pro, this professional named Pajkatt, and he played the bot, and we beat him once, we beat him twice, and then he beat us.

Oh, okay.

And looking at the game, we knew exactly what had happened.

Yeah, essentially what happened is he accumulated a bunch of wand charges, right? There's this item that accumulates charges, and he accumulated more charges than our bot had ever seen in a game. It turns out that there was a small, I think it's safe to say, kind of a bug in our setup.

Okay. So basically it passed some threshold that your bot was not ready for.

Well, I'd say very specifically that the root cause here was that he had gone for an early wand build, and we had just never done an early wand item build. And so our bot had just never seen this particular item build before. It had never had a chance to really explore what it means. And so it had never learned to save up stick charges and to use them and whatever. What it is very good at is calculating who's going to win a fight.

Wild.

But he kind of recognized that. He was like, "I wonder what happens if I push on this axis," and sure enough, it was an axis the bot hadn't seen. And so then we played a third match against another pro, and went three-zero on that. And it's actually very interesting getting the pros' reactions, because we also didn't really know: are they going to find it fun? Like, is it going to be cool? Are they going to hate it? And we got a mix of reactions. You know, some of the pros were like, "This is the coolest thing ever. I want to learn more about it." One of the pros was like, "This thing's stupid. I would never use it." But apparently, after the pros left that night, they spent four hours just talking about the bot and what it meant.

Yeah.
And the players were highly emotional in their reactions to the bot. They had never been beaten by a computer, so this was kind of unbelievable. So for example, one of the players that actually managed to eventually beat the bot, he was like, "Okay, this bot is effing useless. I never want to see it." And then he kind of calmed down, and after like five or 10 minutes he's like, "Okay, this is actually great. This is going to improve my practice a lot."

And so after your bot lost that first time, did they start talking about counterintuitive strategies to beat it?

Well, so, I think at that point... well, actually, I don't know. Maybe you can answer that.

Yeah. So I don't think pro players are interested in that. The pro players are mostly interested in the aspect where it lets them get better at the game. But there was a point after the event where we set up this big LAN party where we had like 50 computers running the bot, and we kind of unleashed this swarm of humans at our bot, and they found all the exploits. And we kind of expected them to be there, because the bot can only learn as well as the environment in which it plays allows it to, right? So there are some things that it has just never seen, and of course those will be exploitable. And we are kind of excited about our next step, which is 5v5, because 5v5 is one giant exploit. Essentially, it's about exploiting the other team, being where they don't expect you to be, doing out-of-distribution things. So naturally we know we will have to solve those problems head on for 5v5.

Right. So one thing I think was pretty interesting about the training process is that a lot of our job while we were doing this was seeing what the exploits were and then making a small tweak that fixes them.
And the way that I now think about machine learning systems is that they're really a way to make the leverage of human programmers go way up. Okay. Right. Because normally when you're building a system, you build component one, component two, component three, and your marginal return on building component four is similar to your marginal return on component one. Whereas here, a lot of the early stuff that we did just takes your thing from being crappy to slightly less crappy. But once we were at The International and we had this loss to Pajkatt, we knew, okay, the root cause here is just that it's never seen this item build before. Well, all we had to do was make a tweak to add that to our list of item builds, and then it played out this scenario for the next, you know, however long. Can you walk me through how that tweak actually works on the technical side? Because my impression is kind of what you guys have been saying: it's just been in a million games, so it has kind of learned all this stuff. And some people talk about these networks as just very gray, and they don't actually know how to manipulate them. How are you guys getting in there and changing things? Yes. So it's kind of funny. In some sense, on a high level, you can compare this process to teaching a human. You know, you see a kid doing maths and it's confusing addition with subtraction, suppose, right? And you say, look here, this symbol, this is what you're not seeing clearly. And it's the same with those tweaks. So clearly our bot had never seen this wand build that Greg mentioned.
And, you know, all we had to do was say that when the bot plays games and chooses what items to purchase, we just need to add some probability of sampling that specific build that it has never seen. And when it plays a couple of games against opponents that use that build, and when it uses this build a couple of times itself, then it becomes more comfortable with the idea of what happens, what the in-game consequences of that build are. Okay. And I have a couple of different levels of answer that I think are pretty interesting. So one is at a very object level. The way that these models work is you basically do have a black box, which takes in some list of numbers and outputs a list of numbers. And, you know, it's very smart in how it does that mapping, but that's what you get. And then you think of this as: this is my primitive, now what do I build on top of that, so that as little work as possible has to be done inside of the learning? And a lot of your job is that. One thing that we noticed on Monday that we'd forgotten, well, it wasn't that we'd forgotten, we just hadn't gotten around to it, was passing in the data that corresponds to the visibility of a teleport. So as a human, you can see when someone's teleporting out; our bot just did not have that feature.
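To make the item-build tweak concrete, here is a minimal sketch of what "adding some probability of sampling that specific build" could look like. The build names, the weights, and the function names are all hypothetical illustrations; the transcript does not describe the team's actual code or the probabilities they used.

```python
import random

# Hypothetical item builds and sampling weights (illustrative values only).
ITEM_BUILD_WEIGHTS = {
    "default_build": 0.8,
    "early_wand_build": 0.2,  # the build the bot had never seen before
}


def sample_item_build(rng, weights=ITEM_BUILD_WEIGHTS):
    """Pick one item build for a training game according to its weight."""
    builds = list(weights)
    return rng.choices(builds, weights=[weights[b] for b in builds], k=1)[0]


def sample_match(rng):
    """Each side of a self-play game draws its build independently."""
    return sample_item_build(rng), sample_item_build(rng)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_match(rng))
```

Because both sides draw independently, the bot sometimes uses the new build itself and sometimes faces it, which matches the two kinds of experience described above.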

Brain updating (33:17)

The list of numbers passed in did not have that feature. And so, well, one of the things you need to do is just add it. So your feature vector was however long it was, and now it's got one more feature on it. And the bot wasn't recognizing that as an onscreen thing? It doesn't see the screen. It's passed data from the bot API. Okay. And so it really is given whatever data we give it. Okay. And so it's on us to do some of this feature engineering, and you want to do as much as you can to make it as easy as possible, so that it has to do as little work inside as possible. You think of it as: you've got some fixed capacity. Do you want to spend that on learning the strategy? Do you want to spend it on learning how to choose which creep you want to hit? Do you want to spend that on trying to parse pixels? At the end of the day, I think a lot of our job as the system designers here is to push as much of that model capacity and as much of the learning as possible towards the interesting parts of the problem, the parts that you can't script, that you can't possibly do any preprocessing for. And so that's one level: a lot of the work ends up being identifying which features aren't there, or engineering the observation and action spaces in an appropriate way. Another level, where you zoom out, is the way that this actually happened. So, you know, we're there on Monday and people got dinner, and then Szymon and Jakub and Rafał and Psyho and, I think, maybe one or two others stayed up all night to do surgery on our running experiment. And so it was very much like you've got your production outage and everyone's there, all hands on deck, trying to go and make the improvements. Yeah.
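The "one more feature on the vector" step can be sketched as follows. This is only an illustration under assumed names: `HeroState`, its fields, and the encoders are hypothetical, and the real observations coming from the bot API are certainly much larger.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HeroState:
    """Hypothetical slice of the game state exposed to the bot."""
    health: float
    mana: float
    x: float
    y: float
    is_teleporting: bool = False  # the information that was missing


def encode_v1(hero: HeroState) -> List[float]:
    """Original observation: the model cannot tell a teleport is underway."""
    return [hero.health, hero.mana, hero.x, hero.y]


def encode_v2(hero: HeroState) -> List[float]:
    """Same observation with the teleport flag appended as one extra number."""
    return encode_v1(hero) + [1.0 if hero.is_teleporting else 0.0]


if __name__ == "__main__":
    enemy = HeroState(health=0.4, mana=0.9, x=10.0, y=-3.0, is_teleporting=True)
    print(len(encode_v1(enemy)), len(encode_v2(enemy)))
```

Appending the new value at the end keeps every existing index stable, which matters when the change has to be applied to a model that is already partly trained.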
So specifically, to zoom in and give you a bit of a feel for what it was like: this was a very tiring week. Every day was just meeting with the pros and watching our bot and getting excited, and the nights were spent working on the next version of the experiment. Because actually, it's a little-known fact, but from day to day, each version of the experiment was not good enough to beat the next day's professional. Just that morning we would download the new parameters of the network and it would be good enough to beat them, but the day before it wasn't. How are you discerning that? Well, this was, again, something of almost a coincidence. I mean, there might be something a little bit deeper, but the full story of the week was: we did the Monday play, and there we'd lost to Pajkatt. And just to clarify, are you guys in the competition or not in the competition? So the thing that we did was a special event to play against Dendi, who's one of the best players of all time. And while we were there, we were also like, well, let's test this out against all these other pros. Gotcha. Because they're physically here right now, and let's see how we do. Got it. All right. So Monday happens, you start training it. What? Yep. And so, actually, this experiment we'd kicked off maybe sometime the prior week. Two weeks before, I think. Something like that. And we'd been running this experiment for a while, and our infrastructure is really meant for running an experiment from scratch.

Brute force tunnelswed day (36:30)

You know, you start from complete randomness, and then you run it, and then two weeks later you go and see how it does. We didn't have two weeks anymore. And so we had to do this surgery, and it has to be very careful. You read every single character of your commit to make sure that you're not going to have any bugs, because if you mess it up, we're out of time. There's nothing you can do. And it's not one of those things where, if you're just a little bit more clever, you can go and do a hot patch and have everything be good. It's just literally the case that you've got to let this thing sit there, and it's got to bake. Yeah. And so Monday came and went, and we were running this experiment that we had performed surgery on.

Experiments And Techniques

A Simple Experiment (37:06)

And the next day we got a little bit of reprieve, where we just played against some lower-ranked players who, you know, are commentators and popular in the community, but weren't pushing the limit of our bot. On Wednesday at 1 PM, our contact from Valve came by and said, hey, I'm going to get you Arteezy and Sumail, who are basically the top players in the world. And I was like, could we push them off to Thursday maybe? And he was like, their schedule is booked, you're going to get them when you get them. And we were scheduled to get them at 4 PM. Okay. So we looked at our bot to see how it was doing. We'd been gauging it along the way. We tested it against our semi-pro player, and he said this bot is completely broken. Oh no. And, you know, pictures of maybe having introduced a bug during the surgery went through our heads. And he showed us the issue. He said, look, first wave, this bot takes a bunch of damage it doesn't have to take. There's no advantage to that. I'm going to run in and I'm going to go kill it. I'll show you how easy it is. He ran in to kill it, and he lost. Okay. But don't jump ahead. Explain what happened. So he played it five times and he lost each time, until he finally did figure out how to exploit it. And we realized what was going on was that this bot had learned a strategy of baiting. You pretend to be a really dumb bot, you don't know what you're doing, and then when the person comes in to kill you, you just turn around and you go super bot. It was legitimately a bad strategy, you know, if you're really, really good, but I guess it was good against the whole population of bots that it was playing against. And you had never seen it until that day. So yeah, we had not seen that behavior, and we did not at all expect it. It was one of the major examples of the things that we didn't have an explicit incentive for.
And yet the bot actually learned them. And yeah, essentially, this is kind of funny, because of course when the bot played against its other versions, it was just a good baiting strategy that was kind of out of distribution. But it had a very interesting psychological effect on humans, because the counter-strategy was not to fall for the bait, to just wait it out a little bit, because the bot is already at a disadvantage. But the human is like, okay, look how stupid this bot is, I'm going to go for a kill. So it had an interesting psychological effect on humans, which I thought was kind of funny. It almost knows it's a bot. Yeah. It knows how it's attacked. Yeah. It's funny to see a bot which seems like it's playing with the emotions of the human. Of course that was not what actually happened, but it seemed this way. So now we were faced with the dilemma. It's 1 PM on Wednesday, and these best players are going to be showing up at 4 PM.

Towards the Ends (39:49)

We have a broken bot. What are we going to do? And we know that our Monday bot is not going to be good enough. We know it's not going to cut it. And so the first thing we do is we're like, well, the Monday bot is pretty good at the first wave, and this new bot is a super bot thereafter. Okay. So can we stitch the two together? We already had some code for doing something similar, so we revived that. And then in the three hours, Jay spent his time doing a very careful stitch, where you run the first bot and then you cut over at the right time to the second bot. And this is literally just bot one plays the first X amount of time, and then bot two takes over? Literally just that.
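The stitch described here amounts to routing each decision to one of two policies based on game time. The sketch below assumes a policy is just a function from an observation to an action index, and the cutover time is a made-up parameter; the transcript gives neither the real interface nor the actual switch time.

```python
from typing import Callable, Sequence

# Assumption for illustration: a policy maps an observation to an action index.
Policy = Callable[[Sequence[float]], int]


def make_stitched_policy(early: Policy, late: Policy, switch_time_s: float):
    """Return a policy that uses `early` before the cutover and `late` after."""

    def policy(observation: Sequence[float], game_time_s: float) -> int:
        chosen = early if game_time_s < switch_time_s else late
        return chosen(observation)

    return policy


if __name__ == "__main__":
    monday_bot = lambda obs: 0      # stand-in for the older bot
    wednesday_bot = lambda obs: 1   # stand-in for the newer bot
    stitched = make_stitched_policy(monday_bot, wednesday_bot, switch_time_s=60.0)
    print(stitched([0.0], 10.0), stitched([0.0], 90.0))
```

A hard switch like this is simple to verify by reading the code, which fits the three-hour deadline, but it says nothing about how the second policy behaves when handed a game state it did not produce itself.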

Stitch up (40:33)

And he finished it 20 minutes before the event. We ran it by our semi-pro, and the semi-pro was like, this is great. So at least we got that done in the nick of time. But the other question was, how do we actually fix this bot? Actually, let me just add one aspect to that, because we were also kind of uncertain what happens when you switch over from one bot to the other. So I was actually standing by the pro who was playing it, and I was looking at the timer at the moment when it was switching, ready to distract the guy in case something stupid happened. And of course it was probably completely unnecessary, but we weren't sure what would happen there.

Baiting (41:07)

So I didn't know about that part of the story. So, the question of how do you actually fix it. There was a little bit of debate, like maybe we should abandon ship on this, switch back to our old experiment, and run that one for longer. And I forget who suggested it, but someone was like, I think we just have to run for longer, because it learned a strategy of baiting. Well, the counter-strategy for that is just don't bait, play well the whole time. And so we let that run for the additional three hours. And so we first played Arteezy, who showed up, on our switch bot, you know, the Frankenbot, and that beat him three times. And we figured, let's try out this other bot and just see what happened with the additional three hours of training. Because our semi-pro tester had at least validated that it looks like it's fixed. And so in that three hours of training, how many games is it actually playing simultaneously? It's a good question. Quite a bit. Yeah. Okay. And so we played this new bot against Arteezy, didn't know how it was going to do, and sure enough, it beat him. And he loved it. He was having a lot of fun. He ended up playing 11 games that day, maybe it was 10, but I think that he was just like, oh, this is so cool. We were supposed to have Sumail that day as well, but due to a scheduling snafu, he had to be at some panel, and so the timing didn't work out. But Arteezy and his coach, who also coaches Sumail, both said, Sumail is going to beat this bot. Like, it's going to happen. You know, maybe he'll have a little bit of trouble figuring it out for the first game, but after that, you're in trouble. And so you're like, all right, we've got one more day to figure out what to do. And so what did we do? I don't know. We kind of went for some nice dinner. What did you say? We went for some nice dinner. We rested a lot.
We chatted, you know, Slacked with some people at home. And then in the morning, we downloaded the new parameters of the network and just let it play. So we literally just hung out and just let it go. Just let it play. It's the exact opposite of how I'm used to engineering deadlines happening. Yeah. Normally you work right up until the minute. So you guys were getting full nights of sleep, nice and relaxed? Oh, no, no, absolutely not. Okay. So, to make this clear, the night before that, two nights before the day where we got the rest and relaxation, looked something like the following. We'd had a full day of dealing with the pros, the emotional highs and so on, so we're absolutely knackered. Come midnight, we start working. Okay. We need to make all those changes, like the one thing that we talked about. Around midnight we start with four people, and we are all so tired that, you know, we looked through all the commits that we were going to add to the experiment, and there were actually two people looking at them, because we didn't trust a single person given how tired we were. So they're looking at those commits until 6 a.m. I was doing this updating of the model, which is a lot of nasty off-by-one indexing things. Even though it was a short piece of code, it took me like six hours to do. Somewhere around 3 a.m., we had a phone call with Azure, because it turns out that with a certain number of machines, you start exceeding some limits. So we tried to get them to raise the limits. Around 6 a.m., we are okay, we are ready to deploy this. Deploying is just a one-man job. So Jakub was just, you know, clicking the deploy and fixing all the issues that came up. I was staying around exclusively to make sure that Jakub doesn't fall asleep.
And eventually at 11 a.m., the experiment was running, and we went to sleep and woke up at 4 PM or something. And then it was like all... So it had over 24 hours to train. I think it ended up being like one and a half days until the game with Sumail.
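The transcript does not say how the model update was implemented, only that it involved fiddly indexing. One common way to add an input feature to an already-trained layer without changing its current behavior is to give the new input zero-initialized weights; the sketch below shows that idea for a single linear layer in plain Python. The function names and the tiny example weights are hypothetical, and this is not presented as the team's actual procedure.

```python
from typing import List

Matrix = List[List[float]]


def linear(weights: Matrix, bias: List[float], features: List[float]) -> List[float]:
    """Compute weights @ features + bias for a single input vector."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def widen_input(weights: Matrix, new_inputs: int = 1) -> Matrix:
    """Append zero-initialized columns so new features start with no effect."""
    return [row + [0.0] * new_inputs for row in weights]


if __name__ == "__main__":
    trained_w = [[0.5, -1.0], [2.0, 0.25]]   # made-up "trained" weights
    trained_b = [0.1, -0.2]
    old_out = linear(trained_w, trained_b, [1.0, 2.0])
    new_out = linear(widen_input(trained_w), trained_b, [1.0, 2.0, 1.0])
    print(old_out == new_out)
```

Starting the new weights at zero means the widened model initially plays exactly like the old one, and further training can then learn how much the new feature should matter. An off-by-one mistake in where the column is inserted would silently attach trained weights to the wrong inputs, which is the kind of correctness bug that only shows up as worse training results.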

The 48 hour cure (45:13)

Sorry, just to repeat the timeline. So Monday was when we played the first set of games, had the loss, and did the surgery that night, with it running, I guess, starting at 11 on Tuesday. Then Wednesday, 4 PM, is when we played Arteezy and then trained for longer. I don't think we made any changes after that. Maybe we made some small ones. But then on Thursday is when we played Sumail. Okay. And so... I think Tuesday to Wednesday was the night where we made the last changes. And there was quite a bit of different work going on that all kind of came together at once. One thing I think is really important: one of our team members, whose handle is Psyho and who's a very well-known programming competition competitor, was spending a lot of time just watching the bot play and seeing why it does this weird thing in this case. You know, what are all the weird tweaks, and really getting intuitions for, oh, it's because we're representing this feature in this way, and so if we change it to this other thing, then it's going to work in a different way. It's almost this very human-like process of watching this expert play the game and trying to figure out what all the little micro decisions are that go into this macro choice. And it's kind of interesting starting to have this very different relationship to the system you build. Because normally the way that you do it is, well, your goal is to have everything be very observable. And so you want to put metrics on everything, and if something's not understandable, you add more logging. That's how you design the systems. Whereas here, you do have that for the surrounding bits, but for this machine learning core, you really do have to understand it at more of a behavioral level.
Was it ever stumping you, where you're just like, oh, it's being creative in a way that we didn't expect it to, and it may even be working, but you don't know why or how it decided to make that choice? Yeah, I think the baiting story that we told was... The baiting is the main one. The main story like this. We've got a few small ones. For example, in the early days of the project, we had a professional playing the next edition of the bot, and he's like, "Hmm, yeah, it's really good at creep blocking." And we are like, "Oh, what is creep blocking?"

What are the subtext-basis skills (47:28)

I'd say that there is also one other part of the story that I think is interesting, and then I think probably we can wrap up this part. So our semi-pro tester had played hundreds of games against this bot over the past couple of months, and we wanted to see how he benchmarks relative to Arteezy. And so we had him play against Arteezy, and, you know, Arteezy was up the whole game. He was just beating him to the last hit by like 500 milliseconds every single time. And so our semi-pro was like, "All right, I've got one last-ditch effort: to go try this strategy that the bot always does to me." And it's some strategy where you do something complicated and then you triple wave the opponent, you get them out of the tower, you have regen, you go in for the kill. And he did it, and it worked. So the bot had taught him a strategy that he could use against a human. And I think that was very interesting and a good example of the kinds of things that you can get out of these systems: they can discover these very non-obvious strategies that can actually be taught to humans. And how did it go with Sumail? So with Sumail, we went undefeated. I think it was 5-0 that day. One thing that's actually interesting, and we'll probably blog about this in upcoming weeks, is that we've actually been playing against a bunch of pros since then. So our bot has been in very high demand, and some of these pros have been live streaming it. And so we've gotten a better sense, watching as humans go from just being completely unable to beat it to, if you play against it for long enough, actually getting pretty good. And so there's a very interesting set of stats there that we'll be pulling and analyzing in a bit. Are there humans that consistently beat the bot today? Yeah.
So I think there's one who has like a 20% win rate or something. I think it might actually be 30. And that player played hundreds of games. And just finds strategies to exploit? No, actually, he becomes essentially as good as the bot at what the bot is doing, which I find extremely surprising, but it turns out that he played hundreds of games with it. So it's actually... And is he a top player? Like, does he beat most humans? These are all professionals. Okay. Like, it's not just some random kid who's good at beating the bot. That's right. The way to think about this is that being a professional video game player is a pretty high bar. I think everyone who plays these games wants to be a professional video game player, and the number of pros is very small. And, you know, when you're playing hundreds of games against it, you're going to get very, very good at the things that it does. And so talking to Arteezy, I was asking him, "Has it changed your play style at all?" And he said he thinks that the thing that it's done for him is it's helped him focus more. Because, you know, while you're just there in lane last hitting, that's now suddenly just so rote, right? Because you've been doing it so much, you've gotten so good at it. And I think that one really interesting thing to see is going to be: can you improve human play style? Can you change human play style? And I think that we're starting to see some positive answers in that direction. So I know we're almost out of time. I could do a little lightning round and just quickly go through some of these questions. - Actually, on the question of what kind of skills you need to work at OpenAI, could we have a very small... - That was going to be the first lightning round question.
- So a specific list of things that we found very useful, at least in the Dota team, is some knowledge of distributed systems, because we build a lot of those and they are easy to not do properly. And another thing that we found very important is actually writing bug-free code. I know it's kind of taken for granted in the computer science community that everybody makes bugs and so on, but here it's even more important than in other projects that you minimize them, because they're very hard to debug. Specifically, many bugs manifest as lower training performance, where getting that number takes a day, and in a spree of hundreds of commits, it's really easy to miss. And the primary way of debugging this is actually reading the code. Every bug has a very high cost associated with it. So actually writing correct, bug-free code is quite important to us. We sometimes actually sacrifice good engineering habits, good code modularity, to make our code shorter and simpler, essentially to have fewer lines where you can make bugs. And I guess lastly, as we mentioned, the primary skill is good engineering. But if somebody really feels like, gosh, I really need to brush up my math, I really need to go in there and feel comfortable when somebody asks me a question about math, I understand that. Then I think it's mostly getting good basics in linear algebra and in basic statistics. Especially when doing experiments, it's easy to make elementary statistics mistakes. And linear algebra, with basic optimization as well, is most of what you need to know to follow what's happening in those models. But this is, compared to being a good engineer, quite easy to pick up, at least in projects like the one we are doing. Yeah. So I wanted to talk about some non-technical skills that I think are really important.
So one is that I think there's a real humility that's required if you're coming from an engineering background like I am, working on these projects where you're no longer the technical expert in the way that you're used to. Let's say you want to build a product for doctors. I think you can talk to 10 doctors, and honestly, whatever thing you're going to build is probably going to be a pretty valuable addition to their workflow, because doctors can't really build their own software tools. Maybe some can, but as a general rule, no. Whereas with machine learning research, everyone that you're working with is very technical and can build their own tools. But if you inject engineering discipline in the right place, if you build the right tool at the right time, if you look at the workflow and think about, oh, we could do it in this other way, that's where you can really add a bunch of value. And so I think it's about knowing when to inject the engineering discipline, but also knowing when not to. And, to Szymon's point, sometimes we really just want the really short code because we're really terrified of bugs. And so that can yield different choices than you might expect for something that's just a pure production system. Who writes the least bugs at all of OpenAI? Oh, that's a contentious question. What's the question? Who writes the least bugs per line of code at all of OpenAI? I'm definitely not going to say me. Possibly Jakub. Yeah. Hard to say. But it's, yeah, it's three. It could be Greg.

Career And Life As A Tech Professional

Being an audio startup worker-Non-technical person? (54:52)

I write a lot of bugs. I caught Jakub the least number of times on bugs. So it's more okay to have bugs that are going to cause exceptions. Right. And my bugs usually cause exceptions, so that's fine. That's fine. Yeah. What you don't want is the things that cause correctness issues, where it gets 10% worse. Yeah. So there was another question related to skills, but this is for non-technical people. So Tim Biko asks, how can non-technical people be helpful to AI startups? Well, I was going to say, I think one important thing is that for AI generally right now, there's a lot of noise, and it can be hard to distinguish what is real from what's not. I think simply educating yourself is a pretty important thing. I think it's very clear that AI is going to have a pretty big impact. Just look at what's already been created and extrapolate that without any new technology development, any new research, and it's pretty clear that it's going to be baked into lots of different systems. There are a lot of ethical issues to work through. And I think that being a voice in those conversations and educating yourself is a really important thing. And then you look to, well, what are we going to be able to develop next? And I think that that's where the really transformative stuff is going to come. Okay. I once saw a post of Greg's RescueTime report and was pretty shocked. Do you have any advice for working such long, focused hours? I think it's not a good goal. I would not have a goal of trying to maximize the number of hours you sit at your computer. For me, I do it because I love it. The activity that I love most in the world is when you're in the zone, writing code, producing it for something that's meaningful and worthwhile. And so I think that as a second-order effect, it can be good.

How do you focus? (56:35)

But I wouldn't say that like that is the way to have an impact. I will also say more specifically, the only way I've ever seen people be super productive is if they're doing something they love. There is nothing else that will sustain you over a long enough period of time. Okay. Is the term AI overused by many startups just to look good in the press? Yes. Indeed. Okay. What is the last job that will remain as AI starts to do everything else? The last human job?

What will be the last human job? (57:06)

Like, what is going to be the hardest thing for AI to do? It's a hard question to answer in general. Because I think it's actually not AI researcher. The AI researcher will go before. Yeah. It's actually very interesting when you ask people this question. I think that everyone tends to say whatever their job is as the hardest one. But I actually think that AI research is going to be one that you're going to want to make these systems very good at doing it. Totally. I think the last question, maybe this is obvious, is can you just connect the dots between how playing video games is relevant to building AGI? Yeah. It's actually maybe one of the most surprising things to me, the degree to which games end up being used for AI research. And the real thing that you want, right, is you really want to have algorithms that are operating in complex environments where they can learn skills and that you want to increase the complexity of those skills that they learn. And that's either you push the environment, you push the complexity of the algorithms, you scale these things up, and that that's really the path that you want to take to building really powerful systems.

How playing video games is relevant to building AGI (57:59)

So games are great because they are a prepackaged environment that some other humans have spent time on, first of all putting in a lot of complexity, making sure that there are actual intellectual things to solve there, or not even just intellectual, but interesting mechanical challenges, and you can get human-level baselines on them, so you know exactly how hard they are. They're very nice, unlike something like robotics, in that you can just run them entirely virtually, and that means you can scale them up and you can run many copies of them. And so they're a very convenient test bed, and I think what you're going to see is that there's a lot of work that's going to be done in games, but the goal is of course to bring it out of the game and actually use it to solve problems in the real world, and to actually be able to interact with humans and do useful things there. So I think they're a very good sort of starter. One thing that I really like about this Dota project and bringing it to all these pros is that we're all going to be interacting with super advanced AI systems in the future, and right now I think we don't really have good intuitions as to how they operate, where they fail, what it's like to interact with them, and this is a very low-stakes way of having your first interaction with very advanced AI technology. Cool. If someone wants to get involved with OpenAI, what should they do? Well, we have a job posting on our website. I guess the tips that we were giving about how to get a job at OpenAI are very geared towards the specific job posting that we have there, which is a large-scale reinforcement learning engineer. Cool.

Job Opportunities At OpenAI

How to get a job — OpenAI (59:44)

Yeah. In general, we look for people who are very good at whatever technical axis they specialize in, and we can use lots of different specialties. Great. All right. Thanks guys. Just to echo that, everyone thinks they have to be an AI PhD. Not true. Neither of these guys are. All right. Thanks a lot. Cool. Thank you. Yeah. Thank you.