MIT 6.S191 (2018): Faster ML Development with TensorFlow
Transcription for the video titled "MIT 6.S191 (2018): Faster ML Development with TensorFlow".
Note: This transcription is split and grouped by topics and subtopics. You can navigate through the Table of Contents on the left. It's interactive. All paragraphs are timed to the original video. Click on the time (e.g., 01:53) to jump to the specific portion of the video.
SHANXING LUO-CHING LIU, CREATOR, TENSORFLOW: All right? Great. OK, so my focus will be how to more efficiently write and debug machine learning models in TensorFlow. So the question is whether you need to debug a machine learning model. I think the answer is yes. Of course, machine learning models are very different from traditional programs. But they are software, and they are code. And if they're software and code, they will have bugs. And you need to debug your models from time to time. So hopefully after this lecture, you will know a little bit more about how to more efficiently debug your machine learning models in TensorFlow.
In-Depth Analysis Of Tensorflow Models
Representing a model (00:49)
So before we dive into debugging, I want to talk about how machine learning models are represented in a computer, because that turns out to be important for how you write and debug your programs. There are two ways in which a machine learning model can be represented: it's either a data structure or a program. If it's a data structure, then when you write code to, for example, define a layer of a neural network, you're actually building a graph. Those lines of code, when they're executed, don't actually do the computation. They're just building a graph. And the graph needs to be later fed into some kind of machinery, some kind of execution engine, to actually run the model. The second way in which you can define a machine learning model is to write it as a program. That's more straightforward: those lines of code will actually do the computation, on either the CPU or the GPU, depending on whether you have a GPU or not. So the first paradigm is also called symbolic execution or deferred execution. And the second one is also called eager execution or imperative execution. So now the question for you is whether TensorFlow is the first paradigm or the second paradigm. So I heard someone say first. Second? Both? Yeah. So I think it's a trick question, right? So the answer is both. If you had asked the question half a year ago, then the answer would have been only the first. But in the latest version of TensorFlow, we support both modes. And I'm going to give some examples in the following slides.
Executing a model (02:23)
So by default, TensorFlow is the first mode. So that's the classical traditional TensorFlow style. So just to give you a refresher of how to use TensorFlow to define a simple model, you import TensorFlow as TF. And then you define some constants or maybe some variables as inputs. And then you write a line to say, you want to multiply x and w. And then you want to add the result of the multiplication to another thing, b. So you can think of this as a very simple linear regression model, if you will.
Example of a Basic Linear Regression Model (02:55)
Now the important thing here is that when this line is executed, it's actually not doing the computation. So the multiplication will not happen at this point. If you print the result of this line, y, you will see it's not 40. It's not 10 times 4 equals 40. Instead, it's an abstract, symbolic thing called a tensor. It knows what kind of operation it needs to do when it's actually executed in the future. So Mul is that operation. It also knows information about what its dependencies are, which are x and w in this case, but that's not shown in the print message here. And likewise, when you do a tf.add, when that line of code is executed, the addition will not happen. It's going to happen later. By later, I mean the point at which you create a session by calling tf.Session. When the tf.Session is created, it will basically automatically pull in the graph you have already built in the previous lines of code. Then you tell the session which tensor, which abstract symbol in the graph, you want to execute. It's going to basically analyze the structure of the graph, sort out all the dependencies, and topologically execute all the nodes in the graph, doing the multiplication first and then the addition next. And it's going to give you the final result, which is 42. So you can think of tf.Session as an engine. It's going to run the model on CPU if you only have a CPU, and it's going to run the model on GPU if you have a GPU. Now, obviously, this paradigm of defining a model is not the most straightforward, because those lines of code that look like they're doing computation are not doing any actual computation, and you need to learn a new API called tf.Session. So why does TensorFlow do it this way? Obviously, it's because there are some advantages you can get.
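To make the deferred-execution idea concrete, here is a toy sketch in plain Python (not actual TensorFlow code): the `Tensor` class, `multiply`, `add`, and `run` below are made up for illustration, but they mirror the behavior described above, where op calls only build graph nodes and a separate "session" does the real work.

```python
class Tensor:
    """A symbolic node: knows its op and its dependencies, not its result."""
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, tuple(inputs), value

def constant(v):
    return Tensor("Const", value=v)

def multiply(a, b):
    return Tensor("Mul", (a, b))   # no multiplication happens here

def add(a, b):
    return Tensor("Add", (a, b))   # no addition happens here

def run(tensor):
    """A minimal 'session': resolve dependencies first, then compute."""
    if tensor.op == "Const":
        return tensor.value
    args = [run(t) for t in tensor.inputs]   # evaluate dependencies
    return args[0] * args[1] if tensor.op == "Mul" else args[0] + args[1]

x = constant(10)
w = constant(4)
b = constant(2)
y = multiply(x, w)   # y is a symbolic Tensor object, not the number 40
z = add(y, b)
print(run(z))        # only this call does the actual work: 42
```

Printing `y` before `run` shows only the op name and dependencies, exactly like the symbolic tensor printout described in the lecture.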
So the first advantage is that because the model is a data structure, it's relatively easy to serialize it and then deserialize it somewhere else. You can train your model, and then you can load your model onto other kinds of devices, like mobile devices, or embedded devices like a Raspberry Pi, or a car or a robot. And you can also serialize the model and then load the model on faster hardware, like Google's TPU. These things are hard to do if your model is a Python program.
Advantage 1: Serialize and Load On Other Hardware (05:14)
Because those devices may not have Python running on them. And even if those devices have Python running on them, that's probably not what you want to use, because Python is slow sometimes.
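The serialization advantage can be sketched in plain Python: when the model is just a data structure, saving it is just writing that structure out, and any host environment can implement a small engine to evaluate it. The nested-dict graph format below is made up for illustration; it is not TensorFlow's real GraphDef format.

```python
import json

# A toy graph as a plain data structure: (10 * 4) + 2.
graph = {
    "op": "Add",
    "inputs": [
        {"op": "Mul", "inputs": [{"op": "Const", "value": 10},
                                 {"op": "Const", "value": 4}]},
        {"op": "Const", "value": 2},
    ],
}

def evaluate(node):
    """A tiny execution engine any host language could re-implement."""
    if node["op"] == "Const":
        return node["value"]
    args = [evaluate(n) for n in node["inputs"]]
    return args[0] * args[1] if node["op"] == "Mul" else args[0] + args[1]

serialized = json.dumps(graph)     # save after training, ship the bytes...
restored = json.loads(serialized)  # ...reload on a server, phone, TPU host
print(evaluate(restored))          # 42
```

Because only the data structure crosses the wire, the device that loads it doesn't need Python at all; it just needs its own evaluator for the format.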
Advantage 2: Portability Across Your Product Stack (05:27)
So I have those links here in the slides. So I'm going to send those slides to the course organizers. And you can click on those links if you're interested in any of those topics, like deployments on mobile devices and so on.
Advantage 3: Distributed Training (05:41)
So the next advantage is that because your model is a data structure, you are not tied down to the language in which the model is defined. Nowadays, most machine learning models are written in Python, but maybe your application server or web server is running Java or C++, and you don't want to rewrite the whole stack in Python just to be able to add some machine learning to your stack. If a model is a data structure, then you can save the model after training, and then you can load it into Java or C++ or C# or any of the supported languages. And then you will be ready to serve the trained model from your web server or application server. And the other nice thing about representing the model as a data structure is that you can distribute the model very easily onto a number of machines called workers. Those workers will basically use the same graph. They're going to do the exact same computation, but they're going to do it on different slices of the training data. This kind of distributed training is very important for cases in which you need to train a model on a very large amount of data quickly, the kind of problem that Google sometimes has to deal with. Of course, you have to slightly modify your model graph so that the shared things, like the weight variables in the model, live on a server called the parameter server here. But that's basically distributed training in TensorFlow in a nutshell.
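The data-parallel pattern just described can be sketched in a few lines of plain Python (a toy stand-in, not TensorFlow's distributed runtime): two "workers" compute gradients for the same model on different slices of the data, and a "parameter server" holds the shared weight and applies the averaged update. The model, data, and learning rate here are all invented for illustration.

```python
# Fit y = w * x with squared loss; the true w for this toy data is 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

def gradient(w, batch):
    # d/dw of mean (w*x - y)^2 over the batch
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

w = 0.0  # the shared variable, conceptually living on the parameter server
for step in range(200):
    # Two workers: same computation, different slices of the training data.
    g0 = gradient(w, data[:2])
    g1 = gradient(w, data[2:])
    # The parameter server averages the gradients and updates the weight.
    w -= 0.01 * (g0 + g1) / 2

print(round(w, 3))  # converges toward 3.0
```

In a real deployment the workers run in parallel on separate machines and the variable update happens on the parameter server; the toy above only shows the division of labor.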
Distributed Training Flaws (07:04)
So again, if you're interested, you can look at the slide and click that link to learn more about distributed training. Any questions so far? OK. So also, because you are representing your model as a data structure, you are not limited by the speed or the concurrency of the language in which the model is defined. We know that Python is slow sometimes. And even if you try to parallelize Python by writing multi-threaded code, you will run into the issue called the Python Global Interpreter Lock. And that will slow your model down, especially for the kind of computation that a deep learning model needs to do. So the way in which we solve this problem in symbolic execution is by sending the model as a data structure from the layer of Python into C++. There, at the layer of C++, you can use true concurrency. You can fully parallelize things, and that can benefit the speed of the model. So obviously, there are all those advantages, but there are also shortcomings of symbolic execution. For example, it's less intuitive. It's harder to learn. You need to spend some time getting used to the idea of defining a model first and then running the model later with tf.Session. And it's harder to debug when your model goes wrong. That's because every actual computation happens inside tf.Session, and that's a single line of Python code calling out to C++, so you can't use the usual kinds of Python debuggers to debug that. But I'm going to show you that there are actually very good tools in TensorFlow that you can use to debug things that happen in tf.Session. And another shortcoming of symbolic execution is that it's harder to write control flow structures. By that, I mean structures like looping over a number of things, or if-else branches. So the kind of thing that we encounter every day in programming languages.
But some machine learning models also need to do that. So recurrent neural networks need to loop over things. And some kinds of fancy dynamic models need to do if-else branches and so on. I'm also going to show some slides with those concrete examples. So it's sometimes very hard to write that kind of control flow structure in symbolic execution, but it's much easier in eager execution. With eager execution, your program can be more Pythonic, and it's easier to learn and easier to read. So here's an example. On the left, you're seeing the same code as before. So we are using the default symbolic execution of TensorFlow. Now, how do we switch to the new eager execution? You just add two lines of code. You import the eager module, and then you call a method called enable_eager_execution. And you don't have to make any other change to your program in this case. But because of these two lines, you changed the semantics of these two lines, multiply and add. So now, instead of building a graph, this line is actually doing the multiplication of 10 and 4. And if you print y, you will see the value. And if you print the value of z, you will also see the value. So everything is flatter and easier to understand. Now, as I mentioned before, eager mode also makes it easier to write control flow structures and dynamic models. So here's an example. Suppose you want to write a recurrent neural network, which I think you have seen in previous parts of the lecture, in the default mode of TensorFlow. Here is about the amount of code you need to write. You cannot use the native for loop or while loop in Python. You have to use TensorFlow's special while loop. And in order to use it, you have to define two functions, one for the termination condition of the loop and one for the body of the loop. And then you need to feed those two functions into the while loop and get tensors back. And remember, those tensors are not actual values.
You have to send those tensors into session.run to get the actual values. So there are a few hoops to jump through if you want to write an RNN from scratch in the default mode of TensorFlow. But with eager execution, things become much simpler. You can use the native for loop in Python to loop over time steps in the input. And you don't have to worry about those symbolic tensors or session.run. The variables you get from this for loop are the result of the computation. So eager mode makes it much easier to write the so-called dynamic models.
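The eager-style loop described above can be sketched in plain Python. The scalar cell below (state = tanh(w_h * state + w_x * x)) is a deliberately tiny stand-in for a real RNN cell, with made-up weights, but it shows the key point: a native for loop whose length follows the input, with a concrete, inspectable value at every step.

```python
import math

def simple_rnn(inputs, w_x=0.5, w_h=0.8, state=0.0):
    """A scalar toy RNN: one tanh cell, looped with a native Python for."""
    for x in inputs:                          # loop length follows the input
        state = math.tanh(w_h * state + w_x * x)
        print(state)                          # every intermediate is a real value
    return state

final = simple_rnn([1.0, 2.0, 3.0])
```

Contrast this with the symbolic version, where the loop must be expressed as a cond function and a body function fed to TensorFlow's special while loop, and the results only materialize inside session.run.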
Static and dynamic model changes (11:25)
So what do we mean by static models and dynamic models? By static models, we mean models whose structure doesn't change with the input data. And I think you have seen examples like that in the image model sections of this lecture. The diagram here shows the Inception model in TensorFlow. So the model can be very complicated. It can have hundreds of layers, and it can do complicated computation like convolution, pooling, dense multiplication, and so on. But the structure of the model is always the same, no matter how the image changes. The image always has the same size and the same color depth, and the model will always do the same computation regardless of how the image changes. So that's what we mean by a static model. But there are also models whose structure changes with the input data. The recurrent neural network we have seen is actually an example of that. And the reason why it changes is because it needs to loop over things. In the simplest kind of recurrent neural network, it will loop over items in a sequence, like words in a sentence. So you can say that the length of the model is proportional to the length of the input sentence. But there are also more complicated changes of the model structure with input data. Some of the state-of-the-art models that deal with natural language will actually take a parse tree of a sentence as the input, and the structure of the model will reflect that parse tree. As we have seen before, it's complicated to write while loops or control flow structures in the default symbolic mode. Now, if you want to write that kind of model, it gets even more complicated, because there you will need to nest conditional branches and while loops.
But it's much easier to write in the eager mode, because you can just use the native for loops and while loops and if-else statements in Python. We actually have an example to show you how to write that kind of model, which takes parse trees as input and processes natural language. So please check that out if you want. You have a question? AUDIENCE: But the tree is static. Yeah, the tree is static for this particular input, but you can have a longer sentence, right? The grammar of the sentence, the parse of the sentence, can change from one sentence to another, right? So that will make the model structure change as well. So basically, you can't hard-code the structure of the model. You have to look at a parse tree and then do some kind of if-else statements and while loops in order to turn the parse tree into the model.
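A toy sketch of that point in plain Python: native recursion and if/else turn each parse tree into a differently shaped computation. The nested-tuple tree format, the length-based "embedding", and the averaging combine rule below are all invented for illustration; a real tree model would use learned embeddings and a learned composition function.

```python
def embed(word):
    return float(len(word))          # made-up stand-in for a word embedding

def tree_model(node):
    """The model's shape follows the parse tree of the input sentence."""
    if isinstance(node, str):        # leaf: a single word
        return embed(node)
    left, right = node               # internal node: combine the children
    return 0.5 * (tree_model(left) + tree_model(right))

# Two sentences with different parses produce differently shaped models.
print(tree_model((("the", "cat"), "sat")))   # deeper tree
print(tree_model(("hi", "there")))           # shallower tree
```

The structure of the computation here is not hard-coded anywhere: it is read off the parse tree at run time, which is exactly what is awkward to express with nested symbolic conditionals and while loops.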
Eager mode (14:03)
So we have seen that the eager mode is very good for learning and debugging and for writing control flow structures. But sometimes you may still want to debug TensorFlow programs running in the default symbolic mode. And there are a few reasons for that. First, you may be using some kind of old TensorFlow code that hasn't been ported to eager mode yet. And some high-level APIs you might be using, like TFLearn or Keras or TF-Slim, have not been ported to eager mode yet. And you may want to stick to the default symbolic mode because you care about speed, because eager mode is sometimes slower than the default mode. So the good news is that we have a tool in TensorFlow that can help you debug a TensorFlow model running in tf.Session in the default mode. And that tool is called tfdbg, or TensorFlow Debugger. The way in which you use it is kind of similar to the way in which you use eager execution. You import an additional module. And after you have created the session object, you wrap the session object with a wrapper. In this case, it's called the local command line interface debug wrapper. So you don't need to make any other change to your code, because this wrapped object has the same interface as the unwrapped object. But basically, you can think of this as an oscilloscope, or some kind of instrument, on your tf.Session, which is otherwise opaque. So now, once you have wrapped that session, when session.run is called, you're going to drop into the command line interface.
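The wrapper pattern described above can be illustrated in plain Python (this is not the real tfdbg API; the `Engine` and `DebugWrapper` classes are invented for the sketch): the wrapper exposes the same interface as the object it wraps, so no other code changes, but every call is instrumented, like an oscilloscope on the execution.

```python
class Engine:
    """Stand-in for an opaque execution engine, like a session."""
    def run(self, fn, *args):
        return fn(*args)

class DebugWrapper:
    """Same interface as Engine, but records what ran and what it returned."""
    def __init__(self, engine):
        self._engine = engine
        self.log = []
    def run(self, fn, *args):
        result = self._engine.run(fn, *args)
        self.log.append((args, result))   # the 'oscilloscope' view
        return result

sess = Engine()
sess = DebugWrapper(sess)                     # wrap it; callers don't notice
print(sess.run(lambda a, b: a * b, 10, 4))    # prints 40, and the call is logged
```

In the real tfdbg, the wrapped session drops you into a command line interface at each session.run instead of silently logging, but the "same interface, added instrumentation" idea is the same.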
TensorFlow Debugger (15:30)
You're going to see basically a presentation of what intermediate tensors are executed in the session.run, and the structure of the graph, and so on. So I encourage you to play with that after the lecture. The TF debugger is also very useful for debugging a kind of bug in your machine learning models which will probably occur. Those are called numerical instability issues. By that, I mean that sometimes values in the network will become NaNs or infinities. NaN stands for "not a number". NaNs and infinities are bad float values that will sometimes happen. Now, if you don't have a specialized tool in TensorFlow, it can be difficult to pinpoint the exact node which generates the NaNs and infinities. But the debugger has a special command with which you can run the model until any node in the graph contains NaNs or infinities. In our experience, that happens quite often. And the most common causes of NaNs and infinities are underflow and overflow. For example, if there's underflow, then a number will become zero. And when you use that number in division or log, you will get infinities. And overflow can be caused by learning rates being too high, or by some kind of bad training example that you haven't sanitized or pre-processed. But the debugger should help you find the root cause of that kind of issue more quickly. So the TF debugger is a command line tool. It's nice. It's low-footprint. You can use it if you have access to a computer only via a terminal. But obviously, it's going to be even nicer if we can debug TensorFlow models in a graphical user interface. So I'm excited to tell you about a feature of TensorFlow that's upcoming. It's called the TensorBoard Debugger Plugin, or a visual, graphical debugger for TensorFlow. It's not included in the latest release of TensorFlow, which is 1.5, but it's coming in the next release, 1.6.
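The "run until any node contains NaNs or infinities" behavior described above boils down to a simple per-node check. Here is a plain-Python sketch of that filter (the node names and output values below are made up; this is not tfdbg's actual implementation):

```python
import math

def has_inf_or_nan(values):
    """True if any value in a node's output is NaN or infinite."""
    return any(math.isnan(v) or math.isinf(v) for v in values)

# Hypothetical per-node outputs from one run of a model.
node_outputs = {
    "dense/MatMul": [0.3, -1.2, 4.0],
    "loss/Div": [float("inf")],   # e.g. a division after underflow to zero
}

# Stop at the first node whose output is bad, like the debugger's command.
for name, values in node_outputs.items():
    if has_inf_or_nan(values):
        print("first bad node:", name)   # prints: first bad node: loss/Div
        break
```

Pinpointing the first node that produces a bad value matters because NaNs and infinities propagate: once one node emits them, everything downstream is contaminated, so the last node to show them tells you nothing about the root cause.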
It's available for preview in the nightlies. So you can copy and paste the code from here, install those nightly builds of TensorFlow and TensorBoard, and you can use the feature.
TensorBoard Debugger Plugin (17:30)
So after you have installed these packages, you can run a command. All the code in my slides is copy-paste executable. Yeah, so these are about the main features of the upcoming tool. So if you're interested, please copy and paste this code and try it out. This slide here is just a reminder of the main features. OK, so as a summary, we have seen that there are two ways to represent machine learning models in TensorFlow, or in any deep learning framework: either as a data structure or as a program. If it's a data structure, then you get symbolic execution. And symbolic execution is good for deployment, distribution, and optimization. If it's a program, then you get eager execution. It's good for prototyping, debugging, and writing control flow structures. And it's also easier to learn. Currently, TensorFlow supports both modes, so you can pick and choose the best mode for you depending on your need. And we also went over the TensorFlow Debugger, both the command line interface and the browser version, which will help you debug your model more efficiently. So with that, I'm going to thank my colleagues on the Google Brain team, both in the Mountain View headquarters and here in Cambridge. Chi and Mahima are the two collaborators with me on the TensorBoard Debugger Plugin project. And TensorFlow is an open-source project. There have been over 1,000 contributors like you who have fixed bugs and contributed new features. So if you see any interesting things that you can do, feel free to contribute to TensorFlow on GitHub. If you have questions, please email me. And if you see any bugs or feature requests about TensorFlow or TensorBoard, you can file the bugs on these links. Thank you very much for your attention.