Transcript from the "Tensorflow Architecture" Lesson

[00:00:00]

>> Why I'm discussing all of this? It's to explain how the TensorFlow works. So with all of that, a TensorFlow is trying to optimize the utilization of the hardware. So TensorFlow, can actually combine T and F together. And actually even Keras, but Keras is the higher level API.

[00:00:30] So Keras is behind the scenes actually translating or can use TensorFlow directly, right? And TensorFlow, mistakenly, some people think that TensorFlow is written in Python. It's not actually true. TensorFlow is written in C and C++, and using Python, if only I can spell [LAUGH] using Python to simply grab the description of the model, right?

[00:01:04] So a user is describing models in Python. But then this description is actually kinda transformed and using C and C++, Source code for implementation of particular reverberations. And now I need to also tell you about computational graph, comp graph. What TensorFlow version 1 was using by default, it was building computational graph as the default behavior.

[00:01:45] What is computational graph? It's basically all the code you create in Python. It will be rebuilt as part of the computational graphs, with notes being your operations and links between those graphs ease the operation. So for instance, if I'm in Python, just writing a + b, or actually in TensorFlow, it will be tf.constant(a) + tf.constant(b).

[00:02:20] It will create simply node a, node b, right? That's our constants. And it will be additional node the operation of addition. And the link between those will be your dependencies. And the thing is why TensorFlow was doing that and actually still can do that, because it allows you to optimize for a lot of things.

[00:02:48] First of all, when you have all of those operations, for instance, if I want to multiply this by c now, right, I can just put this operation. And then let's say add additional value v, I don't know. Basically, I can figure out what operations I can do in parallel.

[00:03:15] Also, how much memory do I need, right, so for all of those constants. And right now it just simply found one variables, but it might be tensors here. I might actually want to allocate megabytes, if not terabytes of RAM, right, for my tensor. And TensorFlow, when building this computational graph, it can optimize for those operations.

[00:03:42] Because one of the slowest operation is the memory allocation. And when you need to allocate a lot, you do it through the operating system, it will take a lot of time. And what TensorFlow can do, it can just allocate one buffer in the memory and it keep track of it and just put, for instance, your tensor v in some parts, right?

[00:04:06] And when we're done with v, when we don't need it, you can just simply overwrite the space with some other tensors. But at the same time, there will be no additional allocations. We're just using this buffer, the one TensorFlow is keeping track of. And also with all those operations, as I said, TensorFlow can look for the parallelism optimizations.

[00:04:34] So it can figure out if, for instance, we're doing some operations which are not independent. So for instance, I can have additional graph here. Then we can probably just run this part of the graph on core 0. And this part of the graph can be executed on core 1, right?

[00:04:55] Or if we're talking about GPUs, then we probably can just use different cores for those operations as well. Or even if we are talking about heterogeneous programming, we can execute part of the graph on a GPU. And then transition the data when the most kinda, let's say compute intensive parts were calculated, we can transition the result back to CPU.

[00:05:23] So that's what TensorFlow was doing behind the scenes. And for a lot of those compute intensive operations, it's actually using pre-built libraries. So remember, I said that we have this linking part. And TensorFlow are relying on CuDNN. So Cuda was originally, I can probably even call it programming language.

[00:05:57] Although it is heavily based on C, C++, but Cuda was specifically designed to program GPUs, right, NVIDIA GPUs, to be specific. And now NVIDIA also created CuDNN, for instance. That's the heavily optimized library for parallelism and utilization of the GPUs with basic building blocks of deep neural networks.

[00:06:29] So it's doing, it's optimizing. Hopefully I will demonstrate to you later, but a lot of this neural network operations, they're relying on general matrix-matrix multiplication or matrix-vector multiplication. So remember a lot of those connections between the neurons with different weights. So you can have just data as one vector.

[00:06:58] And weights in each layer can be your matrix with v11, v12, and so on, so forth, right? And what you can do, you can just simply take the data and multiply by the matrix. Anyway, there are a lot of details behind this I don't wanna go into. But basically your general matrix-matrix multiplication, GMM [LAUGH] or matrix-vector multiplication, that's what's used as the implementation for your deep neural networks training and inference.

[00:07:41] And CuDNN is providing this functionality on the GPU side. For CPUs, we have MKL, that's the math kernel library created by Intel. But there is backend abstraction which using actually Intel MKL called NumPy. So NumPy, numeric Python package, is what is TensorFlow is using. And if you want to do something simple, I would recommend to get familiar with NumPy.

[00:08:20] Because a lot of mathematical operations, NumPy is the way kinda to link your Python and C. So a lot of math-intensive operations can be implemented quite effectively in C. There is no overhead of object-oriented programming and classes on all of that. So in C, you can do a lot of things pretty fast.

[00:08:46] And NumPy allows you to draw this from the kinda Python representation of the data and use NumPy to just almost get to the bare metal to manipulate your data. And TensorFlow is using NumPy quite a bit a lot. Okay, so let me kinda finalize everything. I said that, yes, TensorFlow is creating computational graph, which simply allows us to optimize a lot of things.

[00:09:17] So it will try to use libraries depending on what architecture you have available. If you have GPUs, then a CuDNN will be linked. Or MKL, NumPy, we'll be using those as well. And basically TensorFlow, what it's doing, it's simply looking at your Python description of the model, right?

[00:09:44] And then figuring out what C routines it should use, optimize C routines to get the best speed out of the hardware you have. So [LAUGH] the reason why I was describing it is to tell you the original purpose of TensorFlow, why it was created, right? And what is happening right now with TensorFlow 2.0, they actually switching their default behavior.

[00:10:19] Previously, creating the computational graph was the default behavior. Right now, the default behavior is eager execution. Eager execution is different. It means that we will have access to all of our variables. And all operations will be executed, not in a lazy form, but right when we have it typed as a Python command.

[00:10:50] It makes debugging your Python code significantly easier, right? And that was a really big benefit of PyTorch. PyTorch had eager execution from the beginning, and that's why it was so convenient for searchers to play around with their new ideas, right? Because they could have just print out what's currently happening with the tensor, right?

[00:11:13] Or what's gonna happen if I normalize or do some sort of transformation on my code, on my tensor, right? So TensorFlow team realized that because of this complexity, because of computational graph, you cannot see the values until the computational graph is created. And there is also, you need to create session, right, in which this computational graph will be executed.

[00:11:43] And then you can debug it, or at least look at the values. So that's why they decided, okay, we can keep the computational graph for the optimization, but the default behavior will be just using eager execution and execute your Python code. But when you're satisfied with your code, you can just use special wrappers, Python decorators, right, or TensorFlow function to just say, okay, I'm done with debugging my code.

[00:12:09] I'm done with eager execution. Switch to computational graph. And then TensorFlow will just look at your Python code, recreate the computational graph. We'll figure out the ways how parts of this graph can be optimized, right, which C codes we can use. So we'll then rely heavily on CuDNN and MKL, NumPy, right?

[00:12:36] And everything will be executed significantly faster. So don't be surprised if your eager executed code on TensorFlow 2.0 is somewhat slow, especially if you tried to execute with the TensorFlow 1, right? It's just because TensorFlow 1 already using all of those optimizations from the beginning. And TensorFlow 2.0 is allowing you the development setup first, and then only transition to the computational graph.