Transcript from the "Sigmoid vs. ReLU" Lesson

[00:00:00]

>> Another interesting activation function is ReLU. ReLU stands for Rectified Linear Units. And why it is so interesting? I think I want to spend just two minutes to draw a couple of things for you to actually explain what ReLU is doing and why it is so cool. So on our new diagram, we will be comparing sigmoid versus ReLU, Rectified Linear Units.

[00:00:36] So sigmoids, if you remember it will just take all of those signals and simply transforming it in something between 0 and 1, that's what sigmoid is doing. ReLU is doing something different. So it also takes all this signals, but it will simply, actually, let's use green. It we'll set it to 0 when our signals are equal to 0 and it will return the same value when it's positive.

[00:01:20] So you can think about it that relu mathematically speaking, oops, sorry, rely, relu = max(x, 0). It's mathematically how the ReLU is defined. And the reason why it is so cool, why it is, So fast actually, it's because when our number is negative, it just should return zero, right?

[00:01:55] And when it's positive, it will return the x itself. So for this function to process on the hardware, what we will need to do, we need to check only one beat. What is the sine of our number? And if the sine is negative, we just simply return back 0.

[00:02:11] If it's positive, we return the number itself. At the same time with a signoid, you have exponent sitting aside, so there should be some lookup function right, to find the proper value. So when you have exponents in your functions, it complicates things. So calculating even one value of sigmoid of some x, will probably require several iterations on the CPU and GPU, but with ReLU, you can do it in one tick.

[00:02:43] And you can paralyze a lot of things. So that's why ReLU is cool and ReLU have been introduced by Google, guys. And the original idea why sigmoid was introduced first, they were afraid that if we have Redstone signal, and don't limit it to once. The sound might be way too big, right?

[00:03:08] For instance, in this case if we using ReLu it's still the case, but we can overcome it by simply switching to higher flow. Basically, we can instead of using 32 bits, per number like in our 32 bit floats, right? Or four bytes per floating point number. We can switch 64 bits for instance.

[00:03:34] So double precision will allow you to extend the dynamic range of the calculations and therefore you can process significantly larger numbers. And the thing is there are other mechanisms which helps you to regulate and not go to very large values. Now, it's scope for the regularization in machine learning, and basically allows you to, all those weights to adjust depending on the amplitude of the signal not get to those really large numbers.

[00:04:04] But you still get the benefit of this really, really fast calculation of relativity.