Transcript from the "CPU Q&A" Lesson
>> Is RAM then a huge array?
>> RAM is exactly a huge array, yes. That's the way to think about it.
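[Editor's sketch] The "huge array" picture can be illustrated with a few lines of Python. This is only a toy model: the 64 KiB size and the `store`/`load` helper names are assumptions made for illustration, not anything from the lesson.

```python
# Toy model: RAM as one flat array of bytes, indexed by address.
RAM_SIZE = 64 * 1024          # hypothetical 64 KiB machine
ram = bytearray(RAM_SIZE)     # every cell starts at 0

def store(address: int, value: int) -> None:
    """Write one byte at the given address (value is truncated to 8 bits)."""
    ram[address] = value & 0xFF

def load(address: int) -> int:
    """Read one byte from the given address."""
    return ram[address]

store(0x1000, 42)
print(load(0x1000))  # 42
print(load(0x1001))  # 0, this cell was never written
```

An address is just an index, which is why "base address plus offset" arithmetic comes up so often at the hardware level.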
>> Are the A0, A1 in this case part of the registers or the ALU he was talking about earlier?
>> They are part of the registers in the sense that you can read and write into them.
[00:00:21] But certain registers have specialized roles. What I mean by that is, if you have a number and you wanna multiply it, and the number happens to be in A0, A0 usually cannot be part of a multiplication operation. Right, so you have to move it from A0 to one of the registers that know how to multiply, etc.
[00:00:41] So you can think of it as, these registers are for the most part general purpose, but some know how to do certain things while others do not, right? So the address register, the reason why it's called A is because it's really good at adding things together and using that as an offset into memory, right?
[00:01:02] So if you wanna do that operation, you can't really do it out of an R0 register. The R0 register is not really good at that. So typically CPUs have specialized registers. Like the program counter is an example of a highly specialized register that you can read and write into. But you do it through a special set of instructions, and it has automatic behavior: it automatically increments, and so on.
[00:01:28] Same thing with a stack pointer. So yeah, they're all registers, but they're specialized in the sense that certain registers can only do certain operations.
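[Editor's sketch] The idea of specialized registers can be sketched with a toy CPU. Everything here is hypothetical: the `ToyCPU` class, the instruction names (`SET_A`, `LOAD`, `MOVE_A_TO_R`, `MUL`, `JUMP`), and the two-register layout are invented for illustration and do not correspond to any real instruction set. The sketch assumes only what the answer describes: A registers can be used as base-plus-offset addresses, R registers can multiply, and the program counter increments on its own and is written only through a dedicated instruction.

```python
class ToyCPU:
    """Hypothetical CPU with data registers (R), address registers (A), and a PC."""

    def __init__(self, program, ram):
        self.program = program
        self.ram = ram
        self.r = [0, 0]   # R0, R1: data registers, the only ones MUL accepts
        self.a = [0, 0]   # A0, A1: address registers, the only ones LOAD indexes with
        self.pc = 0       # program counter

    def step(self):
        op, *args = self.program[self.pc]
        self.pc += 1                      # automatic increment on every instruction
        if op == "SET_A":                 # A[n] = constant
            n, value = args
            self.a[n] = value
        elif op == "LOAD":                # R[rn] = ram[A[an] + offset]
            rn, an, offset = args
            self.r[rn] = self.ram[self.a[an] + offset]
        elif op == "MOVE_A_TO_R":         # copy an address register into a data register
            rn, an = args
            self.r[rn] = self.a[an]
        elif op == "MUL":                 # R[rn] = R[rn] * R[rm]; no A-register form exists
            rn, rm = args
            self.r[rn] *= self.r[rm]
        elif op == "JUMP":                # the only instruction that writes the PC directly
            (target,) = args
            self.pc = target
        else:
            raise ValueError(f"unknown instruction: {op}")

    def run(self):
        while self.pc < len(self.program):
            self.step()

ram = [0] * 16
ram[10], ram[11] = 6, 7
program = [
    ("SET_A", 0, 10),     # A0 = 10
    ("LOAD", 0, 0, 0),    # R0 = ram[A0 + 0]
    ("LOAD", 1, 0, 1),    # R1 = ram[A0 + 1]
    ("MUL", 0, 1),        # R0 = R0 * R1
]
cpu = ToyCPU(program, ram)
cpu.run()
print(cpu.r[0])  # 42
print(cpu.pc)    # 4
```

To multiply a value that sits in A0, a program for this toy machine would first need `MOVE_A_TO_R`, which mirrors the "move it to a register that knows how to multiply" step.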
>> Does that mean there's a ton of general registers, or just a few?
>> Yeah, so for a typical modern CPU, I don't know, it's been a while since I've done this, but I would guess that we're talking on the order of like 60 general purpose registers.
[00:01:51] On the order of like 24 of these address or index registers, and then a handful of program counters, stack pointers, flag registers, and so on. And that's what you get. There are also super specialized registers, which I think modern CPUs have, called SIMD, single instruction multiple data, where you say, like, I wanna do a multiplication, right?
[00:02:15] But I don't just wanna do a multiplication on like two numbers, I wanna do it on a matrix, right? So like, let me load 50 different values into 50 specialized registers. And now multiply by five, all of them, all at once, right? So these are super specialized things, and they're really useful for graphics and, these days, AI, right?
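[Editor's sketch] The "multiply all of them at once" idea can be mimicked in plain Python. This is a simulation only: Python runs the lanes one after another, and the lane count of 4 and the function name `simd_multiply` are assumptions for illustration. Each chunk of `LANES` values stands in for what a single vector instruction would handle on real SIMD hardware.

```python
LANES = 4  # hypothetical vector width; real hardware widths vary

def simd_multiply(values, factor):
    """Multiply every value by factor, counting one 'vector op' per chunk of LANES."""
    out = []
    vector_ops = 0
    for i in range(0, len(values), LANES):
        lane_values = values[i:i + LANES]           # load one vector register
        out.extend(v * factor for v in lane_values)  # one instruction, many data
        vector_ops += 1
    return out, vector_ops

values = list(range(1, 9))            # 8 values
result, ops = simd_multiply(values, 5)
print(result)  # [5, 10, 15, 20, 25, 30, 35, 40]
print(ops)     # 2 vector ops instead of 8 scalar multiplies
```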
[00:02:40] But once you get into AI, because you think about what AI is, AI is a whole bunch of multiplications, right? Because you're doing statistics, is really what it comes down to. And so when you get to the AI world, there you have like super specialized hardware that can just rip through tons and tons of data, in terms of multiplications.
[00:03:02] So the idea is, if you just wanna multiply things over and over and over again, like a huge matrix, then you know what the set of instructions is, right? So you don't have to have a program counter that goes and reads, and all this other machinery. Instead, you have a specialized pipeline that just reads the memory in sequence.
[00:03:19] And then multiplies it, and then spits it out to a different location in memory, right, and can just process this extremely fast. And so if you look at modern CPUs, they have dedicated neural net units, or whatever you wanna call them, right? All it is is a big array of multipliers.
[00:03:37] And these arrays are really good at this: you set them up with the initial parameters saying, this is where the A part of the matrix is, this is where the B part of the matrix is, go. And then the matrix might be 50,000 rows or something like that. And so it just rips through them without any sort of subroutines or any other additional things.
[00:03:54] It just knows how to do that and knows how to do it really well.
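[Editor's sketch] The "this is where A is, this is where B is, go" setup can be sketched as a single fixed routine over a flat memory array. The function name `matmul_unit`, its parameters, and the row-major layout are assumptions chosen for illustration; real accelerators do this work in parallel hardware rather than in nested loops.

```python
def matmul_unit(memory, a_addr, b_addr, out_addr, n, k, m):
    """Hypothetical fixed-function unit.

    Reads A (n x k) at a_addr and B (k x m) at b_addr, both row-major,
    and writes the n x m product starting at out_addr.
    """
    for row in range(n):
        for col in range(m):
            acc = 0
            for i in range(k):
                acc += memory[a_addr + row * k + i] * memory[b_addr + i * m + col]
            memory[out_addr + row * m + col] = acc

memory = [0] * 32
memory[0:4] = [1, 2, 3, 4]   # A = [[1, 2], [3, 4]] at address 0
memory[4:8] = [5, 6, 7, 8]   # B = [[5, 6], [7, 8]] at address 4
matmul_unit(memory, a_addr=0, b_addr=4, out_addr=8, n=2, k=2, m=2)
print(memory[8:12])  # [19, 22, 43, 50]
```

The caller only supplies addresses and sizes; there is no per-element instruction fetch in the description, which is the point being made about skipping the program counter machinery.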
>> So these are hardware differences, the registers, then.
>> Hardware differences, and also differences of architectures, right? So when you're talking about an Intel CPU, the instruction numbers have a special meaning, and also the registers have a certain layout, right?
[00:04:17] The instructions assume a certain register layout. So part of the reason why you can't just take code from Intel and run it on an M1 chip without translation, right, is because the M1 chip is a RISC chip. And because of that, it has a different register layout. And obviously it has a different set of instructions that have different meanings, right?
[00:04:39] So nothing really fits between these two architectures. Like, the CPUs are different, the layout of the registers is different. And so things just don't work. And that's why you really have to compile it separately. Yes.
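[Editor's sketch] The point that instruction numbers only mean something relative to one architecture can be shown with two made-up opcode tables. `ISA_X` and `ISA_Y` are entirely hypothetical and are not the real Intel or M1 encodings; the sketch only shows that the same bytes decode to different programs under different tables.

```python
# Two hypothetical instruction sets that assign different meanings to the same numbers.
ISA_X = {0x01: "ADD", 0x02: "MUL", 0x03: "LOAD"}
ISA_Y = {0x01: "LOAD", 0x02: "ADD", 0x03: "JUMP"}

def decode(machine_code: bytes, isa: dict) -> list:
    """Translate each byte to its mnemonic under the given instruction set."""
    return [isa.get(byte, "ILLEGAL") for byte in machine_code]

code = bytes([0x03, 0x02, 0x01])   # compiled for ISA_X
print(decode(code, ISA_X))  # ['LOAD', 'MUL', 'ADD']
print(decode(code, ISA_Y))  # ['JUMP', 'ADD', 'LOAD']
```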
>> Is that still what the case is?
>> The engine, absolutely, yes. So V8 basically generates these instructions, right. So what I wanna show you here in this first phase of the presentation is what the hardware level is.
[00:05:44] And how do these tricks work? And then once you see how these tricks work, it becomes totally obvious why certain things are fast and other things are slow.