Everything You'll Need to Know About Git

Commit Graph: Tree & Parent



Everything You'll Need to Know About Git

Check out a free preview of the full Everything You'll Need to Know About Git course

The "Commit Graph: Tree & Parent" Lesson is part of the full, Everything You'll Need to Know About Git course featured in this preview video. Here's what you'd learn in this lesson:

ThePrimeagen demonstrates how to explore the contents of a commit using the `git cat-file -p` command. He explains the concept of a tree, which represents a directory, and a blob, which represents a file. Git stores the entirety of each file in every commit, rather than just the changes.


Transcript from the "Commit Graph: Tree & Parent" Lesson

>> I will now do this, so I'm gonna go like this, git cat-file-p and print it out. Look at this, it has a tree, what's a tree? It has the author, it has the committer, it has the time, it has 600 on it. It has my friendly message involved right here.

Where's our commit though? This doesn't tell me anything. So, there's another sha here, so I'm just gonna grab that sha. Whoopsies, and I'm gonna just put it in. Look at this. Again, something that looks like file permissions, blob, another sha, and look at that. The name of the file we just got done adding.

So it kind of looks like we can kind of walk through git, and we can find the file that is stored. So I'm gonna go here now, and I'm gonna do the exact same thing. I'm going to print out that and look at that, hello friend and masters.

I was able to go all the way down and drill to the exact point of the change we've made by starting with the git shot. This is very, very important. Now, there's some really good key terms here that you need to know. First off, you'll notice that there's a tree, right?

You see the term tree being printout. The easiest way to think about tree is, a tree is a directory. It contains a set of files or more directories. A blob, what you saw right here is a file. And so that's the kind of the mentality you should have if you ever look or peruse through these commits.

And we will actually use this information later and restore stuff that we've lost by knowing this kind of information. So anyways, there we go. We were able to go all the way down to the bottom. So that means our sha contained a tree. And that tree contained our file.

And that file contained the entirety of that file, not just our changes, but the entirety of the file. So here we go. Very key concepts. Make sure you know that. Big takeaway, git does not store diffs and we're gonna prove that, it stores the entirety of stuff. And so, I want you to just take a moment, and remember this sha maybe by the first couple characters, mine's EC3, you just remember yours.

This is the sha to the file. This is a very important part right here, all right? So, we're gonna do something else. I want you to create a second file and I want you to commit it, and call it second.md. Whatever text you want, just create something. So I'll do the exact same thing.

All right, fantastic. Hopefully everyone mostly got that done. Just make it a quick, easy change. By the way, if you're using some sort of command line utility, or you're using your editor if you're using VS Code or Vim, don't use the git client. Right now, we're gonna stay all in the command line.

I want you to be there for a reason, because you wanted to understand what's actually happening. Because when you use something like Fugitive for Vim, it's literally just translating your actions into these command line arguments. So I just want you to understand what's happening here. All right, explore your new commit.

What can you say about first.md in your new commit? Cause we didn't touch first.md, did we? We didn't, we just simply added a new file and then committed it. So, can you find first.md in your new commit? By the way, a little fun fact you get the first seven characters right here whenever you commit.

You can actually copy that and go git cat file- p, throw that in there, all right? Notice something a little different here. We have a tree. Again, remember, this is your folder, but we also have a parent. And notice that that is the sha from my first commit.

It will be the sha from your first commit. You can start seeing how this graph is being born. Your first one doesn't have the parents, the next one does, and from here on out they'll all have parents or multiple parents. Second, let's take this tree, and I'm gonna cat this out.

Get cat, there we go. Now look at that. I actually have out. I actually committed out. Remember when I showed you the log, I actually didn't look at what I was doing, I actually commit that out. The mistakes were made. But either way, you see this right here.

You see that I have both my first and now my second. But remember this very important part. Look at what first is. It's the same sha that it had in the previous commit, meaning our commits store pointers to the entire contents of that file. So we did not update first, therefore, this next commit contains first, but it contains a pointer to the exact version.

That means at any point in time, you only need one commit to rebuild your repo. You can completely rebuild everything about it, because that is how it works. Is that every single commit stores the entirety of the state, not a differential. This is a very important point about git, because it just helps you understand git fundamentally.

An so, as you use merge and rebase and all these other commands, it's just good to know this. Does this generally make sense? Even if you don't understand how the compression works? How does it look up stuff? Do you at least understand that git stores these files, stores them as a bunch of these really long strings of characters?

And as long strings of characters contain every last change of your repo, question.
>> So the first two characters of commits sha is then placed where those blobs are for each file so that that points to the commit that it was last updated in?
>> Effectively, yes and so you can see right here, if I look at that, EC.

If I go ls git/objects, you'll see EC. So you actually see that it literally a storing each thing, and then it builds it up. Every tree contains effectively one or more blob is the way to look at it. That's why you can't commit an empty folder. Have you ever created a folder and then nothing shows up?

That's why, because the tree has to have something.
>> And Richard says, is this like a linked list?
>> That's a good way to think about it. It's like a linked list. The problem about a linked list thought is that a linked list contains a node that points to the next node.

But git, you can have more than one parent. It's an acyclic graph that happens to be linear. It's probably the best way to look at. It is that there isn't an ironic order to get which is time, right? So as time marches forward, every commit you make is ordered that way.

That's actually an extremely important point to make sure you have in your head, cuz when we do bisecting time is a very important aspects to it and why bisecting works very, very well. And if you've never used I'm very sorry if you've ever had to find when a bug happened and you didn't know that this tool exists.

So we'll go over it. Hopefully this does make git feel less magical though. I want git to feel magical, because it works so well, not because it's mysterious. And that's where you want magic to happen. You want it to be a great product, not mysterious that it actually works.

Sometimes it's mysterious that it actually works 'cause you programmed it and you're shocked that you can build anything that big. That's a different story. The previous chief product officer at Netflix said, I'm surprised Netflix even works at all, right? Because at the end of the day, it's a whole bunch of people building something and somehow it all works together.

So it's always fantastic. Remember, every single program you'll ever write can be broken down to a bunch of if statements, for loops, and variable declarations. That's it, that's all you're doing. Really, if you really break it down, it's just a bunch of go-to statements and procedural code. That's all you write.

And so no matter how fancy, no matter how much elixir you use, it's all just really simple go-to statements at the end of the day, all right? This is just for you guys. I'm gonna do a little quick thing right here. I'm going to go like this. Make dir, how about source?

Get status, get status. You'll notice nothing shows up. I'm gonna go Vim source and we're gonna create a file in here. How about index.js? For all my TS, we're just so awesome, right? All right, okay, I'm gonna have an empty file in there. Git status, you'll see that it's right there.

Git out of this, git status git commit foo. I'll do foo again. And I have this right here. And I'm gonna go git cat file -p, and print it out. Again, parent previous commit, tree. I'm gonna go into the tree, get cat file -p tree. Look at this, we got our blobs.

But there you go, another tree right here and that tree is a folder. So that means if I take this tree and go get cat file, that tree you'll see index.ts. It's literally just storing a bunch of pointers describing the state of git, just really to drive that home.

Just so everyone really understands, you could probably build a command line tool that cats out a commit, parses out these strings. And you could probably rebuild the repo yourself just from knowing how to use git cat-file. You could actually rebuild the entire state of it by just walking blobs and trees and making directories.

It's really not that magic, just feels magic.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now