Check out a free preview of the full The Good Parts of JavaScript and the Web course

The "Document Tree Structure" Lesson is part of the full, The Good Parts of JavaScript and the Web course featured in this preview video. Here's what you'd learn in this lesson:

Doug summarizes the document tree and explains the relationship between child, sibling, and parent elements. He also demonstrates how to create a recursive function that can walk through the DOM by following the firstChild and nextSibling nodes.


Transcript from the "Document Tree Structure" Lesson

>> [MUSIC]

>> Douglas Crockford: We talked about parsing and making trees. So this is some HTML text. This is a tree that it expands into and there are some interesting things to observe about this. One is first we've got lowercase here and uppercase here. So when the web first started, the first generation of webmasters typed their markup all in upper case cuz they wanted to stand out.

They wanted make it really obvious what was markup and what was content, and writing it all in uppercase made that clearer. After a few years of that, they got tired of leaning on the Shift key and they decided what the hell and it's all lowercase now. That transition happened just as Brendon was designing the DOM.

And so the DOM had to pick a convention and the convention was let's go with uppercase because that's what people were using at the time. So everything that's lowercase here gets shifted up to uppercase there. And you need to be aware of that because sometimes it doesn't matter but sometimes it does matter.

So you need to be aware. Then there are features in the tree that are not in the text. For example, I did not specify a head tag, but there is a head tag in the tree. So it'll add extra bits to the markup. Another place we'll do that is in a table.

If you don't specify a tbody, it'll stick a tbody in there for you. And that can get you into trouble if you're trying to parse around things, and you'll find levels of content that you didn't expect to find. Then the other thing here is that this is the ideal Microsoft tree.

So Microsoft IE6 would make a tree like this one. W3C under the influence of the SGL community said no, you need to have more stuff than that. For example the whitespace between here and here, which you would ordinarily ignore, that has to go into the tree. Microsoft correctly decided, no we shouldn't do that because that's just a waste of space.

But everybody else did and now Microsoft did, everybody does that. But I went with Microsoft tree because the real tree is too hairy and it's hard to talk about, so I'm going with the simpler tree instead. Then there's some implied hierarchy in this. For example, you've got h1, h2 and it kind of looks like the h2 is subordinate to h1.

And the Ps are subordinate to that, but they're not. They're all at the same level in the tree. So you need to be aware of that. Then on the JavaScript side, in the browser, there is a global variable called document, which is the root of the tree. And document.body is a shortcut for getting to the body node.

There is also a shortcut for getting to the HTML node, which is called document element. Which is your thing can be called HTML right? Cuz it goes to that one. And the reason it isn't is because at the time that document element was created, W3C was planning to kill HTML and they didn't want to leave that evidence in the DOM.

So they went with a longer name, so that no one would know. But that plot didn't work. So this is a subset of the same tree, I turned it sideways to demonstrate the next thing. So each node has pointers to other nodes. For example, each node has a first child and a last child node.

Which points to the children. These are neglected middle children, they don't get pointers. Then each of those P node only has one child, so both pointers point to the same element.
>> Douglas Crockford: Then there are sibling pointers, next sibling and previous sibling when at that way the body will have a sibling relationship with the head tag.

These guys are cousins, and you'll be glad to know there are no cousin pointers.
>> Douglas Crockford: Then there is the parent node pointer which goes up. The body will have a parent node going to html. HTML will go up to the document root. And you might be going lordy that's a lot of pointers.

So if I edit this tree, do I have to update all those pointers, that could be kinda hairy. And the answer is no, that in fact you cannot edit the tree. These are all read only from your perspective, you have to use the DOM API if you want to edit this tree.

And I'll show you that API in a moment. Now if it turns out all you want to do is traverse the tree, for example if you want to visit every node in display order. You don't need all these pointers, right? If you understand how recursion works, you only need a binary tree, right?

You can make that with two pointers. So I can use first child the next sibling with a walk the DOM function. It's recursive function which knows how to do that traversal. And that allows me to do things like implement get elements by name. By passing in a function which will look for names and compile things into a list.

In addition to all of those pointers, each node also has a list of child nodes which is kind of like an array which will have all of the children in it.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now