Front-End System Design

DOM & Querying

Evgenii Ray

Staff UI Engineer

The "DOM & Querying" lesson is part of the full Front-End System Design course featured in this preview video. Here's what you'd learn in this lesson:

Evgenii reviews the global objects and class hierarchy in the DOM API. Query methods are also discussed, and their time complexity, memory cost, and algorithms are compared. In general, getElementById provides the best time and space complexity. The querySelector and querySelectorAll methods perform slightly worse, but thanks to caching and browser optimizations, performance improves on repeated runs.

Transcript from the "DOM & Querying" Lesson

[00:00:00]
>> Evgenii Ray: So what is the DOM API? The DOM API is a set of methods that we can use to manipulate the DOM. Usually, we don't use it directly because most libraries handle the DOM mutations for us, for instance, React, Angular, or Vue. But there are cases where we actually need to use the DOM API.

[00:00:21]
One of those is when you're building some low-level library that works with virtualization or some DOM management. It's also used when you need to create generic components that can be ported to any library or framework, such as a video player or a chart engine. Or, for instance, you're building your portfolio website and you want to minimize the footprint of your application by not importing too many libraries.

[00:00:51]
This way you can just use the DOM API and your app will be lightweight. So this section will give us a quick refresher on how to use the DOM API. First, let's go over the objects that are available to us and how we can use the DOM API.

[00:01:11]
The first global object that we're all aware of is the window. The window is a globally accessible object, so you don't need to import anything. The next one is the document, which represents the HTML element, basically the page that we're currently rendering. The document can be accessed from the window or directly by referencing it as a global object.

[00:01:40]
Then we have some quick access links that we can use on the document. These are body and head, which give us the body tag and the head tag respectively. Now let's go to the class hierarchy. All the elements on the HTML page are represented by HTMLElement. But the HTMLElement doesn't actually have any DOM API assigned to it.

[00:02:06]
The DOM API is actually assigned to the Element prototype. Why is that? Because the DOM API can work with more than just HTML. It can also be SVG elements or XML-like things. That's why the DOM API lives on the Element prototype instead of HTMLElement. And we also have the Node.

[00:02:29]
The DOM is a tree, and somehow we need to access the tree properties. That's why we have a special class that's designed to represent a tree leaf, and this is the Node. And we have an exception, which is the TextNode, which basically extends the Node class. The TextNode is used whenever we have any text, when we wrap our text in an HTMLElement.

[00:02:54]
So the browser needs to somehow represent the text that we place in our tags. Then we have the HTMLDocument, and the HTMLDocument is an outlier here because it has the DOM API assigned to its prototype, but it extends the Node. I don't know the reason why, but this is the exception in the whole hierarchy that we need to live with.
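
The hierarchy described above can be modeled with a few plain classes. This is a minimal sketch, not the browser's implementation: the Sketch-prefixed class names are made up for illustration, and the real DOM has more layers than shown here. It only illustrates why defining a query method once on the element class makes it available to both HTML and SVG elements, while text nodes stay part of the tree without any query API.

```typescript
// Simplified model; the "Sketch" names are hypothetical, not real DOM classes.
class SketchNode {
  parentNode: SketchNode | null = null;
  readonly childNodes: SketchNode[] = [];

  // Tree plumbing lives on the node class, shared by text and elements.
  appendChild<T extends SketchNode>(child: T): T {
    child.parentNode = this;
    this.childNodes.push(child);
    return child;
  }
}

// Text is a node in the tree but carries no query methods.
class SketchText extends SketchNode {
  data: string;
  constructor(data: string) {
    super();
    this.data = data;
  }
}

// Query methods are defined once here, so every kind of element inherits them.
class SketchElement extends SketchNode {
  readonly tagName: string;
  constructor(tagName: string) {
    super();
    this.tagName = tagName;
  }

  // Depth-first search for descendant elements with the given tag name.
  getElementsByTagName(tag: string): SketchElement[] {
    const found: SketchElement[] = [];
    for (const child of this.childNodes) {
      if (child instanceof SketchElement) {
        if (child.tagName === tag) found.push(child);
        found.push(...child.getElementsByTagName(tag));
      }
    }
    return found;
  }
}

class SketchHTMLElement extends SketchElement {}
class SketchSVGElement extends SketchElement {}

const sketchBody = new SketchHTMLElement("body");
const sketchSvg = sketchBody.appendChild(new SketchSVGElement("svg"));
sketchSvg.appendChild(new SketchSVGElement("circle"));
const sketchParagraph = sketchBody.appendChild(new SketchHTMLElement("p"));
sketchParagraph.appendChild(new SketchText("hello"));

// The same inherited method works when called on an HTML or an SVG element.
const circlesFromHtml = sketchBody.getElementsByTagName("circle");
const circlesFromSvg = sketchSvg.getElementsByTagName("circle");
```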

[00:03:17]
And the next one is the Window. Basically, the Window has an instance of HTMLDocument, and it doesn't have any interesting properties on its prototype. It represents the browser viewport. Now, query methods. Let's understand how the browser queries elements on a page, how the DOM API queries the elements.

[00:03:37]
Let's start with the simplest method, which is getElementById. When the browser reads the page, it constructs a hash map of the elements. It reads the ID, and then it basically builds a cache where the ID is assigned to some HTML reference. This means that when we call getElementById, the time complexity and read cost is just O(1).

[00:04:05]
Because this is basically just accessing an object by its key. The situation is slightly different when it comes to getElementsByClassName. For getElementsByClassName, we actually need to traverse the whole tree, and the way the browser traverses the tree is with the DFS approach. So it basically goes through the elements and finds all the elements on a page with the class text.

[00:04:36]
And the time complexity for this is linear, so we need to go through all elements in the worst case. But you see there is a star here. This is because the browser is quite smart, and it utilizes the hash map when you try to query things. So the first query will take some time, but if you execute the same query next time, it will be almost immediate because the browser utilizes the caching mechanism.
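
The difference between the two lookups can be sketched on a small in-memory tree. Everything here is an assumption made for illustration: TreeEl, PageModel, byId, and byClass are hypothetical names, and the repeat-query caching mentioned above is left out because it is engine-specific. The sketch only contrasts a single map lookup with a depth-first walk that touches every element.

```typescript
// Hypothetical element shape for this sketch; not a real DOM type.
interface TreeEl {
  tag: string;
  id?: string;
  classes: string[];
  children: TreeEl[];
}

// Hypothetical stand-in for a parsed page.
class PageModel {
  readonly root: TreeEl;
  readonly idIndex = new Map<string, TreeEl>();
  visited = 0; // number of elements touched by the last class query

  constructor(root: TreeEl) {
    this.root = root;
    this.indexIds(root); // the index is built once, while "reading the page"
  }

  private indexIds(el: TreeEl): void {
    // Keep the first element seen for each ID, in document order.
    if (el.id !== undefined && !this.idIndex.has(el.id)) this.idIndex.set(el.id, el);
    for (const child of el.children) this.indexIds(child);
  }

  // O(1): one map lookup, independent of the size of the tree.
  byId(id: string): TreeEl | null {
    return this.idIndex.get(id) ?? null;
  }

  // O(N): depth-first walk over every element in the tree.
  byClass(cls: string): TreeEl[] {
    this.visited = 0;
    const out: TreeEl[] = [];
    const walk = (el: TreeEl): void => {
      this.visited++;
      if (el.classes.includes(cls)) out.push(el);
      for (const child of el.children) walk(child);
    };
    walk(this.root);
    return out;
  }
}

const samplePage: TreeEl = {
  tag: "body",
  classes: [],
  children: [
    {
      tag: "div",
      id: "app",
      classes: ["box"],
      children: [
        { tag: "p", classes: ["note"], children: [] },
        { tag: "p", classes: ["note", "box"], children: [] },
      ],
    },
    { tag: "footer", classes: [], children: [] },
  ],
};

const pageModel = new PageModel(samplePage);
const appEl = pageModel.byId("app"); // found without walking the tree
const noteEls = pageModel.byClass("note"); // found by walking the tree
const visitedForNotes = pageModel.visited; // equals the total element count
```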

[00:05:04]
And one thing about this method is that it returns the HTMLCollection. An HTMLCollection is actually a legacy collection which has one interesting characteristic: it's live. This means that if we modify the DOM tree and, for instance, remove one element from the DOM, this will be reflected in our HTMLCollection.

[00:05:30]
And that's why the read cost of the HTMLCollection is big O(N), because every time we need to read a single element, the browser will parse the tree again to verify its position. So this is a disadvantage of this thing. But on the other hand, the memory cost is low because the browser doesn't need to clone any objects; it just takes the already existing references.
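
A live collection as described here can be modeled as an object that stores only the query, never the results. This is a rough sketch under that assumption; LiveEl and LiveClassView are hypothetical names, and real engines are free to optimize reads in ways this model does not show. Each read walks the tree again, which is why the view always reflects the current tree and why it holds no copy of the matches.

```typescript
// Hypothetical element shape for this sketch; not a real DOM type.
interface LiveEl {
  tag: string;
  classes: string[];
  children: LiveEl[];
}

// Hypothetical live view: keeps the root and the class name, not a result array.
class LiveClassView {
  readonly root: LiveEl;
  readonly cls: string;

  constructor(root: LiveEl, cls: string) {
    this.root = root;
    this.cls = cls;
  }

  // Every read re-walks the tree depth-first, so it sees the tree as it is now.
  private collect(): LiveEl[] {
    const out: LiveEl[] = [];
    const walk = (el: LiveEl): void => {
      if (el.classes.includes(this.cls)) out.push(el);
      for (const child of el.children) walk(child);
    };
    walk(this.root);
    return out;
  }

  get length(): number {
    return this.collect().length;
  }

  item(index: number): LiveEl | null {
    return this.collect()[index] ?? null;
  }
}

const liveRoot: LiveEl = {
  tag: "ul",
  classes: [],
  children: [
    { tag: "li", classes: ["row"], children: [] },
    { tag: "li", classes: ["row"], children: [] },
  ],
};

const liveRows = new LiveClassView(liveRoot, "row");
const lengthBefore = liveRows.length;
liveRoot.children.pop(); // remove one element from the tree
const lengthAfter = liveRows.length; // the same view now reports one fewer match
```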

[00:06:00]
So you can utilize the HTMLCollection when you're low on memory and you need to use as little memory as possible. Okay, getElementsByTagName works the same way. There is no difference at all; it allows us to query things by tag. The querySelector works slightly differently. It utilizes the CSS selector, so the time complexity depends on the selector that we use.

[00:06:34]
So we always pay some tax on compiling the selector. This means that the more complex the selector you have, the more time it takes to compile it. The read access and the memory cost is big O(1) because we're just returning a single element. And the querying mechanism, when we use class names or tags, is the same DFS.
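
The idea of a compile step followed by a search can be sketched with a deliberately tiny selector grammar. This is an illustration under stated assumptions, not how any engine parses CSS: compileSelector, selectFirst, buildIdIndex, and SelEl are hypothetical names, and only bare "#id", ".class", and "tag" selectors are supported. The sketch shows the parsing cost being paid before any searching, an ID selector taking a map lookup, and class or tag selectors falling back to a depth-first walk that stops at the first match.

```typescript
type SimpleSelector =
  | { kind: "id"; value: string }
  | { kind: "class"; value: string }
  | { kind: "tag"; value: string };

// Parsing ("compiling") step: paid on every call, before any searching starts.
function compileSelector(selector: string): SimpleSelector {
  const s = selector.trim();
  if (!/^[#.]?[A-Za-z][\w-]*$/.test(s)) {
    throw new Error(`unsupported selector in this sketch: ${selector}`);
  }
  if (s.startsWith("#")) return { kind: "id", value: s.slice(1) };
  if (s.startsWith(".")) return { kind: "class", value: s.slice(1) };
  return { kind: "tag", value: s.toLowerCase() };
}

// Hypothetical element shape for this sketch; not a real DOM type.
interface SelEl {
  tag: string;
  id?: string;
  classes: string[];
  children: SelEl[];
}

// Builds the ID lookup table once, keeping the first element per ID.
function buildIdIndex(root: SelEl, index = new Map<string, SelEl>()): Map<string, SelEl> {
  if (root.id !== undefined && !index.has(root.id)) index.set(root.id, root);
  for (const child of root.children) buildIdIndex(child, index);
  return index;
}

// Returns the first match in document order, or null if nothing matches.
function selectFirst(root: SelEl, ids: Map<string, SelEl>, selector: string): SelEl | null {
  const compiled = compileSelector(selector);
  if (compiled.kind === "id") return ids.get(compiled.value) ?? null; // map lookup, no walk

  const matches = (el: SelEl): boolean =>
    compiled.kind === "class" ? el.classes.includes(compiled.value) : el.tag === compiled.value;

  // Depth-first walk that stops as soon as one element matches.
  const walk = (el: SelEl): SelEl | null => {
    if (matches(el)) return el;
    for (const child of el.children) {
      const hit = walk(child);
      if (hit !== null) return hit;
    }
    return null;
  };
  return walk(root);
}

const selRoot: SelEl = {
  tag: "main",
  classes: [],
  children: [
    { tag: "h1", id: "title", classes: ["big"], children: [] },
    { tag: "p", classes: ["big"], children: [] },
  ],
};
const selIds = buildIdIndex(selRoot);

const byIdHit = selectFirst(selRoot, selIds, "#title");
const byClassHit = selectFirst(selRoot, selIds, ".big");
const byTagHit = selectFirst(selRoot, selIds, "p");
```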

[00:07:04]
If we use the ID, then it's going to convert this statement to getElementById instead. One interesting thing about the querySelector's counterpart, the querySelectorAll, is that it returns the NodeList. The NodeList is another type of collection, but it's not live; it's basically just a copy of the elements.

[00:07:25]
What the browser does is just copy the HTML objects. This means that we pay additional memory costs when we do the querying. So it's not advisable to query too many things because you may end up with a memory spike. But the read access is O(1) because you're just accessing the object directly from the array instead of traversing the tree again to maintain the live collection.
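
A non-live result can be modeled as a plain array filled by a single walk. This is a minimal sketch with hypothetical names (SnapEl, snapshotByClass), and in this model the extra memory is simply the result array, which grows with the number of matches. It shows the two properties discussed above: indexed reads do not touch the tree, and the array does not follow later changes to the tree.

```typescript
// Hypothetical element shape for this sketch; not a real DOM type.
interface SnapEl {
  tag: string;
  classes: string[];
  children: SnapEl[];
}

// Walks the tree once and returns a fixed array of the matches found at that moment.
function snapshotByClass(root: SnapEl, cls: string): SnapEl[] {
  const out: SnapEl[] = [];
  const stack: SnapEl[] = [root];
  for (;;) {
    const el = stack.pop();
    if (el === undefined) break;
    if (el.classes.includes(cls)) out.push(el);
    // Push children in reverse so they are popped in document order.
    for (const child of [...el.children].reverse()) stack.push(child);
  }
  return out;
}

const snapRoot: SnapEl = {
  tag: "ul",
  classes: [],
  children: [
    { tag: "li", classes: ["row"], children: [] },
    { tag: "li", classes: ["row"], children: [] },
  ],
};

const snapRows = snapshotByClass(snapRoot, "row");
snapRoot.children.pop(); // remove the second item from the tree

// The snapshot still holds the removed item; reading it is a plain array access.
const snapLengthAfter = snapRows.length;
const secondSnapRow = snapRows[1];

// Only a fresh query sees the updated tree.
const freshRows = snapshotByClass(snapRoot, "row");
```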

[00:07:51]
And the querying mechanism is the same, DFS plus hash map. Quick summary: getElementById provides the best performance because the browser builds the cache. But you need to make sure that you utilize the ID space correctly. It's basically on you not to rely on IDs too much, because the ID namespace is shared across your multiple components, so just make sure that you don't overuse IDs in your app.

[00:08:23]
The getElementsByClassName or getElementsByTagName provide a low memory overhead because they return a live collection with just references to existing objects. But because the read cost is high, you need to make sure that you use this collection right. If you loop over this collection, it will turn into quadratic time.
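
The quadratic cost can be made visible by counting how many elements get touched. This sketch assumes the cost model described in this lesson, where every read from a live collection walks the tree again; CostEl, walkByClass, and makeList are hypothetical names, and real engines may avoid much of this work. With 100 matching rows under one root, the loop that re-reads on every iteration touches 10,201 elements, while walking once and looping over the array touches 101.

```typescript
// Hypothetical element shape for this sketch; not a real DOM type.
interface CostEl {
  classes: string[];
  children: CostEl[];
}

let nodesVisited = 0; // counts every element touched by a tree walk

// Depth-first walk that collects matches and records how much work it did.
function walkByClass(root: CostEl, cls: string): CostEl[] {
  const out: CostEl[] = [];
  const visit = (el: CostEl): void => {
    nodesVisited++;
    if (el.classes.includes(cls)) out.push(el);
    for (const child of el.children) visit(child);
  };
  visit(root);
  return out;
}

// Builds one root with `count` matching children.
function makeList(count: number): CostEl {
  const children: CostEl[] = [];
  for (let i = 0; i < count; i++) children.push({ classes: ["row"], children: [] });
  return { classes: [], children };
}

const costRoot = makeList(100);

// Pattern 1: treat the result as live, so the length read and every
// indexed read walk the whole tree again.
nodesVisited = 0;
let liveTouched = 0;
const liveCount = walkByClass(costRoot, "row").length;
for (let i = 0; i < liveCount; i++) {
  const row = walkByClass(costRoot, "row")[i];
  if (row !== undefined) liveTouched++;
}
const liveLoopVisits = nodesVisited;

// Pattern 2: walk once, then loop over the plain array.
nodesVisited = 0;
let snapshotTouched = 0;
for (const row of walkByClass(costRoot, "row")) {
  if (row.classes.length > 0) snapshotTouched++;
}
const snapshotLoopVisits = nodesVisited;
```

Copying a live collection into an array before looping is the usual way to get the second pattern when the collection does not need to stay live during the loop.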

[00:08:50]
And we don't want to do that in our app on a large set of elements. The querySelector has slightly worse performance than getElementById. But the browser itself heavily optimizes CSS selectors. So in different browsers you may get different results, but on average, it's actually very close to getElementById.

[00:09:13]
But the elements that are returned from this query do not represent a live collection. So if you remove an element from the DOM tree, you may end up having a stale element in your collection. The same applies to the querySelectorAll, but because it copies all the elements, you don't want to query too many elements with this method.

[00:09:39]
>> Speaker 2: Could you clarify the difference between the time complexity of querying versus the read cost?
>> Evgenii Ray: Yeah, okay. So if we go back to getElementsByTagName, for instance. The time it takes to go through the elements, to find all the elements that we're looking for, is big O(N) because we need to go through all elements in the worst case.

[00:10:06]
But the read cost is also big O(N) because we maintain the live collection. And when we maintain a live collection, this means the browser needs to verify that the element exists in the DOM Tree. So every time you're trying to access this live collection, the browser queries the elements again to verify that the element is still present.

[00:10:30]
That's why it can result in that time complexity: in the worst case, we'll need to parse all elements again.
>> Speaker 3: Is there a reason the querySelectorAll returns a non-live element list, compared to this one, for example, which returns the live one?
>> Evgenii Ray: So the live collection is actually a legacy one; it's not recommended for normal cases.

[00:10:59]
And I guess the querySelector returns a non-live collection because we needed to have some alternative. The live collection has more drawbacks and, if not used wisely, can lead to more issues. In our application development, we don't really need to manage live collections, and for most developers, the querySelector is the better way to go because it doesn't have the expensive read access.

[00:11:27]
>> Speaker 2: Why would querySelectorAll impact memory while returning references to Nodes which already exist in the DOM?
>> Evgenii Ray: Not exactly, we actually return the copies. Basically, the browser creates a new object with the same values, a copy, and when you change the properties of this object, it will impact the original object.

[00:11:56]
But we basically have two separate objects with the same properties, and they're connected with each other through the proxy mechanism. So it's slightly different. When we use the live collection, we can actually reuse the real reference to the object instead.
>> Speaker 2: When you're saying that the cache would make things faster next time, are you always sure about that because class names can be dynamic?

[00:12:28]
>> Evgenii Ray: So you can't control that because it's up to the browser engine. The same query methods can behave differently in Safari or Chrome depending on how the developers optimized them. For instance, the Safari browser uses WebAssembly compilation for the native selectors, while Chrome doesn't, because they find that the performance of the existing querySelector is fine on average.

[00:12:56]
So you can't guarantee the same results; it's up to the engine. The logic of caching the querySelector is quite complex, and yeah, no guarantee [LAUGH].
