World Wide Web (WWW) Basic Mechanics

Written by Cody Lindley

The Front-End Developer Roadmap from Frontend Masters - Advance your skills with in-depth, modern front-end engineering courses.


Overview:

A closer look at the infrastructure of the World Wide Web.


Definitions:

Web Page
A document with an .html extension that can be displayed in a web browser. This file can be loaded into a web browser from your local computer, or it can be sent over the internet to a web browser using HTTP. Web pages can link to other web pages. Web pages typically display text, images, videos, audio, and graphical user interfaces.
Website
A group of web pages hyperlinked to each other.
Web Server
Also known as an "HTTP server." A device on the internet (a computer) set up to host and serve web pages to other devices (computers, phones, etc.) that request web pages using HTTP. A web server can serve static files as is, or dynamic files that are built upon request, typically using data from a database.

Contribute content, suggestions, and fixes on github:

https://github.com/FrontendMasters/learning-roadmap


The Internet

It is the internet that makes the World Wide Web possible. The internet is a global system of connected devices, routers, and modems.

Internet Network

Image source: http://www.theshulers.com/whitepapers/internet_whitepaper/index.html

People will often think of the internet and the World Wide Web (aka WWW) as interchangeable, but the internet is specifically the layers of networking that exist to make the connection of devices possible. Most people are connected to the internet by paying an Internet Service Provider (aka ISP) for access in the form of a wired connection directly to a business/home or a wireless connection from a cellular tower or satellite.

No one person, company, organization, or government runs the internet.

How the internet works

Image source: http://www.bitrebels.com/technology/find-out-who-runs-the-internet-chart/


TCP / IP / DNS

The Transmission Control Protocol (aka TCP) is, roughly speaking, the baseline protocol that interconnected devices on the internet use to communicate with each other. In other words, the internet infrastructure connects devices, and TCP is the common language that allows those devices (computers, laptops, phones, etc.) to speak to each other (i.e., TCP defines how data travels on the internet).

TCP specifically directs packets (i.e., chunked data) of information to a specific application on a specific device connected to the network that can be identified using an Internet Protocol Address (aka IP address).
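As a rough illustration of data being delivered to one application on one device, here is a minimal sketch using Python's standard socket module. It assumes everything runs on the local machine (127.0.0.1), and the function names echo_once and tcp_round_trip are made up for this example. The "specific application" is selected by a port number, which the operating system assigns here.

```python
import socket
import threading


def echo_once(server: socket.socket) -> None:
    """Accept one connection and send back whatever bytes arrive."""
    conn, _addr = server.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def tcp_round_trip(message: bytes) -> bytes:
    """Send message to a local TCP echo server and return the reply."""
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as server:
        host, port = server.getsockname()
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        # The (IP address, port) pair identifies one application on one device.
        with socket.create_connection((host, port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        worker.join()
    return reply


if __name__ == "__main__":
    print(tcp_round_trip(b"hello over TCP"))
```

The same pattern, with a real IP address in place of 127.0.0.1, is what higher-level protocols such as HTTP build on.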

Think of an IP address like your home address. Your home address locates your home. IP addresses locate a specific device in a network. An IPv4 address is written as four numbers separated by periods (e.g., 192.168.100.14). CIDR notation adds a slash and a prefix length to describe a whole range of addresses (e.g., 192.168.100.0/24).
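The difference between a single address and a CIDR range can be sketched with Python's standard ipaddress module. The helper name describe and the 192.168.100.0/24 network are assumptions chosen for illustration; only the address 192.168.100.14 comes from the text above.

```python
import ipaddress


def describe(address_text: str, network_text: str) -> dict:
    """Summarize one IP address and its relation to a CIDR network."""
    address = ipaddress.ip_address(address_text)   # a single device address
    network = ipaddress.ip_network(network_text)   # a range written in CIDR notation
    return {
        "version": address.version,
        "is_private": address.is_private,
        "in_network": address in network,
        "network_size": network.num_addresses,
    }


if __name__ == "__main__":
    print(describe("192.168.100.14", "192.168.100.0/24"))
```

A /24 prefix fixes the first 24 bits, so the range holds 256 addresses, and 192.168.100.14 is one of them.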

Wouldn't it be horrible if you had to memorize an IP address for every website you wanted to visit? To avoid this, the Domain Name System (aka DNS) is used. DNS servers are essentially a service that returns an IP address for a domain name. Without DNS, a URL to google.com would look something like:


  http://172.217.20.110 (this actually works in a browser; try it)

instead of:


  http://google.com
  

DNS makes URLs more human-friendly (readable and writable) and lets them carry semantic meaning (e.g., a user enters google.com into a web browser and DNS translates that domain name to the IP address 172.217.20.110 behind the scenes). Semantic domain names allow sites to be indexed and searched for in a meaningful way using search engines and keywords.
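The lookup a browser performs can be approximated with Python's standard socket.getaddrinfo, which asks the operating system's resolver for the addresses behind a name. This is a minimal sketch: the function name resolve is hypothetical, it only asks for IPv4 addresses, and the addresses returned for a real domain vary by location and over time.

```python
import socket


def resolve(hostname: str) -> list:
    """Return the IPv4 addresses the system resolver reports for hostname."""
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr),
    # and for IPv4 the sockaddr is an (ip_address, port) pair.
    return sorted({result[4][0] for result in results})
```

Calling resolve("google.com") on a machine with internet access returns one or more addresses for that domain. Passing a literal IP address simply returns it unchanged, since no lookup is needed.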


HTTP

The internet, TCP, and IP make it possible for one device to exchange data with another device on the network using ones and zeros (i.e., packets of data). On the World Wide Web, when dealing with web pages, a specific type of data exchange takes place. Typically, a web browser or client/user-agent asks for data from a specific location/device serving a particular type of data over the network (e.g., a computer running server software on the network/internet). This requires a specific protocol sitting on top of the baseline TCP/IP protocols.

Client Server Stack

Image source: https://odetocode.com/articles/743.aspx

The WWW runs because a client like a web browser can connect to a web server through the internet and talk to it using the HyperText Transfer Protocol (aka HTTP) to exchange data, most commonly written in the HyperText Markup Language (aka HTML). Besides HTML, the data exchanged can include image, font, audio, video, CSS, and JavaScript files. A web browser parses an .html document, finds the resources it references by URL (e.g., image files, CSS files, JavaScript files, video files, audio files), and makes a further HTTP request for each one as it encounters it. For example, here are all the HTTP requests made when visiting google.com in a web browser (i.e., the default HTML is sent first, then the browser fulfills the HTTP requests embedded in the index.html page).

http

Image source: https://www.webpagetest.org/result/190508_NF_4b9572348efa02e76204ff49351a7794/1/details/

When a user enters a URL into a web browser/client (e.g., google.com), an HTTP request is sent over the internet to a web server. Generally, what is sent back is a specific .html document/file containing information written in HTML, which the web browser/client then uses to visually display information to the user. Think of an HTML document as a very particular kind of text document that can contain text and user interfaces, along with its own internal HTTP requests for CSS, JavaScript, image, video, and audio files, all wrapped/tagged with semantic meaning (i.e., HTML tags like <h1></h1>) to be displayed to the user. Below is an example of an .html document.


<!DOCTYPE html>
<html>

<head>
  <meta charset="UTF-8" />
  <title>Name of Web Page</title>
</head>

<body>

  <h1>I am a web page</h1>

</body>

</html>
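
To make the idea of "embedded HTTP requests" concrete, here is a minimal sketch that scans an HTML document for the resources a browser would go on to request. It uses Python's standard html.parser module. The class name ResourceCollector, the helper embedded_urls, and the file names styles.css, photo.jpg, and app.js are all assumptions added for illustration; a real browser considers many more tags and attributes, and does not fetch every kind of link element.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect URLs that a browser would request after parsing the HTML."""

    # Tag name -> attribute that holds the URL of an embedded resource.
    URL_ATTRIBUTES = {"img": "src", "script": "src", "link": "href",
                      "video": "src", "audio": "src"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRIBUTES.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def embedded_urls(html_text):
    """Return the embedded resource URLs in document order."""
    collector = ResourceCollector()
    collector.feed(html_text)
    return collector.urls


# The example page from above, extended with three hypothetical resources.
PAGE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8" />
  <title>Name of Web Page</title>
  <link rel="stylesheet" href="styles.css" />
</head>
<body>
  <h1>I am a web page</h1>
  <img src="photo.jpg" alt="A photo" />
  <script src="app.js"></script>
</body>
</html>"""

if __name__ == "__main__":
    print(embedded_urls(PAGE))
```

Each URL found this way would become its own HTTP request, which is why a single page visit produces a list of requests rather than just one.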

The HyperText Transfer Protocol (aka HTTP) is the web's foundational protocol and sits on top of TCP and IP. In terms of the World Wide Web, HTTP is the foundational mechanism that allows a user to request and send data over the internet (i.e., request files like .html, .jpg, .js, .css, etc.).
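To show what "sitting on top of TCP" means in practice, the sketch below opens a plain TCP connection and writes an HTTP request by hand as text. It is an illustration under assumptions, not how a browser is implemented: HelloHandler and raw_http_get are hypothetical names, and the server is a throwaway one from Python's standard http.server module running on the local machine.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<h1>I am a web page</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def raw_http_get(path="/"):
    """Send a hand-written HTTP request over TCP and return the raw reply."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    host, port = server.server_address
    worker = threading.Thread(target=server.handle_request)  # serve one request
    worker.start()
    # An HTTP request is just structured text sent over a TCP connection.
    request = f"GET {path} HTTP/1.1\r\nHost: {host}:{port}\r\nConnection: close\r\n\r\n"
    chunks = []
    with socket.create_connection((host, port)) as conn:
        conn.sendall(request.encode("ascii"))
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    worker.join()
    server.server_close()
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    print(raw_http_get("/"))
```

The reply starts with a status line and headers, followed by a blank line and then the HTML body. A browser does the same exchange, then parses the body.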

A professional front-end developer will have an in-depth understanding of HTTP.

Notes:

  1. HTTP specification.
  2. HTTP continues to evolve: HTTP/2 and HTTP/3 are newer versions of the protocol.

URLs / Domains / Sub-Domains

What is url

Image source: https://sitechecker.pro/what-is-url/

Browsers use a Uniform Resource Locator (aka URL) to locate unique resources on web servers over the internet (e.g., HTML files, CSS files, JavaScript files, image files, video files, font files, etc.).

A URL is broken down into the following parts (Protocol > Domain Name > Path (optional) > Parameters/Query (optional) > Fragment/Anchor (optional)):

Protocol

As you might expect, the protocol of the World Wide Web is HTTP. So a URL will start with:


http://
                    

Domain Name

The next part of a URL is a domain name. The domain name is made up of three subparts separated by periods: the Top Level Domain, label 1, and label 2.


http://label2.label1.topleveldomain

e.g.

http://www.google.com or http://google.com (the label 2 or www is optional)

Typically, label 1 is a semantic cue as to the content found at the domain. The Top Level Domain portion is a fixed set of characters, the most common being .com, but nowadays more options are available (e.g., .biz, .org, .us, etc.).

Domains are purchased from a domain host, and once you own a domain name, you can subdivide the domain using label 2. Label 2 is often referred to as a sub-domain.

     
http://email.google.com (goes to Gmail, not the Google homepage)
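
The label structure described above can be sketched as a small function. The name split_domain is hypothetical, and the approach is deliberately naive: it treats the last label as the Top Level Domain, so it does not handle multi-part suffixes such as .co.uk.

```python
def split_domain(hostname):
    """Split a hostname into top level domain, label 1, and sub-domains."""
    labels = hostname.lower().rstrip(".").split(".")
    return {
        "top_level_domain": labels[-1],
        "label1": labels[-2] if len(labels) > 1 else None,
        # Everything left of label 1 is a sub-domain (label 2 and beyond).
        "subdomains": labels[:-2],
    }


if __name__ == "__main__":
    print(split_domain("www.google.com"))
    print(split_domain("google.com"))
```

For google.com the list of sub-domains is empty, which matches the point that label 2 is optional.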
              

Path (optional)

If a URL has a valid protocol and domain, then it has located a specific web server. A path is used to request specific files from the web server. Roughly speaking, a path is how you tell a web server to look in folder X, find file Y, and serve that file to the web browser (typically, that file is an .html document).


http://label2.label1.topleveldomain/folder1/folder2insideof1/folder3insideof2/file.type
            
e.g., 

http://mydomain.com/about/me.html

or

http://mydomain.com/about/index.html (But could just be http://mydomain.com/about/)
          

Note that paths or directories are indicated by a single /. A server can be set up to serve default files from a directory/folder, but a path can also end by requesting a specific file from a folder/directory.

If no path is used, then a default file (typically index.html) is returned from the root directory.
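
The path rules above can be sketched as a function that maps a URL to a file inside the server's root folder. This is a simplified illustration, assuming a static file server that uses index.html as its default file; file_for_url is a hypothetical name, and real servers differ in how they treat redirects, missing files, and security checks.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def file_for_url(url, default_file="index.html"):
    """Map a URL path to a file path relative to the server's root folder."""
    path = unquote(urlsplit(url).path) or "/"   # no path means the root directory
    if path.endswith("/"):
        path += default_file                    # a directory serves its default file
    # normpath collapses "." and ".." segments; lstrip makes the result relative.
    return posixpath.normpath(path).lstrip("/")


if __name__ == "__main__":
    for url in ("http://mydomain.com/about/me.html",
                "http://mydomain.com/about/",
                "http://mydomain.com"):
        print(url, "->", file_for_url(url))
```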

Parameters (optional, aka query)

A URL can also be used to send data to the web server. After the path, if you add ? followed by name=jill&lastname=doe, the parameters name and lastname and their values will be sent to the server as part of the request.


http://mydomain.com/about/?name=jill&lastname=doe
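
On the receiving side, a server has to turn that query string back into names and values. A minimal sketch with Python's standard urllib.parse module follows; the function name query_parameters is an assumption for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_parameters(url):
    """Return the query parameters of url as a dict of name -> list of values."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    print(query_parameters("http://mydomain.com/about/?name=jill&lastname=doe"))
    # Going the other way: build a query string from names and values.
    print(urlencode({"name": "jill", "lastname": "doe"}))
```

Values come back as lists because a name may appear more than once in a query string.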
                        

Fragment Identifier (optional, aka anchors)

A fragment identifier can be used to tell the client/browser to adjust the view so that something specific in the HTML document is within the viewport of the browser. Some speak of fragment identifiers as a way of bookmarking and directly linking to a particular part of a web page. To add an anchor to a URL, use # at the end of the URL followed by a name:


http://mydomain.com/contact/?answer=yes#section1.2
                                      

Note that for the browser to adjust the viewport to a specific part of the HTML document, that HTML document has to have an element with an id attribute that matches the name provided in the URL (e.g. <div id="section1.2"></div>).
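
Putting the parts together, the breakdown of a URL described in this section can be sketched with urlsplit from Python's standard urllib.parse module. The function name url_parts and the dictionary keys are chosen here to mirror the headings above; they are not part of any specification.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Break a URL into the parts described in this section."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    print(url_parts("http://mydomain.com/contact/?answer=yes#section1.2"))
```

Note that the fragment is handled by the browser: it is not sent to the web server as part of the HTTP request.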

Notes:

  1. Original URL specification.
  2. Newer URL specification.
  3. When a URL starts with https:// instead of http://, this indicates that all communication between the client and the server uses an encrypted version of the HTTP protocol.

Web Hosting

A web hosting service is a type of Internet hosting service that allows individuals and organizations to make their website accessible via the World Wide Web. Web hosts are companies that provide space on a server owned or leased for use by clients, as well as providing Internet connectivity, typically in a data center.

Source: Wikipedia

While it's possible to turn a home computer connected to the internet into a web server that is available on the internet, most people use a web hosting service to serve web pages. Web hosts set up a web server for you (e.g., Apache or Node.js) as well as the tools needed to connect a domain name to that web server. A popular web hosting service is Bluehost (it is not uncommon for a web hosting service to offer domain hosting as well).

A web hosting service is not strictly required to serve web pages. One can run web server software on their local computer using a local IP address (typically 127.0.0.1) and connect to the server locally using the URL http://localhost:8080 or http://127.0.0.1:8080. Front-end developers typically develop web pages locally using a local web server to facilitate the client-server relationship without having to pay for web hosting or a domain name.
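As one possible way to run such a local web server, here is a minimal sketch built on Python's standard http.server module. The function name serve_directory is hypothetical, and the port 8080 default simply matches the URLs mentioned above. Many front-end tools ship their own development servers, so treat this as an illustration rather than a recommended setup.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, port=8080):
    """Serve the files in directory at http://127.0.0.1:<port>/ in the background."""
    # SimpleHTTPRequestHandler serves static files and uses index.html
    # as the default file for a directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling serve_directory(".") from a folder that contains an index.html makes that page available at http://localhost:8080 for as long as the program keeps running. Calling shutdown() and then server_close() on the returned server stops it.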

Domain Registrars

Services called domain registrars are used to purchase and host domain names. If you want to buy a domain called pinkandyellowflyingelephants.com, you search for its availability and, if it is available, purchase it through a domain registrar (also called a domain host). Domain hosts provide the settings that route requests for a domain to a specific web server (i.e., send requests for pinkandyellowflyingelephants.com to X device connected to the internet). A popular domain host is GoDaddy.