What happens when you type holbertonschool.com in your browser and press Enter
If you’ve ever wondered what really happens in those seconds when you type in a URL and a web page opens, you’ve come to the right place.
In this article, we will explain the main steps that occur behind the scenes, starting from the point we press Enter to the point the desired website is loaded and appears on our screen.
So let’s begin by asking ourselves these questions:
What is a server?
A server is a computer or system that provides resources, data, services, or programs to other computers, known as “clients”, over a network. In theory, any computer that shares resources with client machines is considered a server. There are many types of servers, including web servers, mail servers, and virtual servers. Servers are reached over a network and are typically housed in data centers.
What is a web server?
A web server stores and delivers the content for a website (such as text, images, video, and application data) to clients that request it. The most common type of client is a web browser, which requests data from a website when a user clicks on a link or downloads a document on a page displayed in the browser. Nginx and Apache are two widely used web servers.
The client-server model
The client-server model describes how a server provides resources and services to one or more clients. Examples of servers include web servers, mail servers, and file servers. Each of these servers provides resources to client devices, such as desktop computers, laptops, tablets, and smartphones. Most servers have a one-to-many relationship with clients, meaning a single server can provide resources to multiple clients at one time.
What is an IP address?
Every time you are on the internet, IP (Internet Protocol) addresses play an essential role in the information exchange that lets you see the sites you request.
Your IP address is a unique identifier, kind of like a mailing address, associated with your online activity. Any time that you use the internet (shopping online, sending emails, streaming TV), you’re requesting access to a specific online destination, and in return, information is sent back to you.
There are 2 types of IP addresses: IP Version 4 (IPv4) and IP Version 6 (IPv6).
IPv4 (IP version 4) addresses are sequences of four numbers (each from 0 to 255) separated by dots, for example 220.127.116.11.
Under IPv4, there are only 2^32 possible combinations, which offers just under 4.3 billion unique addresses. Because the number of computers and devices on the Internet keeps growing, we are running out of unique IPv4 addresses.
IPv6 came to the rescue by offering a much larger number of unique addresses (2^128). An IPv6 address is a sequence of eight groups of four hexadecimal digits (0-9 and A-F) separated by colons.
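As a rough illustration, Python’s standard ipaddress module can parse both kinds of address and show the size of each address space. The two sample addresses below are assumptions taken from ranges reserved for documentation; they are not the addresses of any real server.

```python
import ipaddress

# Sample addresses from documentation-only ranges (assumed for illustration).
v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

print(v4.version, v4.max_prefixlen)   # IP version and address length in bits
print(v6.version, v6.max_prefixlen)

# Size of each address space.
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128
print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:,}")

# The fully expanded form of an IPv6 address shows all of its groups.
print(v6.exploded)
```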
How did the web browser find the IP address of www.holbertonschool.com ?
When we type the URL (Uniform Resource Locator) https://www.holbertonschool.com into a browser (Chrome, Firefox, Safari, or any other browser) and press ‘Enter’, the first thing the browser does is break the URL down into pieces to get the domain name of the website.
Domain names exist because humans can remember words much better than IP addresses. So, in order to get the IP address of a server, the web browser first checks whether its cache contains the IP address for the typed domain name. If the browser doesn’t find the IP, it next asks the operating system.
Note that, if the website was visited previously by the user, the browser will find the IP in the cache.
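The first step described above, splitting the URL to extract the domain name, can be sketched with Python’s standard urllib.parse module. This only illustrates the idea; real browsers use their own URL parsers, and the default-port table here is a simplifying assumption.

```python
from urllib.parse import urlsplit

url = "https://www.holbertonschool.com"
parts = urlsplit(url)

scheme = parts.scheme        # the protocol part of the URL
hostname = parts.hostname    # the domain name to look up
# No explicit port in the URL, so fall back to the scheme's usual default.
port = parts.port or {"http": 80, "https": 443}[scheme]
path = parts.path or "/"     # an empty path means the site root

print(scheme, hostname, port, path)
```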
The DNS is the Internet’s version of Google Maps. It routes you to your destination. Your computer or your router knows the address of the DNS server. When you type the URL in a browser for the first time, it sends a request to the DNS server, which responds back with the IP address of the web server hosting, for example, holbertonschool.com. This value is usually then cached or gets added into the list of known hosts, so your browser doesn’t have to do this lookup every time.
DNS (Domain Name System) is a distributed database that maps domain names to the IP addresses they point to. Every domain name that hosts a website resolves to at least one IP address, and several domain names can share the same address. The IP address belongs to the computer that hosts the website we are requesting. For example, if www.google.com resolved to 18.104.22.168 (an illustrative value; the real addresses vary by location and over time), you could in principle reach the site by typing http://18.104.22.168 in your browser. DNS works like a phone book, which lists names and their corresponding phone numbers.
The primary purpose of DNS is human-friendly navigation. You can easily access a website by typing the correct IP address for it on your browser, but imagine having to remember different sets of numbers for all the sites we regularly access. Therefore, it is easier to remember the name of the website using a URL and let DNS do the work for us by mapping it to the correct IP.
Let’s look at exactly how a DNS request works:
- A DNS request starts when you try to access a computer on the internet. For example, you type www.holbertonschool.com in your browser address bar.
- The first stop for the DNS request is the local DNS cache. As you access different computers, their IP addresses get stored in a local repository. If you visited www.holbertonschool.com before, you have the IP address in your cache.
- If you don’t have the IP address in your local DNS cache, your computer asks a recursive DNS server. Your IT team or Internet Service Provider (ISP) usually provides a recursive DNS server for this purpose.
- The recursive DNS server has its own cache, and if it has the IP address, it will return it to you. If not, it asks a root name server, which points it to the name servers for the right top-level domain (TLD).
- The next stop is the TLD name servers, in this case the TLD name servers for .com addresses. These servers don’t have the IP address we need, but they can send the DNS request in the right direction.
- What the TLD name servers do have is the location of the authoritative name server for the requested site. The authoritative name server responds with the IP address for www.holbertonschool.com, and the recursive DNS server stores it in its own cache and returns the address to your computer.
- Your computer gets the IP address, and the browser can then connect to www.holbertonschool.com to download all the glorious content. The IP address is recorded in the local cache with a time-to-live (TTL) value. The TTL is the amount of time the local DNS record is valid; after that time, the lookup goes through the whole process again the next time you request holbertonschool.com.
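The lookup chain and the TTL-based cache can be sketched as a toy simulation. Everything here is hypothetical: the server names, the dictionaries standing in for root, TLD and authoritative name servers, the 300-second TTL, and the IP address 192.0.2.10 (taken from a documentation-only range). A real resolver speaks the DNS protocol over the network instead.

```python
import time

# Hypothetical, hard-coded data standing in for real name servers.
ROOT = {"com": "tld-com"}                                      # root: who handles each TLD
TLD = {"tld-com": {"holbertonschool.com": "auth-holberton"}}   # TLD: authoritative server per domain
AUTHORITATIVE = {
    "auth-holberton": {"www.holbertonschool.com": ("192.0.2.10", 300)},  # (IP, TTL in seconds)
}

cache = {}  # hostname -> (ip, expiry timestamp)


def resolve(hostname, now=None):
    """Return (ip, source), where source is 'cache' or 'lookup'."""
    now = time.time() if now is None else now
    entry = cache.get(hostname)
    if entry and entry[1] > now:
        return entry[0], "cache"          # still valid: no lookup needed

    labels = hostname.split(".")
    tld_server = ROOT[labels[-1]]                          # ask a root server
    auth_server = TLD[tld_server][".".join(labels[-2:])]   # ask the TLD server
    ip, ttl = AUTHORITATIVE[auth_server][hostname]         # ask the authoritative server
    cache[hostname] = (ip, now + ttl)                      # remember it until the TTL expires
    return ip, "lookup"


print(resolve("www.holbertonschool.com", now=0))     # empty cache: full lookup
print(resolve("www.holbertonschool.com", now=100))   # within the TTL: served from cache
print(resolve("www.holbertonschool.com", now=400))   # TTL expired: lookup again
```

Passing an explicit now value keeps the example deterministic; leaving it out uses the current time.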
Now the browser knows the IP address and is ready to send an HTTP request to the server.
What is a network protocol?
A network protocol is an established set of rules that determine how data is transmitted between different devices in the same network. Essentially, it allows connected devices to communicate with each other, regardless of any differences in their internal processes, structure or design. Network protocols are the reason you can easily communicate with people all over the world, and thus play a critical role in modern digital communications.
What is HTTP?
The Hypertext Transfer Protocol (HTTP) is the foundation of the World Wide Web, and is used to load web pages using hypertext links. HTTP is an application layer protocol designed to transfer information between networked devices and runs on top of other layers of the network protocol stack. A typical flow over HTTP involves a client machine making a request to a server, which then sends a response message.
For example, when the browser sends an HTTP request, the HTTP verb/method is GET by default, which means that the browser tries to get data from a specified resource on the server.
There are other HTTP verbs or methods, like POST, PUT, HEAD, DELETE. The method POST, for example, is used to send data to a server to create/update a resource.
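To make the request format concrete, here is a minimal sketch that builds the text of an HTTP/1.1 request by hand. The helper build_request, the /login path and the form data are assumptions made up for illustration; a real browser sends many more headers.

```python
def build_request(method, host, path="/", body=""):
    """Format a minimal HTTP/1.1 request as text (illustrative helper)."""
    lines = [
        f"{method} {path} HTTP/1.1",   # request line: method, resource, protocol version
        f"Host: {host}",               # which website on the server we want
        "Connection: close",
    ]
    if body:
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # A blank line separates the headers from the (optional) body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


get_request = build_request("GET", "www.holbertonschool.com")
# Hypothetical POST: the path and form field are made up for this example.
post_request = build_request("POST", "www.holbertonschool.com", "/login", "user=alice")

print(get_request)
print(post_request)
```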
What is TCP/IP protocol?
TCP/IP stands for Transmission Control Protocol/Internet Protocol and is a suite of communication protocols used to interconnect network devices on the internet. TCP/IP is also used as a communications protocol in a private computer network (an intranet or extranet).
TCP and IP are two separate computer network protocols.
IP handles addressing and routing: it identifies the address to which data is sent and moves packets toward it. TCP is responsible for reliable delivery of the data once that IP address is known.
It’s possible to treat them separately, but in practice there is little reason to. Because they’re so often used together, “TCP/IP” and the “TCP/IP model” are now recognized terminology.
How TCP works
TCP allows for transmission of information in both directions. This means that computer systems that communicate over TCP can send and receive data at the same time, similar to a telephone conversation. The protocol uses segments as the basic units of data transmission. In addition to the payload, segments also contain control information, and their size is typically limited to about 1,500 bytes, the usual maximum packet size on Ethernet networks. The TCP software in the network protocol stack of the operating system is responsible for establishing and terminating the end-to-end connections as well as transferring data.
With TCP, every segment that is sent is tracked and acknowledged, and lost or corrupted segments are retransmitted, so no data is lost or corrupted in transit. This is what makes TCP reliable.
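The idea of tracking segments can be illustrated with a toy simulation in which each segment carries a sequence number so the receiver can put the data back in order. This is not real TCP (there are no acknowledgements, retransmissions or checksums here), and the tiny segment size of 8 bytes is an assumption chosen to keep the output readable.

```python
import random

MSS = 8  # deliberately tiny payload size per segment (assumed for the demo)


def segment(data, mss=MSS):
    """Split data into (sequence_number, payload) pairs."""
    return [(offset, data[offset:offset + mss]) for offset in range(0, len(data), mss)]


def reassemble(segments):
    """Rebuild the byte stream by ordering segments by sequence number."""
    return b"".join(payload for _, payload in sorted(segments))


message = b"GET / HTTP/1.1 is carried inside TCP segments"
segments = segment(message)
random.shuffle(segments)            # simulate segments arriving out of order
received = reassemble(segments)

print(received == message)
```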
Difference between TCP and UDP
Like TCP, UDP (User Datagram Protocol) is a widely used protocol for sending packets over the Internet. But UDP is not reliable: it does not acknowledge or retransmit packets, so they may get lost or arrive out of order without the sender knowing. On the other hand, UDP is faster and lighter than TCP.
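A small sketch using Python’s socket module shows how little ceremony UDP involves: there is no connection to set up, a datagram is simply sent to an address. The helper name udp_round_trip is made up for this example, and it only talks to the local machine (127.0.0.1), where delivery is dependable in practice even though UDP itself gives no guarantee.

```python
import socket


def udp_round_trip(payload=b"hello"):
    """Send one datagram to a local socket and return what was received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # give up instead of waiting forever
        # No handshake: the datagram is sent straight to the receiver's address.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()


print(udp_round_trip(b"ping"))
```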
Application servers
Having a web server is the basis of any web page. But most sites don’t want just a static page where no interaction happens; most websites are dynamic. That means it’s possible to interact with the site, save information into it, log in with a user name and a password, etc.
This is made possible by the use of one or more application servers. These are software programs responsible for running applications, communicating with databases through queries, and managing user information, among other things. They work behind web servers and serve the dynamic parts of an application alongside the static content delivered by the web server.
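As a minimal sketch of “dynamic” content, the function below is a tiny WSGI application (WSGI is the standard interface between Python web applications and servers). The greeting logic and the call_app helper are assumptions made up for illustration; using the raw query string as a name is a simplification, not something a production application should do.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A tiny WSGI application that builds its response for each request."""
    name = environ.get("QUERY_STRING", "") or "visitor"
    body = f"Hello, {name}!".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_app(query_string=""):
    """Invoke the app directly, roughly as a server would for one request."""
    environ = {"QUERY_STRING": query_string}
    setup_testing_defaults(environ)   # fill in the other required WSGI keys
    statuses = []
    chunks = application(environ, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(chunks).decode("utf-8")


print(call_app("alice"))
print(call_app())
```

The same application function could be handed to a real WSGI server; calling it directly just keeps the example self-contained.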
Database and database server
In order to store, extract, and manipulate information, we need a database and a database server.
A database is a systematic collection of data. Databases support electronic storage and manipulation of data, and they make data management easy.
Let us discuss a database example: An online telephone directory uses a database to store data of people, phone numbers, and other contact details. Your electricity service provider uses a database to manage billing, client-related issues, handle fault data, etc.
Let us also consider Facebook. It needs to store, manipulate, and present data related to members, their friends, member activities, messages, advertisements, and a lot more. We could give countless examples of database usage.
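The telephone directory example can be sketched with Python’s built-in sqlite3 module. The table name, column names and sample rows are made up for illustration. Note that SQLite is an embedded library rather than a separate database server, so this only shows the storing and querying side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary in-memory database
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO contacts (name, phone) VALUES (?, ?)",
    [("Alice", "555-0100"), ("Bob", "555-0101")],   # made-up sample rows
)


def find_phone(name):
    """Look up a phone number by name, or return None if it is absent."""
    row = conn.execute(
        "SELECT phone FROM contacts WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


print(find_phone("Alice"))
print(find_phone("Carol"))
```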
What about database servers?
A database server is a machine running database software dedicated to providing database services. It is a crucial component in the client-server computing environment where it provides business-critical information requested by the client systems.
A database server consists of hardware and software that run a database.
The software side of a database server, or the database instance, is the back-end database application.
The application represents a set of memory structures and background processes accessing a set of database files.
The hardware side of a database server is the server system used for database storage and retrieval.
The load balancer
As we mentioned earlier, websites live on servers. For most websites with significant traffic, hosting everything on a single server would be impractical. It would also create a Single Point of Failure (SPOF), because a single failure of, or attack on, that server would take the whole site down.
As the need for higher availability and security rose, websites started increasing the number of servers they use, organizing them in clusters, and using load-balancers. A load-balancer is a software program that distributes network requests between several servers, following a load-balancing algorithm. HAProxy is a well-known load-balancer.
There are several load-balancing algorithms, such as round-robin, weighted round-robin, and least connections.
Round-robin algorithm: Round-robin load balancing is one of the simplest and most used load-balancing algorithms. Client requests are distributed to application servers in rotation. For example, if you have three application servers, the first client request goes to the first application server in the list, the second client request to the second application server, the third client request to the third application server, the fourth to the first application server, and so on.
Weighted Round Robin algorithm: When servers have different configurations, the administrator can assign a weight or ratio to each server, depending on the number of requests it can handle. Let’s say server A can take 3 requests per second, server B can take 2 requests per second on average, and server C can take 1 request per second.
The load balancer will then assign the weights A=3, B=2, C=1.
Least Connections algorithm: Requests are sent to the server having the fewest number of active connections, assuming all connections generate an equal amount of server load.
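The three algorithms can be sketched in a few lines of Python. This is a simplified illustration, not how HAProxy or any particular load-balancer implements them: the server names and connection counts are assumptions, and this weighted variant simply repeats each server according to its weight, whereas real implementations usually interleave the servers more evenly.

```python
import itertools


def round_robin(servers):
    """Yield servers in rotation, forever."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield each server as many times per cycle as its weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


rr = round_robin(["A", "B", "C"])
first_four = [next(rr) for _ in range(4)]        # the fourth request wraps around

wrr = weighted_round_robin({"A": 3, "B": 2, "C": 1})
one_cycle = [next(wrr) for _ in range(6)]        # six requests make one full cycle

# Hypothetical numbers of connections currently open on each server.
choice = least_connections({"A": 12, "B": 4, "C": 9})

print(first_four)
print(one_cycle)
print(choice)
```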
In our example URL, “https://www.holbertonschool.com”, we can see that the protocol is https, not http. So, what is HTTPS?
Hypertext transfer protocol secure (HTTPS) is the secure version of HTTP, which is the primary protocol used to send data between a web browser and a website. HTTPS is encrypted in order to increase security of data transfer. This is particularly important when users transmit sensitive data, such as by logging into a bank account, email service, or health insurance provider.
Any website, especially one that requires login credentials, should use HTTPS. Web browsers take HTTPS seriously: Google Chrome and other modern browsers flag websites that do not use HTTPS as not secure. Look for a padlock icon in the URL bar (its exact appearance varies by browser), which signifies that the connection to the webpage is secure.
How Does HTTPS Work?
When your browser makes an HTTPS connection, it opens a TCP connection to port 443 on the server.
HTTPS uses an encryption protocol to encrypt communications. The protocol is called Transport Layer Security (TLS), the successor to Secure Sockets Layer (SSL). This protocol secures communications by using what’s known as an asymmetric public key infrastructure.
This type of security system uses two different keys to encrypt communications between two parties:
- The private key — this key is controlled by the owner of a website and it’s kept, as the reader may have speculated, private. This key lives on a web server and is used to decrypt information encrypted by the public key.
- The public key — this key is available to everyone who wants to interact with the server in a way that’s secure. Information that’s encrypted by the public key can only be decrypted by the private key.
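The relationship between the two keys can be illustrated with textbook RSA on very small numbers. This is a toy sketch only: the primes 61 and 53 and the exponent 17 are assumptions chosen for readability, the result is completely insecure, and real TLS uses far larger keys together with additional steps that are not shown here.

```python
# Toy key pair built from two tiny primes (real keys use enormous primes).
p, q = 61, 53
n = p * q                     # the modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                        # public exponent
d = pow(e, -1, phi)           # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)           # shared with everyone
private_key = (d, n)          # kept on the server


def encrypt(message, key):
    """Encrypt a small integer with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Decrypt with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


secret = 65                   # the "message" must be a number smaller than n
ciphertext = encrypt(secret, public_key)
print(ciphertext != secret)
print(decrypt(ciphertext, private_key) == secret)
```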
What is a firewall?
A firewall is a network security device that monitors incoming and outgoing network traffic and permits or blocks data packets based on a set of security rules. Its purpose is to establish a barrier between your internal network and incoming traffic from external sources (such as the internet) in order to block malicious traffic such as viruses and attacks by hackers.
How does a firewall work?
Firewalls carefully analyze incoming traffic based on pre-established rules and filter traffic coming from unsecured or suspicious sources to prevent attacks. Firewalls guard traffic at a computer’s entry points, called ports, which are where information is exchanged with external devices. A rule might read, for example: “Source address 172.18.1.1 is allowed to reach destination 172.18.2.1 over port 22.” In our case, using HTTPS, we have to allow port 443/tcp.
Think of IP addresses as houses, and port numbers as rooms within the house. Only trusted people (source addresses) are allowed to enter the house (destination address) at all — then it’s further filtered so that people within the house are only allowed to access certain rooms (destination ports), depending on if they’re the owner, a child, or a guest. The owner is allowed to any room (any port), while children and guests are allowed into a certain set of rooms (specific ports).
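Rule matching of this kind can be sketched as a short function. The rule format, the RULES list and the check helper are hypothetical and far simpler than a real firewall; the first rule mirrors the port 22 example above, and the second is an assumed rule that allows HTTPS from anywhere. Anything that matches no rule is denied.

```python
import ipaddress

# Hypothetical rule set: (source network, destination network, port, action).
RULES = [
    ("172.18.1.1/32", "172.18.2.1/32", 22, "allow"),   # the port 22 example from the text
    ("0.0.0.0/0", "172.18.2.1/32", 443, "allow"),      # assumed rule: HTTPS from anywhere
]


def check(source, destination, port):
    """Return the action of the first matching rule, or 'deny' by default."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    for src_net, dst_net, rule_port, action in RULES:
        if (src in ipaddress.ip_network(src_net)
                and dst in ipaddress.ip_network(dst_net)
                and port == rule_port):
            return action
    return "deny"


print(check("172.18.1.1", "172.18.2.1", 22))     # matches the first rule
print(check("172.18.1.2", "172.18.2.1", 22))     # different source: no rule matches
print(check("203.0.113.5", "172.18.2.1", 443))   # HTTPS is open to any source
```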
Now, let’s recap!
First, your browser looks up the IP address for the domain name of the website, “holbertonschool.com”. Once it is found, the browser opens a TCP connection to port 443 on the servers that host the site. The incoming traffic is processed first by the firewall. If it passes the firewall, a secure HTTPS connection is established between the two machines, and the browser sends its HTTPS request.
The request is received by the load-balancer, which forwards it to one of the servers depending on the configured load-balancing algorithm. The chosen web server receives the request, looks for the wanted files, and sends them back in an HTTPS response to the browser.
Finally, the browser receives the packets of data and makes them readable for you.