I am currently exploring low-level networking and operating-system concepts, particularly how backend applications and the OS are inter-related.
While going through this, I was fascinated to read about how efficiently Node.js handles concurrency with the help of the kernel.
To answer this question, let’s first review some basic concepts that are essential for understanding how a typical backend application works and how Node.js manages network connections and concurrency.
TCP (Transmission Control Protocol) is a Layer 4 protocol used to transmit data between devices on a network. It sits on top of the IP protocol (Layer 3) and provides a stateful, connection-oriented communication method.
The operating system’s kernel itself manages establishing, maintaining, and terminating these TCP connections between source and destination.
The TCP handshake is a process that ensures a valid connection is established between a source and destination before any data transfer can happen.
It’s a three-step process:
-> SYN: The client sends a SYN (synchronize) packet to the server.
-> SYN/ACK: The server responds with a SYN/ACK (synchronize/acknowledge) packet to confirm the request.
-> ACK: The client sends an ACK (acknowledge) packet back to the server to finalize the connection.
Once these three steps are completed, we can confirm that the TCP connection is established and both the source and destination can exchange data. This connection is stateful, meaning both sides keep track of the connection’s status throughout its lifetime.
A single-threaded process executes only one instruction at a time. In other words, a program running on a single thread can only perform one task at a time. When you’re working with a single-threaded system or language, such as JavaScript in Node.js, there’s only one thread executing your code. It can’t “do” more than one thing at the same time.
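Here is a minimal sketch of what that means in practice: a synchronous busy loop blocks the only thread, so even a timer that was due immediately has to wait for it to finish.

```javascript
const order = [];

setTimeout(() => order.push("timer fired"), 0); // due right away...

const start = Date.now();
while (Date.now() - start < 200) {} // ...but the only thread is busy for ~200 ms

order.push("sync code done");

// The timer callback can only run once the synchronous code has finished.
setTimeout(() => console.log(order), 0); // → [ 'sync code done', 'timer fired' ]
```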
In a multi-threaded environment, a single process can spawn multiple threads. These threads share the process’s memory and can run in parallel, handling different tasks simultaneously.
For example, consider two tasks (heavyProcessA and heavyProcessB): a multi-threaded system could run them in parallel on different CPU cores, rather than having to wait for one to finish before starting the other.
Here’s an example in pseudocode:
threadA: heavyProcessA();
threadB: heavyProcessB();
In this example, heavyProcessA() and heavyProcessB() can execute in parallel.
By default, Node.js is a single-threaded environment, which means it can process only one operation at a time. However, there are some exceptions. Certain operations, such as crypto, file I/O, and DNS resolution, are handed off to a thread pool managed by libuv (the library Node.js uses for asynchronous I/O operations).
By default, this thread pool has 4 threads, but it can be configured up to a maximum of 1024 threads (the exact maximum may depend on the libuv version). The number of threads in the pool is controlled via the UV_THREADPOOL_SIZE environment variable.
Example: consider the following code:

```javascript
const { pbkdf2 } = require("crypto");

// UV_THREADPOOL_SIZE is undefined unless you set it yourself;
// the pool defaults to 4 threads.
console.log(`configured pool size: ${process.env.UV_THREADPOOL_SIZE || "4 (default)"}`);

const start = (task) => {
  const st = Date.now();
  pbkdf2("SECRET", "salt", 1000000, 64, "sha512", (err) => {
    if (err) throw err;
    console.log(`Finished running task ${task} in ${(Date.now() - st) / 1000} seconds`);
  });
};

start(1);
start(2);
start(3);
start(4);
```
Running this, all four tasks finish at roughly the same time, each taking about as long as a single run would, which confirms that crypto work is being offloaded to libuv’s thread pool rather than blocking the main thread.
Before going into how Node.js does this, below is a high-level overview of how a typical backend server handles a request and response:
-> A backend application listens for incoming connections on a specific IP address and port (e.g., 1.1.1.1:8080).
-> The server kernel creates a socket to listen on that port (8080) and manages two queues for it: the SYN queue and the accept queue.
-> The client sends an HTTP request to the server on port 8080 at that address. This begins with a SYN packet from the client.
-> The server kernel receives the SYN, places it in the SYN queue, and responds with a SYN/ACK to the client.
-> The client replies with an ACK packet.
-> The server kernel matches the ACK with the corresponding SYN in the queue, completes the TCP handshake, and moves the connection from the SYN queue to the accept queue.
-> The backend application then retrieves the connection (via the accept() system call), removing it from the accept queue.
The accepted connection is returned to the backend application as a socket (file) descriptor, which you can observe on Linux by listing the process’s open file descriptors, e.g. with ls -l /proc/<pid>/fd.
Subsequent read and write operations on this connection are handled by the operating system kernel; the backend application only needs to make system calls (e.g., accept(), read(), write()) to interact with it.
Node.js uses a long-running event loop (essentially a while loop) to manage incoming requests and network connections. This loop runs for as long as the application process is active, continually polling the operating system (kernel) for updates on connections, such as when data is available to read or when a connection is ready to be written to.
This is how Node.js manages concurrency on a single thread: by utilizing the operating system’s event-driven I/O mechanisms (epoll on Linux, kqueue on macOS/BSD), it can handle thousands of concurrent network connections efficiently.
This is just a drop in the ocean. There is much more to learn about the underlying mechanisms of how Node.js handles all this, and about the wonders the OS performs to make our lives easier. Some topics we can explore further include:
Understanding file descriptors in Linux
How Node.js uses the epoll system call to handle asynchronous I/O
How the event loop keeps running after our synchronous code finishes
How mechanisms like epoll and kqueue work at the kernel level
...and much more