Before moving on, let's take a short overview of what processes and threads are, since they serve as the backbone for this topic.
A process is nothing but a container that the operating system creates for every program you run on a computer.
For example: running Google Chrome, running Python code, running JS code, running a game, running a terminal, etc.
All of these run inside a process.
A process typically contains your code, heap memory for your program (basically all your reference types like objects, arrays, etc. are stored here), a stack (where your function calls are pushed, executed, and popped off), etc.
A thread is a smaller unit of execution inside a process. Each thread has its own stack and a few other things, while things like the heap and the code are shared from the process. Even a process is kind of a thread.
What I mean by this is that a process is a container which by default has one thread.
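To make "a process has one thread by default" a bit more concrete, here is a small sketch you can run with Node. It only prints what the runtime reports about the current process and thread, assuming the file is started directly with node.

```javascript
const { isMainThread, threadId } = require('worker_threads')

// Every running program gets a process ID from the operating system
console.log('process id:', process.pid)

// A plain `node file.js` run starts on the main thread, whose threadId is 0
console.log('is main thread:', isMainThread)
console.log('thread id:', threadId)
```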
Threads relate directly to the number of CPU cores you have: roughly speaking, each core can run one thread at a time.
For example: if you have a CPU with 4 cores and you launch a program (process), by default its one thread will occupy one core.
Now you can launch 3 more threads to use your CPU to its full potential and achieve parallelism.
Achieving parallelism means all 4 cores are running in parallel.
Can I only spin up as many threads as I have CPU cores?
No, you can have any number of threads. For example, with 4 cores you can have 100 threads too, but only 4 threads will really be running in parallel at any moment. The OS scheduler switches threads on and off the cores when a thread finishes, blocks, or uses up its time slice. Consider thread1 on core1, thread2 on core2, thread3 on core3, and thread4 on core4 running in parallel: if thread2 is done with its job or gets paused, thread5 can be given core2, and so on. This keeps repeating. (This is basically context switching, and all of it is done by the OS scheduler.)
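The cores-versus-threads idea can be sketched in a few lines. os.cpus() is Node's built-in way to list logical cores. runningInParallel is a hypothetical helper made up for this illustration; it just encodes the rule that no more threads than cores can run at the same instant.

```javascript
const os = require('os')

// Number of logical cores the operating system reports
const coreCount = os.cpus().length

// Hypothetical helper: how many of `threadCount` threads can run at the same instant
function runningInParallel(threadCount, cores) {
  return Math.min(threadCount, cores)
}

console.log('logical cores on this machine:', coreCount)
console.log('100 threads on 4 cores, running at once:', runningInParallel(100, 4))
```

The remaining 96 threads are not lost, they simply wait for the scheduler to give them a turn on a core.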
Now that we have some understanding of processes and threads, let's see how Node uses threads.
Whenever you run Node code (node index.js), it creates a process, which in turn runs your code on a thread (the main thread).
This Node process contains the JS engine (V8), the garbage collector (for memory management), libuv (event loop), a stack (to execute functions), a heap (to store reference variables), etc. This is what your typical Node program contains at the process level.
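You can peek at some of these pieces from inside a running program. The sketch below only uses fields that Node exposes on the global process object; the exact numbers will differ from machine to machine.

```javascript
// Versions of the engine pieces bundled into the running Node process
console.log('V8 version:', process.versions.v8)
console.log('libuv version:', process.versions.uv)

// Heap figures for the current thread's V8 heap, in bytes
const { heapUsed, heapTotal } = process.memoryUsage()
console.log('heap used / total:', heapUsed, '/', heapTotal)
```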
Now let's come down to what worker_threads are:
In Node, multithreading is achieved using a module called worker_threads.
worker_threads are similar to how other languages like Java, Python, etc. handle threads, but with some variations. Typically, in languages like Java, when you create another thread from the main thread, the heap memory is shared among all the threads, and that gets very complex to manage. Consider an object { counter: 0 }: in Java all the threads can share this same object by default, and a change to it is visible to all of them. Of course this can be handled by mechanisms like synchronized, but on the whole the memory is shared.
But in Node, worker_threads handle it differently. When you create a new worker (thread), the new thread gets its own copy of everything our main thread has (JS engine (V8) + garbage collector (for memory management) + libuv (event loop) + heap + stack (to execute functions), etc.). The only difference from starting a second node process is that all of this lives in the same process that runs your code.
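Here is a minimal sketch of that separate-heap behavior, using the same { counter: 0 } object. The worker code is passed as an inline string with eval: true only to keep the sketch in one file; runIsolationDemo and workerSource are names made up for this illustration.

```javascript
const { Worker } = require('worker_threads')

const state = { counter: 0 }

// Worker source passed inline (eval: true) only to keep the sketch in one file
const workerSource = `
  const { workerData, parentPort } = require('worker_threads')
  workerData.counter += 1 // changes the worker's own copy only
  parentPort.postMessage(workerData.counter)
`

function runIsolationDemo() {
  return new Promise((resolve, reject) => {
    // workerData is cloned into the worker, not shared with it
    const worker = new Worker(workerSource, { eval: true, workerData: state })
    worker.once('message', (counterInWorker) => {
      resolve({ counterInWorker, counterInMain: state.counter })
    })
    worker.once('error', reject)
  })
}

runIsolationDemo().then((r) => console.log(r))
// prints { counterInWorker: 1, counterInMain: 0 }
```

The worker incremented its copy, while the object in the main thread stayed at 0, which is exactly the opposite of the Java-style shared heap.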
Consider the following scenario: you have a CPU with 6 cores and two files, index.js and worker.js (technically you can have both the main and worker code in the same file too). Let's say you now run node index.js.
This creates a process with one thread, which occupies one core. Now let's say index.js creates 5 more workers: including the main thread, all 6 cores can be occupied, with everything running in parallel.
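A rough sketch of that scenario: leave one core for the main thread and start one worker for each remaining core. Each worker just reports its own threadId. spawnWorkers and idSource are hypothetical names, and the inline eval: true worker is only there to avoid a second file.

```javascript
const os = require('os')
const { Worker } = require('worker_threads')

// Each worker simply reports its own thread id and exits
const idSource = `
  const { parentPort, threadId } = require('worker_threads')
  parentPort.postMessage(threadId)
`

// Start `count` workers and collect the thread id each one reports
function spawnWorkers(count) {
  const jobs = []
  for (let i = 0; i < count; i++) {
    jobs.push(new Promise((resolve, reject) => {
      const worker = new Worker(idSource, { eval: true })
      worker.once('message', resolve)
      worker.once('error', reject)
    }))
  }
  return Promise.all(jobs)
}

// One core is already used by the main thread, so use the rest for workers
const workerCount = Math.max(1, os.cpus().length - 1)
spawnWorkers(workerCount).then((ids) => console.log('worker thread ids:', ids))
```

On a 6-core machine this would start 5 workers, matching the scenario above.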
As I stated earlier, Node threads each have their own separate heap, so how can data be passed around?
This is handled using an event model: threads share data with the parent (main thread) or with other threads by sending messages over message channels. By default the data is copied to the receiving thread rather than shared.
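Besides the built-in parent channel, worker_threads also exposes MessageChannel, which gives you a pair of connected ports. The sketch below hands one port to a worker and receives the answer on the other. doubleInWorker and the message shape { port, value } are made up for this illustration.

```javascript
const { Worker, MessageChannel } = require('worker_threads')

// Inline worker (eval: true) to keep the sketch in one file
const channelWorkerSource = `
  const { parentPort } = require('worker_threads')
  // Receive a port from the main thread, then answer over that port
  parentPort.once('message', ({ port, value }) => {
    port.postMessage(value * 2)
  })
`

function doubleInWorker(value) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(channelWorkerSource, { eval: true })
    const { port1, port2 } = new MessageChannel()
    port1.once('message', (doubled) => {
      port1.close() // closing one side shuts down the whole channel
      resolve(doubled)
    })
    worker.once('error', reject)
    // Ports are transferred, not copied, so port2 goes in the transfer list
    worker.postMessage({ port: port2, value }, [port2])
  })
}

doubleInWorker(21).then((result) => console.log('doubled:', result))
// prints doubled: 42
```

The same port-passing trick is how two workers can talk to each other directly: the main thread creates the channel and gives one port to each of them.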
A simple example is below:
index.js
const WorkerThreads = require('worker_threads')
const arr = new Array(15).fill(0).map((_, i) => i + 1)
const result = []
function main() {
  for (let i = 0; i < arr.length; i++) {
    // Worker is a class, so it has to be created with `new`
    const worker = new WorkerThreads.Worker('./worker.js')
    // Send one array value to this worker
    worker.postMessage(arr[i])
    worker.on('message', (resultFromWorker) => {
      result.push(resultFromWorker)
      // This worker has done its one job, so shut it down
      worker.terminate()
    })
  }
}
main()
worker.js
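const WorkerThreads = require('worker_threads')
// Placeholder for the real CPU-heavy work; swap in your own function
function someExpensiveOperation(n) {
  let sum = 0
  for (let i = 0; i < n * 1e6; i++) sum += i
  return sum
}
WorkerThreads.parentPort.on('message', (arrayValue) => {
  const result = someExpensiveOperation(arrayValue)
  WorkerThreads.parentPort.postMessage(result)
})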
In the above code I am simply creating a dummy array of 15 values and passing each value to its own worker thread (15 workers in total), to do some expensive calculation in parallel and send the result back to the main thread.
This is how Node handles multithreading. Mostly you should use threads in Node only for heavy CPU computation, for example calculating the prime numbers in a range or other very CPU-intensive tasks.
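As a sketch of the prime-number case, the code below splits a range into chunks, counts the primes of each chunk on its own worker, and adds the counts up on the main thread. countPrimes, countPrimesInWorker, and the simple trial-division isPrime are illustrative names and choices, not a tuned implementation.

```javascript
const { Worker } = require('worker_threads')

// Inline worker source (eval: true) so the sketch fits in one file
const primeWorkerSource = `
  const { workerData, parentPort } = require('worker_threads')
  function isPrime(n) {
    if (n < 2) return false
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) return false
    }
    return true
  }
  let count = 0
  for (let n = workerData.from; n <= workerData.to; n++) {
    if (isPrime(n)) count++
  }
  parentPort.postMessage(count)
`

// Count the primes in [from, to] on one worker thread
function countPrimesInWorker(from, to) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(primeWorkerSource, { eval: true, workerData: { from, to } })
    worker.once('message', resolve)
    worker.once('error', reject)
  })
}

// Split [1, limit] into at most `parts` chunks and sum the per-chunk counts
async function countPrimes(limit, parts) {
  const size = Math.ceil(limit / parts)
  const jobs = []
  for (let from = 1; from <= limit; from += size) {
    jobs.push(countPrimesInWorker(from, Math.min(from + size - 1, limit)))
  }
  const counts = await Promise.all(jobs)
  return counts.reduce((a, b) => a + b, 0)
}

countPrimes(100, 4).then((total) => console.log('primes up to 100:', total))
// prints primes up to 100: 25
```

For a range this small the cost of starting workers outweighs the gain; the split only pays off when each chunk is real CPU work.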
NOTE: You should not use threads to handle network IO, file IO, etc. Node already handles network IO on a single thread using kernel mechanisms like epoll, and file IO is handled by the libuv library (libuv runs it on its own internal thread pool). Technically you can handle network IO with worker_threads too, but they are not designed for that.
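To illustrate, here is a small sketch where several file reads overlap without a single worker thread being created. It writes a throwaway file into the OS temp directory first so it can run anywhere; readConcurrently and the file names are made up for this example.

```javascript
const fs = require('fs').promises
const os = require('os')
const path = require('path')

// Read the same file several times concurrently, from the main thread only
async function readConcurrently(copies) {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'io-demo-'))
  const file = path.join(dir, 'sample.txt')
  await fs.writeFile(file, 'hello')
  // All reads are started before any of them is awaited, so they overlap
  const reads = Array.from({ length: copies }, () => fs.readFile(file, 'utf8'))
  const contents = await Promise.all(reads)
  // Clean up the throwaway directory
  await fs.rm(dir, { recursive: true, force: true })
  return contents
}

readConcurrently(3).then((contents) => console.log(contents))
// prints [ 'hello', 'hello', 'hello' ]
```

Your JS stays on one thread here; the waiting happens inside libuv, which is why adding worker_threads on top usually buys nothing for IO.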
This is just a foundational blog post about what threads are and how Node uses them. Play with it and enjoy.