sh$ node main1.js <div>Will insert: <p>Sylvain's test string</p></div> QUERY: INSERT INTO DICTIONARY(entry) VALUES ( 'Sylvain''s test string' ); <div>Inserted: <p>Sylvain''s test string</p></div> # ^^ # Oops: not escaped as HTML!
This section will illustrate why I needed to isolate some modules in their own instance of the V8 JavaScript engine. If you are only focused on spawning new NodeJS processes and establishing a communication between the two V8 instances, you may skip this section.
So, as I said previously, I had an application where I needed to import two modules. Unfortunately, they modified the Object’s prototype in two mutually incompatible manners.
I reduced that use case into the two poorly written modules:
The rogue1
module provides some interface to a hypothetic SQL database.
And the rogue2
module provides some helper functions to produce HTML documents.
Apparently, there is no link between these two modules, and you can genuinely think they will work independently of one another. So, here is some test file to check that:
Now, look what appends when NodeJS execute this code:
sh$ node main1.js <div>Will insert: <p>Sylvain's test string</p></div> QUERY: INSERT INTO DICTIONARY(entry) VALUES ( 'Sylvain''s test string' ); <div>Inserted: <p>Sylvain''s test string</p></div> # ^^ # Oops: not escaped as HTML!
The code contains two identical calls to the asParagraph
function provided by the rogue1
module.
Surprisingly enough, those two calls produce two different results! Why that?
If you look closely at the code of the two rogue modules, you will see each one modifies the String’s prototype by attaching a new method.
In one module, to escape the string following the SQL rules.
In the other module, to escape the string following the HTML escape rules.
Unluckily, the authors of both these libraries chose the same name for this new method: $escape
.
It’s not hard to understand that, when loading the rogue2
module, the HTML $escape
method is overwritten by the SQL $escape
method.
And after that, the HTML helper functions provided by rogue1
call the wrong $escape
method and return garbage.
Note
|
Library writers: That’s why you should use symbols to attach new properties to existing objects' prototypes. Symbols are guaranteed to be uniques, and you may have avoided that pitfall by using them. That being said, it is usually not recommended to modify the standard object’s prototypes. Think twice if you plan to do that! |
As noted in the above admonition, the real solution would be to rewrite the modules to avoid attaching a new method to the String’s prototype. But in my case, both modules were large third-party libraries, and a rewrite was out of the question. The solution was to isolate each library in its own V8 instance so they would no longer conflict.
fork
and spawnSync
The bare minimum to do that is illustrated in the following example.
I split the program into two files, main
and worker
, with the goal of running each part in its own process.
The rogue2
module usage being now confined to the worker process, it can no longer conflict with the modules loaded in the main process:
All the code related to the rogue2
module has now moved into the worker
module.
And instead of using it directly, the main program uses the child_process.fork
function to load and run worker.js
in a new independent V8 instance.
Let’s try that:
sh$ node main2.js <div>Will insert: <p>Sylvain's test string</p></div> <div>Inserted: <p>Sylvain's test string</p></div> QUERY: INSERT INTO DICTIONARY(entry) VALUES ( 'Sylvain''s test string' );
There are two things to notice here:
First, and that was expected, all calls to asParagraph
from the main process now return identical results. The rogue2
module no longer conflicts with that.
Second, the lines in the output do not appear in the same order as previously: our application is now made of two independent processes running in parallel and scheduled at the goodwill of the operating system. If you consider the overhead of creating and initializing the second V8 instance, the main program has largely the time to execute many instructions before the worker process starts.
Let’s try to slow down things a little by adding delays so we can confirm both processes run in parallel.
NodeJS does not provide a command to pause execution.
So we will use a trick here: using the function spawnSync
, also included in the child_process
module, we will spawn the OS-provided command sleep
to create a delay.
There are two major differences between fork
we used previously and spawnSync
:
Whereas fork
load and execute a JavaScript module in a new V8 instance, spawnSync
allows you to run an arbitrary command just like if it was started from a shell.
spawnSync
, just as the Sync
suffix suggests, will execute the child process synchronously. That means the parent process is suspended until the completion of the child process.
Note
|
If you’re familiar with other programming languages, |
$ node main3.js <div>Will insert: <p>Sylvain's test string</p></div> QUERY: INSERT INTO DICTIONARY(entry) VALUES ( 'Sylvain''s test string' ); <div>Inserted: <p>Sylvain's test string</p></div>
Now, if we run the program with the added delays, we can see the same sequence of events as we had initially. Once again, without any conflict between the rogue libraries since they were loaded in independent processes.
Of course, you won’t rely on delays to order the operations in a real application, not to mention it would suspend NodeJS' asynchronous events processing. So we need a way for the parent process to submit a job to the worker process and a way for the worker to signal the completion of a task to its parent. That will be the topic of the next section.
By now, you should be convinced the parent and the child processes run into independent V8 instances. Let’s go a little bit further by rewriting our application to use the event-based IPC provided by NodeJS to establish two-way communication between the parent and the child process.
When using the child_process.fork
function, NodeJS returns an object representing the child process.
This object gives you access, among others, to an event-based communication channel between the parent and the child process.
On its side, the child process can use the process
global object to handle the incoming messages from its parent:
The core work is now done in the message
event listener.
Upon receiving a request from its parent process, the worker process does its job, then sends the result back to the sender.
I wrapped the processing into a setTimeout
call with a random delay to simulate asynchronous operations in the worker process.
Parent’s side, we use a similar technique: an event listener is installed to deal with the worker process' responses. I also added some extra logic to count the number of requests handled to trigger the child process' termination when we’re done.
Once the listener is installed, I send the requests to the worker process using worker.send
.
And it’s done: We can now process data asynchronously while keeping the poorly written modules isolated in their own V8 instance:
sh$ node main-async.js parent sending message [ '0', "Sylvain's test string A" ] parent sending message [ '1', "Sylvain's test string B" ] parent sending message [ '2', "Sylvain's test string C" ] parent sending message [ '3', "Sylvain's test string D" ] worker receiving message [ '0', "Sylvain's test string A" ] worker receiving message [ '1', "Sylvain's test string B" ] worker receiving message [ '2', "Sylvain's test string C" ] worker receiving message [ '3', "Sylvain's test string D" ] worker done processing message 2 parent receiving [ '2', 'done', "QUERY: INSERT INTO DICTIONARY(entry) VALUES ( 'Sylvain''s test string C' );" ] parent processing result for message 2 parent <div>Inserted: <p>Sylvain's test string C</p></div> worker done processing message 1 parent receiving [ '1', 'done', "QUERY: INSERT INTO DICTIONARY(entry) VALUES ( 'Sylvain''s test string B' );" ] parent processing result for message 1 parent <div>Inserted: <p>Sylvain's test string B</p></div> worker done processing message 3 parent receiving [ '3', 'done', "QUERY: INSERT INTO DICTIONARY(entry) VALUES ( 'Sylvain''s test string D' );" ] parent processing result for message 3 parent <div>Inserted: <p>Sylvain's test string D</p></div> worker done processing message 0 parent receiving [ '0', 'done', "QUERY: INSERT INTO DICTIONARY(entry) VALUES ( 'Sylvain''s test string A' );" ] parent processing result for message 0 parent <div>Inserted: <p>Sylvain's test string A</p></div> parent terminating worker worker exiting
In the pure textbook tradition, I left error handling aside. But that raises an interesting issue: even if both of our processes are V8 instances, NodeJS cannot propagate the errors thrown from the worker process to its parent. You have to handle that by yourself.
In the preceding section, you have seen the main process responds to the done
message sent by the worker process.
It’s the only message accepted by our very basic example.
But we can extend that to understand a new message: the error
message:
We will wrap the request processing code into a try … catch
block on the worker’s side.
If no exception is raised, we will still respond to the parent process with a done
message.
But if an exception is caught, we will now respond from the catch
block with our new error
message.
To demonstrate the new error path, I also added code to raise an exception when processing the message id-2.
This time, if you follow the log displayed on the screen when you run the program, you can trace what happened to the message id-2, from the error raised in the worker process up to the detection of this error in the parent process:
sh$ node main-async-error.js parent sending message [ '0', "Sylvain's test string A" ] parent sending message [ '1', "Sylvain's test string B" ] parent sending message [ '2', "Sylvain's test string C" ] parent sending message [ '3', "Sylvain's test string D" ] worker receiving message [ '0', "Sylvain's test string A" ] worker receiving message [ '1', "Sylvain's test string B" ] worker receiving message [ '2', "Sylvain's test string C" ] worker receiving message [ '3', "Sylvain's test string D" ] worker done processing message 1 parent receiving [ '1', 'done', "QUERY: INSERT INTO DICTIONARY(entry) VALUES ( 'Sylvain''s test string B' );" ] parent processing result for message 1 parent <div>Inserted: <p>Sylvain's test string B</p></div> worker raising error for message 2 Error: worker error at Timeout._onTimeout (/home/sylvain/Projects/Blog/2022/new-v8-instance-from-nodejs/code/worker-async-error.js:20:15) at listOnTimeout (internal/timers.js:554:17) at processTimers (internal/timers.js:497:7) parent receiving [ '2', 'error', 'Error: worker error' ] parent processing worker error for message 2 parent <div>Cannot insert: <p>Sylvain's test string C</p></div> worker done processing message 0 parent receiving [ '0', 'done', "QUERY: INSERT INTO DICTIONARY(entry) VALUES ( 'Sylvain''s test string A' );" ] parent processing result for message 0 parent <div>Inserted: <p>Sylvain's test string A</p></div> worker done processing message 3 parent receiving [ '3', 'done', "QUERY: INSERT INTO DICTIONARY(entry) VALUES ( 'Sylvain''s test string D' );" ] parent processing result for message 3 parent <div>Inserted: <p>Sylvain's test string D</p></div> parent terminating worker worker exiting
All the examples given above follow the callback programming style traditionally used with NodeJS.
A good exercise would be to convert that code to a Promise
-based solution (either explicitly or using the async
and await
keywords). As a suggestion, you may also consider using Promise.all
or Promise.allSettled
to terminate the worker process once all requests have been handled.
Don’t hesitate to share your experiments or ask your questions on social networks!
The standard child_process
module provides several ways to spawn new processes from NodeJS, either to run external commands or to load and execute a JavaScript module in an independent V8 instance.
Some of these functions exist both in asynchronous and synchronous forms.
I encourage you to explore the official documentation to learn more about them and see how they allow you to interact with or gather data from the child process.