Node.js + MongoDB - working with a large amount of data

Recently I needed to perform some actions on a large amount of data stored in a MongoDB collection.

For the related project we are using Node.js + MongoDB, managed with Mongoose. Because of this, my first choice was to use the Mongoose schema we already have to handle the data in our application.

As a test, I wanted to know how long it takes to read the required data and simply iterate over it. For the test, I iterate over the result and increment a counter variable. When it finishes, I print the elapsed time.

Here is the best code I could find to perform the action (other options either took much longer or ended with a "JavaScript heap out of memory" error). Only the important part is shown; other related code has been omitted:

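describe.only('for a huge collection data', () => {
  it('should read all elements', (done) => {
    const initialTime = Date.now();
    let cont = 0; // number of documents read so far

    // Stream the collection through a cursor instead of loading it all into memory
    const stream = DataModel.find({}).cursor();

    // Each 'data' event delivers one full mongoose document
    stream.on('data', () => { cont += 1; });

    // Once the cursor is closed, report the count and the elapsed time
    stream.on('close', () => {
      const totalTime = Date.now() - initialTime;
      console.log(`Executionen end. Number of elements: ${cont}. Elapsed time: ${(totalTime / 1000)} seconds`);
      done();
    });
  });
});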

When it runs (Node.js v6.9.5 + MongoDB 3.4.7 + Mongoose 4.13.11 + Mocha 3.5.3), it prints:

Executionen end. Number of elements: 219030. Elapsed time: 94.431 seconds

Taking that long to perform no real action seemed insane to me, so I wanted to try another programming language, and for that I chose Go.

I wrote a very simple Go program to do the same thing: read all the data in a collection and increment a counter. I used Go 1.9.4 and, to handle the MongoDB connection, "gopkg.in". Here is the code (just the important part):

start := time.Now() // assumed: the timer starts right before the query
cont := 0
var result Data // Data is a struct with the defined properties
iter := collectionData.Find(bson.M{}).Iter() // collectionData is the object used to access the MongoDB collection
for iter.Next(&result) {
	cont++
}
elapsedFirst := time.Since(start)
fmt.Printf("Executionen end. Number of elements: %d. Elapsed time: %f seconds", cont, elapsedFirst.Seconds())

When it runs, it prints:

Executionen end. Number of elements: 219030. Elapsed time: 4.710689 seconds

Obviously, the difference between the Node.js version and the Go version is too big to be just a language problem.

I made a small but very important change to my Node.js version. Instead of reading Mongoose documents, I just wanted to get plain MongoDB objects (JSON).

describe.only('for a huge collection data', () => {
  it('should read all elements', (done) => {
    const initialTime = Date.now();
    let cont = 0; // number of documents read so far

    // lean() makes the cursor emit plain JavaScript objects instead of mongoose documents
    const stream = DataModel.find({}).lean().cursor();

    stream.on('data', () => { cont += 1; });

    // Once the cursor is closed, report the count and the elapsed time
    stream.on('close', () => {
      const totalTime = Date.now() - initialTime;
      console.log(`Executionen end. Number of elements: ${cont}. Elapsed time: ${(totalTime / 1000)} seconds`);
      done();
    });
  });
});

The only change was adding ".lean()" when performing the "find" operation.

Running this new version under the same conditions as the previous one, the output was:

Executionen end. Number of elements: 219030. Elapsed time: 7.184 seconds

OK, it wasn't as good as the Go version, but the improvement was huge: from 94.4 seconds down to 7.2 seconds, about 13 times faster.
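To put the three runs side by side, here is a small sketch that turns the elapsed times reported above into documents per second and speed-up factors. The figures are copied from the outputs in this post; the helper names (docsPerSecond, speedUp) are just mine for illustration.

```javascript
// Figures copied from the three runs reported in this post.
const totalDocs = 219030;
const runs = {
  'mongoose (full documents)': 94.431,
  'mongoose + lean()': 7.184,
  'Go': 4.710689
};

// Throughput: how many documents were read per second.
function docsPerSecond(count, seconds) {
  return count / seconds;
}

// Speed-up factor of the fast run relative to the slow run.
function speedUp(slowSeconds, fastSeconds) {
  return slowSeconds / fastSeconds;
}

Object.keys(runs).forEach((name) => {
  const rate = Math.round(docsPerSecond(totalDocs, runs[name]));
  console.log(`${name}: ${rate} docs/s`);
});
// mongoose (full documents): 2319 docs/s
// mongoose + lean(): 30489 docs/s
// Go: 46496 docs/s

const leanVsFull = speedUp(runs['mongoose (full documents)'], runs['mongoose + lean()']);
const goVsLean = speedUp(runs['mongoose + lean()'], runs['Go']);
console.log(`lean() vs full documents: ${leanVsFull.toFixed(1)}x faster`); // 13.1x
console.log(`Go vs lean(): ${goVsLean.toFixed(1)}x faster`); // 1.5x
```

So with ".lean()" the Node.js version is within a factor of about 1.5 of the Go version, while the full-document version is roughly 20 times slower than Go.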

It's clear that Mongoose is very helpful for handling data and performing many actions when you are working with Node.js and MongoDB. But it's also clear that, depending on the kind of action and the amount of data you are going to handle, using ".lean()" is simply a necessity if you don't want your application's performance to drop significantly.
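Why does it matter so much? With ".lean()" Mongoose skips turning each result into a full Mongoose document (with getters, setters and change tracking) and hands back plain objects. The sketch below is only a toy model of that idea and is NOT Mongoose's real implementation: ToyDocument, makeRawDocs, countHydrated and countLean are hypothetical names I made up, and the timings it prints will vary from machine to machine.

```javascript
// Toy stand-in for a "hydrated" document. This is an illustration only,
// not how Mongoose actually builds its documents.
class ToyDocument {
  constructor(raw) {
    this._raw = Object.assign({}, raw); // private copy of the stored fields
    this._modified = new Set();         // names of the fields changed so far
    Object.keys(raw).forEach((key) => {
      // One getter/setter pair per field, so that changes can be tracked
      Object.defineProperty(this, key, {
        enumerable: true,
        get: () => this._raw[key],
        set: (value) => {
          this._raw[key] = value;
          this._modified.add(key);
        }
      });
    });
  }

  modifiedPaths() {
    return Array.from(this._modified);
  }
}

// Hypothetical raw records, standing in for what the database driver returns.
function makeRawDocs(n) {
  const docs = [];
  for (let i = 0; i < n; i += 1) {
    docs.push({ id: i, name: `item-${i}`, value: i * 2 });
  }
  return docs;
}

// Wraps every record before counting it (the "full document" path).
function countHydrated(rawDocs) {
  let cont = 0;
  for (const raw of rawDocs) {
    const doc = new ToyDocument(raw);
    if (doc) { cont += 1; }
  }
  return cont;
}

// Counts the plain objects as they are (the "lean" path).
function countLean(rawDocs) {
  let cont = 0;
  for (const raw of rawDocs) {
    if (raw) { cont += 1; }
  }
  return cont;
}

const sampleDocs = makeRawDocs(219030); // same size as the collection in this post

let startedAt = Date.now();
const hydratedCount = countHydrated(sampleDocs);
console.log(`hydrated: ${hydratedCount} elements in ${Date.now() - startedAt} ms`);

startedAt = Date.now();
const leanCount = countLean(sampleDocs);
console.log(`lean: ${leanCount} elements in ${Date.now() - startedAt} ms`);
```

Both paths count the same elements; the only difference is the per-element wrapping work. The price of skipping it is that plain objects have none of the document features, so ".lean()" fits read-only work like this test best.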

If anyone has a better way to handle large amounts of data with Node.js, or anything to add to this conclusion, please let me know :-)
