Two important features that drive analytics in MongoDB are the aggregation framework and Map-reduce.
In general, most of the aggregation framework does not require a global write lock, but Map-reduce needs the global write lock when writing reducer results back to an existing or new collection.
Since version 2.2, MongoDB supports a new flag called nonAtomic. Setting it to true in the out parameter, as shown below, skips the global write lock for both the merge and reduce out operations. It is not supported for the replace operation, even though "replace" is the most commonly used Map-reduce output mode, especially when dealing with large record changes.
MongoDB Map-reduce example with all 3 options (replace, merge and reduce)
```javascript
db.mrtest.mapReduce(mapFunction, reduceFunction, { out: { replace: "mragg" } });
db.mrtest.mapReduce(mapFunction, reduceFunction, { out: { merge: "mragg", nonAtomic: true } });
db.mrtest.mapReduce(mapFunction, reduceFunction, { out: { reduce: "mragg", nonAtomic: true } });
```
As you can see from the MongoDB source code below, MongoDB acquires the global write lock ("Lock::GlobalWrite lock") when nonAtomic: true is not specified:
```cpp
long long State::postProcessCollection(CurOp* op, ProgressMeterHolder& pm) {
    if ( _onDisk == false || _config.outputOptions.outType == Config::INMEMORY )
        return numInMemKeys();

    if (_config.outputOptions.outNonAtomic)
        return postProcessCollectionNonAtomic(op, pm);

    Lock::GlobalWrite lock; // TODO(erh): this is how it was, but seems it doesn't need to be global
    return postProcessCollectionNonAtomic(op, pm);
}
```
It does not end there: as an optimization, when the existing collection has 0 records, Map-reduce simply falls back to replace mode even if one specifies the merge or reduce option.
Here is the MongoDB source snippet that does this:
```cpp
long long State::postProcessCollectionNonAtomic(CurOp* op, ProgressMeterHolder& pm) {
    if ( _config.outputOptions.finalNamespace == _config.tempNamespace )
        return _safeCount( _db, _config.outputOptions.finalNamespace );

    if (_config.outputOptions.outType == Config::REPLACE ||
            _safeCount(_db, _config.outputOptions.finalNamespace) == 0) {
        Lock::GlobalWrite lock; // TODO(erh): why global???
        // replace: just rename from temp to final collection name, dropping previous collection
        _db.dropCollection( _config.outputOptions.finalNamespace );
        BSONObj info;
        if ( ! _db.runCommand( "admin"
                             , BSON( "renameCollection" << _config.tempNamespace
                                     << "to" << _config.outputOptions.finalNamespace
                                     << "stayTemp" << _config.shardedFirstPass )
                             , info ) ) {
            uasserted( 10076 , str::stream() << "rename failed: " << info );
        }
        _db.dropCollection( _config.tempNamespace );
    }
```
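The combined effect of the two code paths above can be sketched as a small decision function. This is a hypothetical helper (the name `lockTaken` and its return values are illustrative, not MongoDB internals), showing that even with nonAtomic: true a merge or reduce still hits the global-write fast path when the output collection is empty:

```javascript
// Sketch of the lock decision made by postProcessCollection /
// postProcessCollectionNonAtomic (names and return values are illustrative).
function lockTaken(outType, nonAtomic, existingCount) {
  if (!nonAtomic) return "global-write";          // Lock::GlobalWrite in postProcessCollection
  if (outType === "replace" || existingCount === 0) {
    return "global-write";                        // empty-collection rename fast path
  }
  return "per-document";                          // true non-atomic merge/reduce
}

console.log(lockTaken("merge", true, 0));    // global-write: empty output collection
console.log(lockTaken("merge", true, 42));   // per-document
console.log(lockTaken("replace", true, 42)); // global-write: replace always takes it
```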
Replacing an empty collection with the new one, instead of merging/copying the records, is a well-intentioned optimization, but not at the cost of a global write lock, which causes far more damage than it solves. We have recently had plenty of performance and scalability issues with MongoDB due to global locks, and this is one such case.
Here is the global write lock (W:) taken during a merge operation when the resulting (mragg) collection did not exist (or was empty), from the MongoDB log:
```
2014-04-04T22:20:39.064-0700 [conn27] command test.$cmd command: mapReduce { mapreduce: "mrtest", map: function () { emit(this.user, this.orders); }, reduce: function (user, orders) { return Array.sum(orders); }, out: { merge: "mragg", nonAtomic: true } } keyUpdates:0 numYields:5 locks(micros) W:3204 r:861 w:3548 reslen:129 12ms
```
Workaround:
One workaround is to simply add a dummy record to the output collection when it is empty or does not exist, and then use the merge operation, so it avoids the global lock. For example:
```javascript
db.mragg.insert({_id: "foo"});
db.mrtest.mapReduce(mapFunction, reduceFunction, { out: { merge: "mragg", nonAtomic: true } });
```
The only problem with merge mode is that any old records from a previous aggregation that no longer exist in the new run will remain in the resulting collection, whereas any new or existing records will be inserted and/or replaced. So one has to decide whether they need the replace or merge operation.
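The difference can be sketched with plain objects (not MongoDB; the keys and values below are made up). Merge keeps stale keys from a previous run, while replace drops them along with the old collection:

```javascript
// Sketch of replace vs merge output semantics using plain objects.
function outReplace(existing, newResults) {
  return { ...newResults };                 // old collection is dropped entirely
}

function outMerge(existing, newResults) {
  return { ...existing, ...newResults };    // new keys inserted, matching keys overwritten
}

var previous = { u1: 27, u2: 6, u3: 5 };    // output of an earlier aggregation
var current  = { u1: 30, u2: 8 };           // user u3 has no records in the new run

console.log(outMerge(previous, current));   // stale u3 entry survives the merge
console.log(outReplace(previous, current)); // u3 is gone
```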
In general, replace is more optimal, as it can simply swap the existing collection for the new one; one should still stick to replace mode if the changes or the resulting collection are very large.
Scalability Issues With Global Locks
Applications cannot scale under heavy concurrent load when MongoDB acquires/releases global locks (read or write) for every major database operation. For example, the Map-reduce "replace" operation simply needs to drop the old collection and promote the new one built from the results, which should be a simple atomic update of the global namespace.
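The point can be illustrated with a toy namespace catalog (the `Catalog` class and collection names below are illustrative, not MongoDB internals): promoting the temp collection over the final one is a single swap in the catalog map, which only needs a lock on that map, not on the whole database:

```javascript
// Sketch: "replace" as one atomic swap in a namespace catalog.
var catalog = new Map();

function renameOver(temp, finalName) {
  // Conceptually done under a catalog-level lock: drop the old
  // collection and promote the temp one in a single swap.
  catalog.set(finalName, catalog.get(temp));
  catalog.delete(temp);
}

catalog.set("mragg", [{ _id: 1, value: 10 }]);           // old results
catalog.set("tmp.mr.mrtest_1", [{ _id: 1, value: 27 }]); // freshly computed results
renameOver("tmp.mr.mrtest_1", "mragg");

console.log(catalog.get("mragg"));        // the new results, temp collection gone
```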
MySQL had similar issues with metadata locking, but they were addressed from MySQL version 5.5 onwards. Hopefully MongoDB can use similar logic to get around namespace locking.
Example:
Here is the complete Map-reduce example JavaScript:
```javascript
venu@ ~/work/mongo/tests 01:54:20# cat mapred.js
// MongoDB Mapreduce Example

function format(var1) {
    print()
    print("-----------------------")
    print(var1)
    print("-----------------------")
}

var mapFunction = function() {
    emit(this.user, this.orders);
}

var reduceFunction = function(user, orders) {
    return Array.sum(orders);
}

format("truncating all records")
db.mrtest.remove({})

format("populating mrtest objects")
db.mrtest.insert({user: 1, orders: 3})
db.mrtest.insert({user: 1, orders: 5})
db.mrtest.insert({user: 1, orders: 18})
db.mrtest.insert({user: 2, orders: 5})
db.mrtest.insert({user: 4, orders: 18})
db.mrtest.insert({user: 3, orders: 2})
db.mrtest.insert({user: 3, orders: 3})
db.mrtest.insert({user: 4, orders: 4})

format("List all records")
db.mrtest.find().forEach(printjson)

format("Run Mapreduce..")
db.mrtest.mapReduce(mapFunction, reduceFunction, { out: {merge: "mragg", nonAtomic: true}});

format("List mapreduce output..")
db.mragg.find().forEach(printjson)
```
```
venu@ ~/work/mongo/tests 22:20:45# mongo test < mapred.js
MongoDB shell version: 2.7.0-pre-
connecting to: test

-----------------------
truncating all records
-----------------------
WriteResult({ "nRemoved" : 10 })

-----------------------
populating mrtest objects
-----------------------
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })

-----------------------
List all records
-----------------------
{ "_id" : ObjectId("533fc4ff48589b9811fda697"), "user" : 1, "orders" : 3 }
{ "_id" : ObjectId("533fc4ff48589b9811fda698"), "user" : 1, "orders" : 1 }
{ "_id" : ObjectId("533fc4ff48589b9811fda699"), "user" : 1, "orders" : 5 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69a"), "user" : 1, "orders" : 18 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69b"), "user" : 2, "orders" : 1 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69c"), "user" : 2, "orders" : 5 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69d"), "user" : 4, "orders" : 18 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69e"), "user" : 3, "orders" : 2 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69f"), "user" : 3, "orders" : 3 }
{ "_id" : ObjectId("533fc4ff48589b9811fda6a0"), "user" : 4, "orders" : 4 }

-----------------------
List mapreduce output..
-----------------------
{ "_id" : 1, "value" : 27 }
{ "_id" : 2, "value" : 6 }
{ "_id" : 3, "value" : 5 }
{ "_id" : 4, "value" : 22 }
```
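The mapReduce run above can be reproduced outside the shell with plain JavaScript: group the listed records by user and sum their orders, mirroring what mapFunction and reduceFunction do:

```javascript
// The ten records listed in the run above.
var records = [
  { user: 1, orders: 3 },  { user: 1, orders: 1 },
  { user: 1, orders: 5 },  { user: 1, orders: 18 },
  { user: 2, orders: 1 },  { user: 2, orders: 5 },
  { user: 4, orders: 18 }, { user: 3, orders: 2 },
  { user: 3, orders: 3 },  { user: 4, orders: 4 },
];

var emitted = {};                         // map phase: emit(this.user, this.orders)
records.forEach(function (doc) {
  (emitted[doc.user] = emitted[doc.user] || []).push(doc.orders);
});

var result = {};                          // reduce phase: Array.sum(orders)
Object.keys(emitted).forEach(function (user) {
  result[user] = emitted[user].reduce(function (a, b) { return a + b; }, 0);
});

console.log(result);                      // { '1': 27, '2': 6, '3': 5, '4': 22 }
```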