Two important features that drive analytics in MongoDB are the aggregation framework and Map-reduce.
In general, most of the aggregation framework does not require a global write lock, but Map-reduce needs the global write lock when writing reducer results back to an existing or new collection.
Since version 2.2, MongoDB supports a new flag called nonAtomic. Setting it to true in the out parameter, as shown below, skips the global write lock for both the merge and reduce out operations. It is not supported for the replace operation, even though "replace" is the most commonly used Map-reduce output mode, especially when dealing with large record changes.
MongoDB Map-reduce example with all 3 options (replace, merge and reduce)
```javascript
db.mrtest.mapReduce(mapFunction, reduceFunction, { out: { replace: "mragg" } });
db.mrtest.mapReduce(mapFunction, reduceFunction, { out: { merge: "mragg", nonAtomic: true } });
db.mrtest.mapReduce(mapFunction, reduceFunction, { out: { reduce: "mragg", nonAtomic: true } });
```
As you can see from the MongoDB source code below, MongoDB acquires the global write lock ("Lock::GlobalWrite lock") when nonAtomic: true is not specified:
```cpp
long long State::postProcessCollection(CurOp* op, ProgressMeterHolder& pm) {
    if ( _onDisk == false || _config.outputOptions.outType == Config::INMEMORY )
        return numInMemKeys();

    if (_config.outputOptions.outNonAtomic)
        return postProcessCollectionNonAtomic(op, pm);

    Lock::GlobalWrite lock; // TODO(erh): this is how it was, but seems it doesn't need to be global
    return postProcessCollectionNonAtomic(op, pm);
}
```
It does not end there: as an optimization, when the existing collection has 0 records, Map-reduce simply falls back to replace mode even if one specifies the merge or reduce option.
Here is the MongoDB source snippet that does this:
```cpp
long long State::postProcessCollectionNonAtomic(CurOp* op, ProgressMeterHolder& pm) {
    if ( _config.outputOptions.finalNamespace == _config.tempNamespace )
        return _safeCount( _db, _config.outputOptions.finalNamespace );

    if (_config.outputOptions.outType == Config::REPLACE ||
            _safeCount(_db, _config.outputOptions.finalNamespace) == 0) {
        Lock::GlobalWrite lock; // TODO(erh): why global???
        // replace: just rename from temp to final collection name, dropping previous collection
        _db.dropCollection( _config.outputOptions.finalNamespace );
        BSONObj info;
        if ( ! _db.runCommand( "admin"
                             , BSON( "renameCollection" << _config.tempNamespace
                                     << "to" << _config.outputOptions.finalNamespace
                                     << "stayTemp" << _config.shardedFirstPass )
                             , info ) ) {
            uasserted( 10076 , str::stream() << "rename failed: " << info );
        }
        _db.dropCollection( _config.tempNamespace );
    }
```
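The combined effect of the two code paths above can be sketched as a small decision function. This is a hypothetical helper (the name `lockTaken` and its return values are illustrative, not MongoDB internals), showing that even with nonAtomic: true a merge or reduce still hits the global-write fast path when the output collection is empty:

```javascript
// Sketch of the lock decision made by postProcessCollection /
// postProcessCollectionNonAtomic (names and return values are illustrative).
function lockTaken(outType, nonAtomic, existingCount) {
  if (!nonAtomic) return "global-write";          // Lock::GlobalWrite in postProcessCollection
  if (outType === "replace" || existingCount === 0) {
    return "global-write";                        // empty-collection rename fast path
  }
  return "per-document";                          // true non-atomic merge/reduce
}

console.log(lockTaken("merge", true, 0));    // global-write: empty output collection
console.log(lockTaken("merge", true, 42));   // per-document
console.log(lockTaken("replace", true, 42)); // global-write: replace always takes it
```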
Replacing an empty collection with the new one, instead of merging/copying the records, is a well-intentioned optimization, but not at the cost of a global write lock, which causes far more damage than it solves. We have recently had plenty of performance and scalability issues with MongoDB due to global locks, and this is one such case.
Here is the global write lock (W:) taken during a merge operation when the resulting (mragg) collection did not exist (or was empty), from the MongoDB log:
```
2014-04-04T22:20:39.064-0700 [conn27] command test.$cmd command: mapReduce { mapreduce: "mrtest", map: function () { emit(this.user, this.orders); }, reduce: function (user, orders) { return Array.sum(orders); }, out: { merge: "mragg", nonAtomic: true } } keyUpdates:0 numYields:5 locks(micros) W:3204 r:861 w:3548 reslen:129 12ms
```
Workaround:
One workaround is to simply add a dummy record to the output collection when it is empty or does not exist, and then use the merge operation, so it avoids the global lock. For example:
```javascript
db.mragg.insert({_id: "foo"});
db.mrtest.mapReduce(mapFunction, reduceFunction, { out: { merge: "mragg", nonAtomic: true } });
```
The only problem with merge mode is that any old records from a previous aggregation that no longer exist in the new run will remain in the resulting collection, whereas any new or existing records will be inserted and/or replaced. So one has to decide whether they need the replace or merge operation.
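The difference can be sketched with plain objects (not MongoDB; the keys and values below are made up). Merge keeps stale keys from a previous run, while replace drops them along with the old collection:

```javascript
// Sketch of replace vs merge output semantics using plain objects.
function outReplace(existing, newResults) {
  return { ...newResults };                 // old collection is dropped entirely
}

function outMerge(existing, newResults) {
  return { ...existing, ...newResults };    // new keys inserted, matching keys overwritten
}

var previous = { u1: 27, u2: 6, u3: 5 };    // output of an earlier aggregation
var current  = { u1: 30, u2: 8 };           // user u3 has no records in the new run

console.log(outMerge(previous, current));   // stale u3 entry survives the merge
console.log(outReplace(previous, current)); // u3 is gone
```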
In general, replace is more optimal, as it can simply swap the existing collection for the new one; one should still stick to replace mode if the changes or the resulting collection are very large.
Scalability Issues With Global Locks
Applications cannot scale under heavy concurrent load when MongoDB acquires/releases global locks (read or write) for every major database operation. For example, the Map-reduce "replace" operation simply needs to drop the old collection and promote the new one built from the results, which should be a simple atomic update of the global namespace.
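The point can be illustrated with a toy namespace catalog (the `Catalog` class and collection names below are illustrative, not MongoDB internals): promoting the temp collection over the final one is a single swap in the catalog map, which only needs a lock on that map, not on the whole database:

```javascript
// Sketch: "replace" as one atomic swap in a namespace catalog.
var catalog = new Map();

function renameOver(temp, finalName) {
  // Conceptually done under a catalog-level lock: drop the old
  // collection and promote the temp one in a single swap.
  catalog.set(finalName, catalog.get(temp));
  catalog.delete(temp);
}

catalog.set("mragg", [{ _id: 1, value: 10 }]);           // old results
catalog.set("tmp.mr.mrtest_1", [{ _id: 1, value: 27 }]); // freshly computed results
renameOver("tmp.mr.mrtest_1", "mragg");

console.log(catalog.get("mragg"));        // the new results, temp collection gone
```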
MySQL had similar issues with metadata locking, but they were addressed from MySQL version 5.5 onwards. Hopefully MongoDB can use similar logic to get around namespace locking.
Example:
Here is the complete Map-reduce example JavaScript:
```javascript
venu@ ~/work/mongo/tests 01:54:20# cat mapred.js
// MongoDB Mapreduce Example

function format(var1) {
    print()
    print("-----------------------")
    print(var1)
    print("-----------------------")
}

var mapFunction = function() {
    emit(this.user, this.orders);
}

var reduceFunction = function(user, orders) {
    return Array.sum(orders);
}

format("truncating all records")
db.mrtest.remove({})

format("populating mrtest objects")
db.mrtest.insert({user: 1, orders: 3})
db.mrtest.insert({user: 1, orders: 5})
db.mrtest.insert({user: 1, orders: 18})
db.mrtest.insert({user: 2, orders: 5})
db.mrtest.insert({user: 4, orders: 18})
db.mrtest.insert({user: 3, orders: 2})
db.mrtest.insert({user: 3, orders: 3})
db.mrtest.insert({user: 4, orders: 4})

format("List all records")
db.mrtest.find().forEach(printjson)

format("Run Mapreduce..")
db.mrtest.mapReduce(mapFunction, reduceFunction, { out: {merge: "mragg", nonAtomic: true}});

format("List mapreduce output..")
db.mragg.find().forEach(printjson)
```
```
venu@ ~/work/mongo/tests 22:20:45# mongo test < mapred.js
MongoDB shell version: 2.7.0-pre-
connecting to: test

-----------------------
truncating all records
-----------------------
WriteResult({ "nRemoved" : 10 })

-----------------------
populating mrtest objects
-----------------------
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })
WriteResult({ "nInserted" : 1 })

-----------------------
List all records
-----------------------
{ "_id" : ObjectId("533fc4ff48589b9811fda697"), "user" : 1, "orders" : 3 }
{ "_id" : ObjectId("533fc4ff48589b9811fda698"), "user" : 1, "orders" : 1 }
{ "_id" : ObjectId("533fc4ff48589b9811fda699"), "user" : 1, "orders" : 5 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69a"), "user" : 1, "orders" : 18 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69b"), "user" : 2, "orders" : 1 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69c"), "user" : 2, "orders" : 5 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69d"), "user" : 4, "orders" : 18 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69e"), "user" : 3, "orders" : 2 }
{ "_id" : ObjectId("533fc4ff48589b9811fda69f"), "user" : 3, "orders" : 3 }
{ "_id" : ObjectId("533fc4ff48589b9811fda6a0"), "user" : 4, "orders" : 4 }

-----------------------
List mapreduce output..
-----------------------
{ "_id" : 1, "value" : 27 }
{ "_id" : 2, "value" : 6 }
{ "_id" : 3, "value" : 5 }
{ "_id" : 4, "value" : 22 }
```
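The mapReduce run above can be reproduced outside the shell with plain JavaScript: group the listed records by user and sum their orders, mirroring what mapFunction and reduceFunction do:

```javascript
// The ten records listed in the run above.
var records = [
  { user: 1, orders: 3 },  { user: 1, orders: 1 },
  { user: 1, orders: 5 },  { user: 1, orders: 18 },
  { user: 2, orders: 1 },  { user: 2, orders: 5 },
  { user: 4, orders: 18 }, { user: 3, orders: 2 },
  { user: 3, orders: 3 },  { user: 4, orders: 4 },
];

var emitted = {};                         // map phase: emit(this.user, this.orders)
records.forEach(function (doc) {
  (emitted[doc.user] = emitted[doc.user] || []).push(doc.orders);
});

var result = {};                          // reduce phase: Array.sum(orders)
Object.keys(emitted).forEach(function (user) {
  result[user] = emitted[user].reduce(function (a, b) { return a + b; }, 0);
});

console.log(result);                      // { '1': 27, '2': 6, '3': 5, '4': 22 }
```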