On Github jthomerson / mongodb-safe-upserts-array-subdocuments
Question: how do you safely upsert a new translation into the translations array?
Assume that the translations array should not be allowed to have more than one entry with the same language code.
{ "_id" : ObjectId("abcdefg"), "category" : "CategoryName", "key" : "YourStringKey", "text" : "The text to translate", "translations" : [ { "_id" : ObjectId("zyxwvu"), "language" : "es", "text" : "Spanish translation" } ] }
In SQL you'd have your translations as a separate table, maybe something like this:
CREATE TABLE MasterText ( MasterText INTEGER UNSIGNED AUTO_INCREMENT PRIMARY KEY, Category VARCHAR(30) NOT NULL, Key VARCHAR(30) NOT NULL, SourceText TEXT ) CREATE TABLE Translations ( TranslationID INTEGER UNSIGNED AUTO_INCREMENT PRIMARY KEY, MasterTextID INTEGER UNSIGNED NOT NULL, TranslationLang VARCHAR(8) NOT NULL, TranslatedText TEXT, FOREIGN KEY (MasterTextID) REFERENCES MasterText(MasterTextID) )
You could use a transaction to SELECT, check to see if you needed to insert or update, then do the write operation.
START TRANSACTION SELECT TranslationID FROM Translations ... (see if the row exists) // if the row already exists: UPDATE Translations ... // if it did not exist: INSERT INTO Translations COMMIT END TRANSACTION
Or, if you preferred not to use transactions, you could avoid duplicate inserts by using a INSERT ... SELECT with a LEFT JOIN.
INSERT INTO Translations (MasterTextID, TranslationLang, TranslatedText) SELECT mt.MasterTextID, 'fr', 'French translation' FROM MasterText mt LEFT JOIN Translations t ON mt.MasterTextID = t.MasterTextID AND t.TranslationLang = 'fr' WHERE mt.Category = 'CategoryName' AND mt.Key = 'YourStringKey' AND t.TranslationID IS NULL
This ensures you won't insert more than one - even in a race condition. Then you could follow up with an update to finish the "upsert".
Or, if you were using MySQL you could use "ON DUPLICATE KEY UPDATE".
INSERT INTO Translations (MasterTextID, TranslationLang, TranslatedText) VALUES ($masterTextID, 'fr', 'French translation') ON DUPLICATE KEY UPDATE TranslatedText = 'French translation'
So what do you do in MongoDB?
Mongo does not offer transactions, so you can not atomically do a find and then do:
if (exists) { update(); } else { insert(); }
Mongo has the following feature:
db.translatablestrings.update( { category: 'CategoryName', key: 'YourStringKey' }, { ... your update command }, { upsert: true } )
The problem is that the "upsert" functionality in MongoDB is only made for upserting entire documents - not for subdocuments.
You can not simply do a $push. If two conflicting processes do a $push (somewhat-) simultaneously, you end up with two translations for the same language - which breaks your application's rules.
$ mongo safeupserts db.translatablestrings.update( { category: 'CategoryName', key: 'YourStringKey' }, { $push: { translations: { _id: new ObjectId(), language: 'fr', text: 'French translation' } } } ) WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
This is the same problem you'd have if you did two inserts in SQL, but without a SELECT and a LEFT JOIN to check for duplicates.
The race condition of using $push can result in duplicates.
{ "_id" : ObjectId("abcdefg"), "category" : "CategoryName", "key" : "YourStringKey", "text" : "The source text.", "translations" : [ { "_id" : ObjectId("zyxwvu"), "language" : "es", "text" : "Spanish translation" }, { "_id" : ObjectId("mnopqrs"), "language" : "fr", "text" : "French translation" }, { "_id" : ObjectId("ghijlm"), "language" : "fr", "text" : "French translation" } ] }
Some approaches suggest using a select, then insert or update approach, and relying on a version field to guarantee that you don't overwrite something. For example:
var o = findObject(), t = o.translationFor('fr'); if (t) { var originalVersion = t._version; t._version++; t.text = 'French translation' // update WHERE DB version = originalVersion db.translatablestrings.update( { category: 'CategoryName', key: 'YourStringKey', 'translations.language': 'fr', 'translations._version': originalVersion }, { $set: { 'translations.$.text': 'French Translation', 'translations.$._version': t._version } } ) } else { t = new Translation('fr', 'French translation'); // has _version = 1 db.translatablestrings.update( { category: 'CategoryName', key: 'YourStringKey', 'translations.language': { $ne: 'fr' } }, { $push: { translations: t } } ) }
Here's the right approach. It guarantees you won't end up with duplicates even in a race condition.
Start by trying to push the subdocument, but only to a document that doesn't already have a matching subdocument.
db.translatablestrings.update( { category: 'CategoryName', key: 'YourStringKey', 'translations.language': { $ne: 'fr' } }, { $push: { translations: { _id: new ObjectId(), language: 'fr', text: 'French translation' } } } ) WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
Because there was not already a French translation, we successfully pushed one.
If you did one successful push, but there was a race condition and a second operation also did a push, what would happen?
// to prove that this operation can run safely multiple times, // note what happens when you re-run it: db.translatablestrings.update( { category: 'CategoryName', key: 'YourStringKey', 'translations.language': { $ne: 'fr' } }, { $push: { translations: { _id: new ObjectId(), language: 'fr', text: 'French translation' } } } ) WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })
Nothing was matched because our document already had a subdocument with 'fr' as the language.
So, what if one was already inserted and you need to now do an update?
// as always, we start by trying to push our new or updated translation // if there's not already a translation for that language: db.translatablestrings.update( ... $push: { ... } ... }) // same as previous slide WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 }) if (result.nMatched === 0) { // now we know there's an existing document and we simply // do an update directly to the subdocument itself db.translatablestrings.update( { category: 'CategoryName', key: 'YourStringKey', 'translations.language': 'fr' }, { $set: { 'translations.$.text': 'French translation' } } ) } WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
And it worked! Running it over and over is safe.