MongoDB - Safely Upserting Subdocuments Into Arrays – How Would You Do It In SQL? – Wrong: Find/Insert or Update



MongoDB - Safely Upserting Subdocuments Into Arrays – How Would You Do It In SQL? – Wrong: Find/Insert or Update

0 0


mongodb-safe-upserts-array-subdocuments


On Github jthomerson / mongodb-safe-upserts-array-subdocuments

MongoDB - Safely Upserting Subdocuments Into Arrays

The Problem

Question: how do you safely upsert a new translation into the translations array?

Assume that the translations array should not be allowed to have more than one entry with the same language code.

{
   "_id" : ObjectId("abcdefg"),
   "category" : "CategoryName",
   "key" : "YourStringKey",
   "text" : "The text to translate",
   "translations" : [
      {
         "_id" : ObjectId("zyxwvu"),
         "language" : "es",
         "text" : "Spanish translation"
      }
   ]
}
               

How Would You Do It In SQL?

In SQL you'd have your translations as a separate table, maybe something like this:

CREATE TABLE MasterText (
   MasterText INTEGER UNSIGNED AUTO_INCREMENT PRIMARY KEY,
   Category VARCHAR(30) NOT NULL,
   Key VARCHAR(30) NOT NULL,
   SourceText TEXT
)

CREATE TABLE Translations (
   TranslationID INTEGER UNSIGNED AUTO_INCREMENT PRIMARY KEY,
   MasterTextID INTEGER UNSIGNED NOT NULL,
   TranslationLang VARCHAR(8) NOT NULL,
   TranslatedText TEXT,
   FOREIGN KEY (MasterTextID) REFERENCES MasterText(MasterTextID)

)
                  

SQL - Transactions

You could use a transaction to SELECT, check to see if you needed to insert or update, then do the write operation.

START TRANSACTION
SELECT TranslationID FROM Translations ... (see if the row exists)
// if the row already exists:
UPDATE Translations ...
// if it did not exist:
INSERT INTO Translations
COMMIT
END TRANSACTION
                  

SQL - Insert / Select

Or, if you preferred not to use transactions, you could avoid duplicate inserts by using a INSERT ... SELECT with a LEFT JOIN.

INSERT INTO Translations
       (MasterTextID, TranslationLang, TranslatedText)
SELECT mt.MasterTextID, 'fr', 'French translation'
  FROM MasterText mt
  LEFT JOIN Translations t
    ON mt.MasterTextID = t.MasterTextID
   AND t.TranslationLang = 'fr'
 WHERE mt.Category = 'CategoryName'
   AND mt.Key = 'YourStringKey'
   AND t.TranslationID IS NULL
                  

This ensures you won't insert more than one - even in a race condition. Then you could follow up with an update to finish the "upsert".

SQL - On Duplicate Key Update

Or, if you were using MySQL you could use "ON DUPLICATE KEY UPDATE".

INSERT INTO Translations
       (MasterTextID, TranslationLang, TranslatedText)
VALUES ($masterTextID, 'fr', 'French translation')
    ON DUPLICATE KEY
UPDATE TranslatedText = 'French translation'
                  

This Doesn't Work in MongoDB

Transactions won't work because there are none. So no select/check if you should insert or update/then write. Left joins won't work because there are none. So no insert ... select where the existing row is null. On duplicate key update won't work because that's not a feature for MongoDB either.

So what do you do in MongoDB?

Wrong: Find/Insert or Update

Mongo does not offer transactions, so you can not atomically do a find and then do:

if (exists) {
   update();
} else {
   insert();
}
                  

Wrong: Update With Upsert

Mongo has the following feature:

db.translatablestrings.update(
   { category: 'CategoryName', key: 'YourStringKey' },
   { ... your update command },
   { upsert: true }
)
                  

The problem is that the "upsert" functionality in MongoDB is only made for upserting entire documents - not for subdocuments.

Wrong: $push

You can not simply do a $push. If two conflicting processes do a $push (somewhat-) simultaneously, you end up with two translations for the same language - which breaks your application's rules.

$ mongo safeupserts
db.translatablestrings.update(
   { category: 'CategoryName', key: 'YourStringKey' },
   { $push: { translations:
      { _id: new ObjectId(), language: 'fr', text: 'French translation' }
   } }
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
                  

This is the same problem you'd have if you did two inserts in SQL, but without a SELECT and a LEFT JOIN to check for duplicates.

Wrong: $push

The race condition of using $push can result in duplicates.

{
   "_id" : ObjectId("abcdefg"),
   "category" : "CategoryName",
   "key" : "YourStringKey",
   "text" : "The source text.",
   "translations" : [
      {
         "_id" : ObjectId("zyxwvu"),
         "language" : "es",
         "text" : "Spanish translation"
      },
      {
         "_id" : ObjectId("mnopqrs"),
         "language" : "fr",
         "text" : "French translation"
      },
      {
         "_id" : ObjectId("ghijlm"),
         "language" : "fr",
         "text" : "French translation"
      }
   ]
}
                  

Wrong: Version Field

Some approaches suggest using a select, then insert or update approach, and relying on a version field to guarantee that you don't overwrite something. For example:

var o = findObject(),
    t = o.translationFor('fr');

if (t) {
   var originalVersion = t._version;
   t._version++;
   t.text = 'French translation'
   // update WHERE DB version = originalVersion
   db.translatablestrings.update(
      { category: 'CategoryName', key: 'YourStringKey',
        'translations.language': 'fr', 'translations._version': originalVersion },
      { $set: { 'translations.$.text': 'French Translation', 'translations.$._version': t._version } }
   )
} else {
   t = new Translation('fr', 'French translation'); // has _version = 1
   db.translatablestrings.update(
      { category: 'CategoryName', key: 'YourStringKey',
        'translations.language': { $ne: 'fr' } },
      { $push: { translations: t } }
   )
}
                  

Right: Push, Check Success, Update

Here's the right approach. It guarantees you won't end up with duplicates even in a race condition.

Start by trying to push the subdocument, but only to a document that doesn't already have a matching subdocument.

db.translatablestrings.update(
   { category: 'CategoryName', key: 'YourStringKey', 'translations.language': { $ne: 'fr' } },
   { $push: { translations: { _id: new ObjectId(), language: 'fr', text: 'French translation' } } }
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
                  

Because there was not already a French translation, we successfully pushed one.

After First Push

If you did one successful push, but there was a race condition and a second operation also did a push, what would happen?

// to prove that this operation can run safely multiple times,
// note what happens when you re-run it:
db.translatablestrings.update(
   { category: 'CategoryName', key: 'YourStringKey', 'translations.language': { $ne: 'fr' } },
   { $push: { translations: { _id: new ObjectId(), language: 'fr', text: 'French translation' } } }
)
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })
                  

Nothing was matched because our document already had a subdocument with 'fr' as the language.

Putting "Up" in "Upsert"

So, what if one was already inserted and you need to now do an update?

// as always, we start by trying to push our new or updated translation
// if there's not already a translation for that language:
db.translatablestrings.update( ... $push: { ... } ... }) // same as previous slide
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })

if (result.nMatched === 0) {
   // now we know there's an existing document and we simply
   // do an update directly to the subdocument itself
   db.translatablestrings.update(
      { category: 'CategoryName', key: 'YourStringKey', 'translations.language': 'fr' },
      { $set: { 'translations.$.text': 'French translation' } }
   )
}
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
                  

And it worked! Running it over and over is safe.

MongoDB - Safely Upserting Subdocuments Into Arrays