In my detailed post on NoSQL data modeling, I listed ways of modeling NoSQL data (using MongoDB collections). For a one-to-one relation, it is usually not apparent why one would need a separate collection instead of embedding everything in a single document.
In this post I’ll address this and share an example.
Keeping our example real-world but simple, let’s model a university or school. A few obvious entities emerge:
And here are the minimum attributes required for each (not including obvious fields like id, createDate, updateDate, etc.):
firstName lastName dob email password education batch CGPA enrolDate
firstName lastName dob email password degrees experience bio joinDate
firstName lastName dob email password certificates vocationalTraining experience joinDate
firstName lastName dob email password experience joinDate
firstName lastName dob email password weapon experience joinDate
How To Model?
Now that we’ve listed the possible entities, we need to actually model them. Let’s look at a few possible ways:
All Entities Have Their Own Models/Collections
This way, each of the entities listed above has its own model. It seems a good choice at first but reveals a problem on further analysis. Application logic may be written around this design, especially when there is a single point of entry into the system for all kinds of users (the same login page or API, with no parameter that identifies the role). In that case we have to search each of the five collections for the email/password combination to find out which user has logged in (and perhaps take them to their own dashboard).
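Here is a minimal sketch in Python that makes the cost concrete. It assumes plain lists of dicts stand in for MongoDB collections, with no driver involved. Only three of the five collections are shown, and the collection names, emails and passwords are hypothetical. With a driver such as pymongo, each step of the outer loop would be a separate find_one query against another collection.

```python
# A minimal sketch, assuming Python lists stand in for MongoDB collections.
# Only three of the five entity collections are shown; names are hypothetical.
from typing import Optional

collections = {
    "students": [
        {"_id": 1, "email": "ana@uni.edu", "password": "s3cret", "batch": "2021"},
    ],
    "professors": [
        {"_id": 2, "email": "raj@uni.edu", "password": "pa55", "degrees": ["PhD"]},
    ],
    "guards": [
        {"_id": 3, "email": "tom@uni.edu", "password": "g00d", "weapon": "baton"},
    ],
}


def login(email: str, password: str) -> Optional[tuple[str, dict]]:
    """Scan every collection until one holds the email/password pair.

    In MongoDB, each outer iteration would be another query against another
    collection, and this loop grows whenever a new role is added.
    Passwords are compared in plain text only to keep the sketch short.
    """
    for name, docs in collections.items():
        for doc in docs:
            if doc["email"] == email and doc["password"] == password:
                return name, doc
    return None
```

When a guard logs in, the lookup succeeds only after the student and professor collections have already been searched.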
All Entities In A Single Collection
Note that many of the fields listed above are common across all entities, including email and password, which are the login credentials. This commonality offers an easy solution to the problem above: we can merge all the entities into one collection, say Person or User, and add a field userType (or userRole) that records the user’s type.
firstName lastName dob email password education batch CGPA certificates degrees experience joinDate enrolDate vocationalTraining weapon bio userType
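Here is a minimal sketch of the merged design. It again assumes a Python list stands in for the single collection, and the sample documents and the DASHBOARDS routes are hypothetical. Login becomes one lookup, which would be a single find_one on the users collection if you were using pymongo. The userType field then tells the application where to send the user.

```python
# A minimal sketch, assuming a Python list stands in for a single "users"
# collection. Sample values and dashboard routes are hypothetical.
from typing import Optional

users = [
    {"_id": 1, "email": "ana@uni.edu", "password": "s3cret",
     "userType": "student", "batch": "2021", "enrolDate": "2021-09-01"},
    {"_id": 3, "email": "tom@uni.edu", "password": "g00d",
     "userType": "guard", "weapon": "baton"},
]

# Hypothetical dashboard route for each userType.
DASHBOARDS = {"student": "/student/home", "guard": "/guard/home"}


def login(email: str, password: str) -> Optional[str]:
    """One lookup in one collection; userType decides where the user goes.

    Passwords are compared in plain text only to keep the sketch short.
    """
    user = next(
        (u for u in users if u["email"] == email and u["password"] == password),
        None,
    )
    if user is None:
        return None
    return DASHBOARDS[user["userType"]]
```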
Merging common fields like firstName, lastName, email and password into one collection saves us the trouble of identifying the user type at login. At the same time, however, it makes application management hard, because most of the other fields are exclusive to specific user types: only a student has batch and enrolDate, only a professor has degrees, only a guard has weapon, and so on. That’s a lot of tracking and management.
It’s true that MongoDB does not store a field at all if it isn’t provided, so unused fields don’t sit in every document as null. This holds even with a Mongoose schema, which needs all of the merged fields defined upfront, with one caveat: Mongoose initializes array paths to an empty array by default. Still, the collection isn’t meaningful enough and is very hard to scale as more user roles are brought into the system.
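Here is a sketch of that tracking burden. It assumes the application validates merged documents itself. The role-to-field sets copy the attribute lists above for the three roles the text names (student, professor, guard). The invalid_fields function is a hypothetical helper, not part of MongoDB or Mongoose.

```python
# A minimal sketch of per-role bookkeeping in a merged collection.
# Field sets copy the attribute lists for student, professor and guard;
# the other roles are omitted. invalid_fields is a hypothetical helper.
COMMON = {"_id", "firstName", "lastName", "dob", "email", "password", "userType"}

ROLE_FIELDS = {
    "student": {"education", "batch", "CGPA", "enrolDate"},
    "professor": {"degrees", "experience", "bio", "joinDate"},
    "guard": {"weapon", "experience", "joinDate"},
}


def invalid_fields(doc: dict) -> set[str]:
    """Return the fields on doc that its userType should not have.

    Every new role needs another ROLE_FIELDS entry, and every write path
    has to run this check.
    """
    allowed = COMMON | ROLE_FIELDS[doc["userType"]]
    return set(doc) - allowed
```

For example, a student document that accidentally carries a weapon field is flagged, while a well-formed guard document yields an empty set.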
A General And Specialized Collection
Finally, the best way, in my opinion, is a one-to-one relationship in split form, i.e., two collections linked by a reference. In our case, User is the generalized collection containing the common fields, while each other user type is a specialized collection holding only its relevant fields.
In 1-1 relations, the choice of which collection keeps the reference is arbitrary. Both the referencing and the referenced document are unique, and performance-wise it makes little difference which side holds the reference. In the modified model below, we place a user field in every specialized collection to hold the User reference (the user’s id).
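Here is a minimal sketch of the reference. It assumes Python lists stand in for the users collection and one specialized collection (students). The sample values and the student_with_user helper are hypothetical. Resolving the user field replaces the stored id with the referenced User document, which is roughly what Mongoose’s populate() does.

```python
# A minimal sketch, assuming Python lists stand in for the generalized
# "users" collection and one specialized "students" collection.
# Sample values and the helper name are hypothetical.
users = [
    {"_id": 1, "firstName": "Ana", "email": "ana@uni.edu", "userType": "student"},
]

students = [
    # "user" holds the _id of the matching User document (the 1-1 reference).
    {"_id": 101, "user": 1, "batch": "2021", "CGPA": 3.7},
]


def student_with_user(student: dict) -> dict:
    """Return a copy of student whose "user" id is replaced by the User doc."""
    user = next(u for u in users if u["_id"] == student["user"])
    return {**student, "user": user}
```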
So let’s see how our modeling stands at this stage:
Note that we still need to keep userType. Without it, a User document alone gives no way to tell which specialized collection holds its details.
firstName lastName dob email password userType
education batch CGPA enrolDate user
degrees experience bio joinDate user
certificates vocationalTraining experience joinDate user
experience joinDate user
weapon experience joinDate user
With such splitting, it becomes super easy to add more roles. The login logic too requires little or no change with each addition!
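Here is a minimal sketch of login under the split design. It assumes Python lists stand in for the collections and that a hypothetical profiles dict maps each userType to its specialized collection. Authentication touches only User, and userType then selects where to find the role-specific document.

```python
# A minimal sketch, assuming Python lists stand in for the collections and
# each specialized collection stores the User _id in its "user" field.
# Sample values and the "profiles" registry are hypothetical.
from typing import Optional

users = [
    {"_id": 1, "email": "ana@uni.edu", "password": "s3cret", "userType": "student"},
    {"_id": 3, "email": "tom@uni.edu", "password": "g00d", "userType": "guard"},
]

# userType -> specialized collection; adding a role means adding an entry here.
profiles = {
    "student": [{"_id": 101, "user": 1, "batch": "2021", "enrolDate": "2021-09-01"}],
    "guard": [{"_id": 301, "user": 3, "weapon": "baton", "experience": 5}],
}


def login(email: str, password: str) -> Optional[tuple[dict, dict]]:
    """Authenticate against User only, then fetch the role-specific profile.

    Passwords are compared in plain text only to keep the sketch short.
    """
    user = next(
        (u for u in users if u["email"] == email and u["password"] == password),
        None,
    )
    if user is None:
        return None
    profile = next(p for p in profiles[user["userType"]] if p["user"] == user["_id"])
    return user, profile
```

Suppose you add, say, a hypothetical librarian role. You register a librarian collection in profiles and add librarian users to users, and the login function itself stays the same.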
Most of the time, in the NoSQL data design and modeling phase, we don’t find a relevant case for separating collections in one-to-one relations. Usually it isn’t needed either, as trivial cases are well served by embedding the document. In this post, however, we went through an example where we are better off splitting the collections into generalized and specialized forms. Linking them with a reference gives better application management and easier scalability.