NoSql Example Of Separate Collections In One-to-One Relationship; Case Of Specialization And Generalization

When And Why Should Reference Be Used To Link Documents In 1-1 Relation

In my detailed post on NoSql data modeling I listed ways of modeling NoSql data (using MongoDB collections). For a one-to-one relation, it’s usually not apparent why one would need a separate collection instead of embedding everything in a single document. In this post I’ll address this and share an example.

University/School Example

Keeping our example real-world but simple, let’s model a university or school. A few obvious entities emerge:

  • Student

  • Professor

  • Receptionist

  • Security Guard

  • Janitor

And here are the minimum attributes required for each (not including obvious fields like id, createDate, updateDate, etc.):

Student

 
  firstName
  lastName
  dob
  email
  password
  education
  batch
  CGPA
  enrolDate
 

Professor

 
  firstName
  lastName
  dob
  email
  password
  degrees
  experience
  bio
  joinDate
 

Receptionist

 
  firstName
  lastName
  dob
  email
  password
  certificates
  vocationalTraining
  experience
  joinDate
 

Janitor

 
  firstName
  lastName
  dob
  email
  password
  experience
  joinDate
 

Guard

 
  firstName
  lastName
  dob
  email
  password
  weapon
  experience
  joinDate
 

How To Model?

Now that we’ve listed the possible entities, we need to actually model them. Let’s look at a few possible ways:

All Entities Have Their Own Models/Collections

This way, each of the entities listed above has its own model. It seems to be a good choice at first but reveals a problem on further analysis. When application logic is written around this design, and especially when we have a single point of entry into the system (the same login page / API with no parameter to identify the role) for all kinds of users, we have to search each of the five collections for the email/password combination to find out which user has logged in (and perhaps take them to their own dashboard).
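To make the cost concrete, here is a minimal sketch of that login flow. It uses plain in-memory arrays as stand-ins for the five MongoDB collections so it runs without a database; the collection names, the sample documents, and the loginAcrossCollections helper are all assumptions made for illustration, and the password values are placeholders for stored hashes.

```javascript
// In-memory stand-ins for five separate collections (sample data is made up).
const collections = {
  students: [{ email: "sam@example.edu", password: "hash-1", batch: "2024" }],
  professors: [{ email: "pat@example.edu", password: "hash-2", bio: "Teaches databases" }],
  receptionists: [],
  janitors: [],
  guards: [{ email: "gus@example.edu", password: "hash-3", weapon: "baton" }],
};

// Hypothetical login helper: with no role hint in the request,
// each collection is probed in turn until one of them matches.
function loginAcrossCollections(email, password) {
  let lookups = 0;
  for (const [name, docs] of Object.entries(collections)) {
    lookups += 1;
    const match = docs.find((d) => d.email === email && d.password === password);
    if (match) return { role: name, user: match, lookups };
  }
  return { role: null, user: null, lookups };
}

const result = loginAcrossCollections("gus@example.edu", "hash-3");
console.log(result.role, result.lookups); // guards 5
```

In this sketch a guard is only found on the fifth lookup, and a failed login always costs five. Against a real database each of those lookups would be a separate query.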

All Entities In A Single Collection

Note that many of the fields listed above are common across all entities. That includes email and password, which are credentials for login. This commonality offers us an easy solution to the above problem: We can merge all the current entities into one collection, say Person or User, and keep another field userType (or userRole) that tells the user type.

User

 
  firstName
  lastName
  dob
  email
  password
  education
  batch
  CGPA
  certificates
  degrees
  experience
  joinDate
  enrolDate
  vocationalTraining
  weapon
  bio
  userType
 

By merging common fields like firstName, lastName, email, password, etc. into one collection, it saves us the trouble of identifying the user type at login. At the same time, however, it makes the application harder to manage, because most of the other fields are exclusive to specific user types: only a student has batch and enrolDate; only a professor has degrees; only a guard has weapon; and so on. That’s a lot of tracking and management.

It’s true that MongoDB does not store a field as null when it isn’t provided (even with a Mongoose schema, which needs all of the merged fields defined upfront), but the collection still isn’t meaningful enough and is very hard to scale as more user roles are brought into the system.
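The trade-off can be sketched as follows, again with an in-memory array standing in for the merged collection. The fieldsByType table and the unexpectedFields helper are hypothetical; they only illustrate the kind of per-role bookkeeping that the application ends up owning under this design.

```javascript
// In-memory stand-in for a single merged collection (sample data is made up).
const users = [
  { email: "sam@example.edu", password: "hash-1", userType: "student", batch: "2024", CGPA: 3.4 },
  { email: "gus@example.edu", password: "hash-3", userType: "guard", weapon: "baton" },
];

// One lookup is enough: userType on the matched document gives the role.
function login(email, password) {
  return users.find((u) => u.email === email && u.password === password) ?? null;
}

// The cost: application code has to know which fields belong to which role.
// This hypothetical table needs a new entry for every role that is added.
const commonFields = ["firstName", "lastName", "dob", "email", "password", "userType"];
const fieldsByType = {
  student: ["education", "batch", "CGPA", "enrolDate"],
  guard: ["weapon", "experience", "joinDate"],
};

// Returns the fields on a document that its userType should not have.
function unexpectedFields(doc) {
  const allowed = new Set([...commonFields, ...(fieldsByType[doc.userType] ?? [])]);
  return Object.keys(doc).filter((key) => !allowed.has(key));
}

console.log(login("gus@example.edu", "hash-3").userType); // guard
console.log(unexpectedFields({ ...users[0], weapon: "baton" })); // [ 'weapon' ]
```

Login is now a single lookup, but nothing in the collection itself stops a student document from carrying a weapon field; that check lives in application code.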

A General And Specialized Collection

Finally, the best way, in my opinion, is a one-to-one relationship in split form, i.e. two collections linked with a reference. In our case, User is the generalized collection containing the common fields, while all other user types are specialized collections containing only the fields relevant to them.

In 1-1 relations, the choice of which collection keeps the reference is arbitrary: both the referring and the referenced document are unique, so performance-wise it doesn’t make much of a difference which side holds the reference. In the modified modeling below, we place a user field in every specialized collection to hold the User reference (the user id).

So let’s see how our modeling stands at this stage:

User

Note that we still need to keep userType, because a user has no way of knowing its type otherwise.

 
  firstName
  lastName
  dob
  email
  password
  userType
 

Student

 
  education
  batch
  CGPA
  enrolDate
  user
 

(For example: user: '507f191e810c19729de860ea')

Professor

 
  degrees
  experience
  bio
  joinDate
  user
 

Receptionist

 
  certificates
  vocationalTraining
  experience
  joinDate
  user
 

Janitor

 
  experience
  joinDate
  user
 

Guard

 
  weapon
  experience
  joinDate
  user
 

With such splitting, it becomes super easy to add more roles. The login logic too requires little or no change with each addition!
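A minimal sketch of the split model, once more using in-memory arrays in place of real collections. The sample documents, the loadProfile helper, and the librarian role with its section field are assumptions added for illustration; they do not appear in the modeling above.

```javascript
// In-memory stand-ins for the generalized and specialized collections (sample data is made up).
const users = [
  { _id: "u1", firstName: "Sam", email: "sam@example.edu", password: "hash-1", userType: "student" },
  { _id: "u2", firstName: "Gus", email: "gus@example.edu", password: "hash-3", userType: "guard" },
];
const specialized = {
  student: [{ user: "u1", batch: "2024", CGPA: 3.4 }],
  guard: [{ user: "u2", weapon: "baton", experience: 5 }],
};

// Login only ever touches the generalized collection.
function login(email, password) {
  return users.find((u) => u.email === email && u.password === password) ?? null;
}

// userType selects the specialized collection; the `user` reference selects the document.
function loadProfile(user) {
  const docs = specialized[user.userType] ?? [];
  return docs.find((d) => d.user === user._id) ?? null;
}

// Adding a (hypothetical) role means adding a collection and its documents;
// login() and loadProfile() stay unchanged.
specialized.librarian = [{ user: "u3", section: "Science" }];
users.push({ _id: "u3", firstName: "Lee", email: "lee@example.edu", password: "hash-4", userType: "librarian" });

const lee = login("lee@example.edu", "hash-4");
console.log(loadProfile(lee).section); // Science
```

Login costs one lookup in User, plus one more in the specialized collection when the role-specific fields are actually needed, for example to render a dashboard.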

Conclusion

Most of the time, in the NoSql data design and modeling phase, we don’t find a relevant case for separating collections in a one-to-one relationship. Usually it isn’t required either, as trivial cases are well served by embedding the document. But in this post we went through an example where we are better off splitting the collections into their generalized and specialized forms and using a reference to link them, for better application management and easier scalability.