How To Draw NoSql Data Model Diagram?

NoSql, unlike SQL which has ER and class diagrams, has neither names nor constraints for data modeling diagram(s). The obvious reason is the relax rules of NoSql about relationships, which aim to get a developer started with minimum requirements.

Since data modeling diagram is the blueprint of any application we should always draw one. For tool I prefer diagrams.net (previously draw.io), where you simply need to drag and drop symbols into the canvas and align them. Its “Entity Relation” section on the left menu is most suitable for our modeling.

Example Problem Statement

Let us take a problem statement and draw our data model diagram around it in steps:

There’s a gift shop full of gift items. Multiple cashiers perform duty at different times of day and sell these items to customers. Customers are unregistered in system but their name is inquired and put on receipt.

In other words, we need to build a basic gift inventory and checkout system in light of above information. So, we start by identifying obvious and implied entities.

The Entities

  • Item
  • Product - not mentioned but required to distinguish between say Hugo Boss perfume (product) and 10 bottles of it (items)
  • Cashier
  • Transaction - a more informative word than “sell” or “purchase”

And that’s all we need. Customer is not required because we just need its name and has no other information. Receipt is unnecessary as all data to print on receipt is available in transaction.

Brand could be an entity of its own, if we want to store anything more than its name. If it’s just a name, like ‘Nike’, we can store it as an indexed property of the Product, and group products by it for brand related queries.

Listing Down Possible Queries

During the modeling process, it’s always helpful to plan ahead the read and write operations, to be able to keep them efficient and retrieve meaningful information from our data.

1. Reading Data

Requirements change with time, and NoSql is meant to accommodate that, but you must put in some thought about the initial minimum you can specify at the outset. In our case, it will be:

  • Rank the best performing brands and products within a given timeframe
  • Identify the items that need restocking
  • Transaction breakdown by any period of time: hourly, daily, weekly, monthly, yearly (we might need to draw and compare charts for insights)
  • List down cashiers based on seniority
  • A customer has lost the receipt, but they know what products they purchased and when. By querying the transactions, print them a duplicate receipt
  • Get all products that are less than $50 (a customer may inquire the cashier)

Obviously, this is not an exhaustive list, but we get the basic idea.

2. Writing Data

For this part, we should mostly be concerned with how references are stored given the amount of expanding data and frequency of updates. In another detailed post, I’ve explained ways to model NoSql 1 to 1, 1 to many, and many to many relationships along with their pros and cons. Briefly:

Ever Expanding Data

If number of documents are ever increasing, never put them or their references in another document. For example, number of items will keep on increasing, so we can’t embed or add their ids as reference inside product, as eventually it will become too big and run out of the maximum allocated space per document, not to mention slow to fetch. Instead we should keep the product reference in each item.

Frequent Additions And Updates

If some document(s) require a lot of adds, updates, or deletes, don’t embed them or put their reference at a single place, because it could result in:

  • Slowing down the database operations
  • Possible Inconsistent data, if operations were performed not atomically (i.e. a document was fetched, modified on client side, and then updated. Between your fetch and update, somebody else might have updated the document, and now you have the outdated data that your update call will persist).

For example: You embed comments and like of users inside a post. The post gets thousands of comments and likes in a short time. Many would update their comments too. This means that the post will need to be updated thousands of times in a short interval. But if comments and likes had their own collections, every comment would have been added and updated separately and much efficiently.


Now let’s move to the breakdown of all entities with diagrams, their relation with others, and example documents. Due to its familiarity I’m using MongoDB’s collections as entities and documents as examples.

Product

{
    id: 1,
    name: "Dark Blue Cologne",
    brand: "Hugo Boss",
    price: 22.46
}

Cashier

{
    id: 1,
    name: "John Smith",
    cellNum: "4565456789",
    address: "Broadway Street house no. 12, Brooklyn, New York, US",
    ssn: "011-72-XXXX"
    .
    .
    .
}

Product Has Many Items

//products
{
    id: 1,
    name: "Dark Blue Cologne",
    brand: "Hugo Boss",
    price: 22.46
}

//items
{
    id: 1,
    productId: 1,
    barcode: "0705632441947"
}

{
    id: 2,
    productId: 1,
    barcode: "0705632441948"
}

{
    id: 3,
    productId: 1,
    barcode: "0705632441949"
}

Note that in light of the above reasoning, we haven’t put embedded items or their references in an array inside the product document. We keep productId as a reference in items instead.

Transaction

Transaction is a required associative collection to solve following many to many problem:

  • A cashier can sell unlimited number of product items (Cashier John sells 15 sets of Hugo’s Dark Blue Cologne)
  • A product item can be sold by many cashiers (Hugo’s Dark Blue Cologne is sold by cahiers John, Tony, Malinda, George and 2 others)

salePrice is necessary to know at what price the product item was sold (in case of discount it’ll be less)

It might be tempting to model transaction on real world scenario where a customer buys multiple items in one go, which are shown in receipt as such. However, transaction should represent only one sale. For buying of multiple items, say 10 at a time, 10 transaction documents should be created. You can create another collection to group multiple transactions or simply place some unique identifier field (ex: transactionId) in all of transactions made collectively at one time.

Transaction Belongs To Product, Cashier And Item

This relationship can also be described as: Product has many Transactions, Cashier has many Transactions, Item has many Transactions.

Product has many Transactions” is completely optional and depends on requirements. For instance, when you need to generate yearly reports of sale and therefore need to know products in aggregated data, it’s easier to have product reference in transaction document than getting it through item (which will either require join if available in the database, or multiple queries).

//products
{
    id: 1,
    name: "Dark Blue Cologne",
    brand: "Hugo Boss",
    price: 22.46
}

//items
{
    id: 1,
    productId: 1,
    barcode: "0705632441947"
}

{
    id: 2,
    productId: 1,
    barcode: "0705632441948"
}

{
    id: 3,
    productId: 1,
    barcode: "0705632441949"
}

// cashiers
{
    id: 1,
    name: "John Smith",
    cellNum: "4565456789",
    address: "Broadway Street house no. 12, Brooklyn, New York, US",
    ssn: "011-72-XXXX"
    .
    .
    .
}

{
    id: 2,
    name: "Malinda Johnson",
    cellNum: "4565456689",
    address: "Harold Street house no. 133, Brooklyn, New York, US",
    ssn: "011-73-XXXX"
    .
    .
    .
}

//transactions
{
    id: 1,
    productId: 1,
    itemId: 1,
    customerName: "Daniel Witz",
    cashierId: 1,
    salePrice: 22 //salePrice to provide discount option
}

{
    id: 1,
    productId: 1,
    itemId: 2,
    customerName: "Glenn McDowell",
    cashierId: 1,
    salePrice: 22 
}

{
    id: 1,
    productId: 1,
    itemId: 3,
    customerName: "Robert Williams",
    cashierId: 2,
    salePrice: 22 
}

Finalizing The Diagram

We’ve finished our first design iteration and drawing! It efficiently takes care of the read and right considerations we listed down earlier. This basic diagram can be improved as the application development proceeds and the requirements are modified.

As noted earlier, NoSql data modeling lacks conventional names and design principles similar to that of SQL. For this, it’s always better to include the symbols used in the diagram itself for ease of reading for everyone. Let’s add entity and one-to-many symbols, the only two used.

Example Of Olympic Games

Let’s take another quick example and repeat the above steps to reach a final diagram.

Problem Statement

Create an application for Olympic Games

Entities

  • OlympicEvent - to store data for Olympic event, such as the country it’s taking place in, year etc.
  • Stadium - where the games are played
  • Athlete
  • Official - the match officials
  • Competition - to store the details of actual competition
  • Game - which is played by athletes and of which competition is conducted

Possible Queries

  • Find all archery participants from London Olympics 2012
  • Find countries which qualified for Table Tennis in Tokyo Olympics 2020
  • Get country name with highest number of games in Beijing Olympics 2008
  • For the last 4 Olympic games, get all the gold and silver medalists medalists

Data Model Diagram

Relationships Explained

  1. An OlympicEvent conducts many Games (One To Many)
  2. Many Games are played in a Stadium (One To Many)
  3. A Game is played by many Athletes and an Athlete can play multiple Games (Many To Many)
  4. Many Athletes and Officials participate in multiple Competitions; And a Competition has many Athletes and Officials (Both Many To Many)
  5. A Competition can only be of one Game, but a Game is played by numerous Competition (One To Many)

Note that for many to many relationship between Athlete/Competition and Official/Competition, we simply keep an array of references in Competition document, instead of keeping associate collections in between, because a competition won’t have unlimited number of athletes or officials participating, nor will it be updated that frequently.

See also