NoSQL Data Modeling: 1 to 1, 1 to Many, Many to Many

In contrast to SQL, NoSQL data modeling allows multiple ways to model 1 to 1, 1 to many and many to many relationships. It neither enforces rules nor favors a particular design. The only constraint is the requirements of the application.

Coming from a SQL background, I struggled with modeling Couchbase and MongoDB applications. Though badly put together at the time, the Couchbase documentation was of great help. In this post I’m sharing my understanding of this problem and my way of approaching it, with some examples.

Different databases group documents in different ways. I’ll use MongoDB’s term, “collection”, because it’s familiar.

I’ve addressed How to create a NoSQL data model diagram separately.

1 to 1

Let’s take a Client collection and list all the possible ways to model a one to one relationship with Address, i.e. Client has Address or Address belongs to Client:

Primitive Values (Trivial Case)

Primitive values are 1 to 1 by default. They make up the basic document. Client has id, firstName, lastName, registered and address. (Or id belongs to Client, address belongs to Client and so on.)

{
  "id": 1,
  "firstName": "John",
  "lastName": "Smith",
  "registered": true,
  "address": "Broadway Street house no. 12, Brooklyn, New York, US"
}

Pros

The complete information is contained within the document and is easily retrievable without looking into other documents

Cons

Data is duplicated if a field value is shared by other documents, for instance address: "Broadway Street house no. 12, Brooklyn, New York, US" appearing in multiple clients

Embedded/Sub Document

Preferred when the embedded document is short and unique or mostly unique, i.e. each client has a unique address except for a few who share a residence. In that case the address is duplicated, but this should be rare and can be ignored.

{
  "id": 1,
  "firstName": "John",
  "lastName": "Smith",
  "registered": true,
  "address": {
    "houseNum": 12,
    "street": "Broadway Street",
    "state": "New York",
    "city": "Brooklyn",
    "country": "US"
  }
}

Pros

The complete information is contained within the document and is easily retrievable without looking into other documents

Cons

Data is duplicated when exactly the same embedded document appears in other documents

Referenced Collection

At times an embedded document, though unique, gets very large. It can still be kept embedded, but there’s the option of moving it to a separate collection and referencing it. This way Address is not part of the Client document but is only linked to it via a reference field (address here).

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    address: "address1"
}

{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .
}

Usually the more important side keeps the reference, which in this case is Client. But we can also link the same collections the other way around by placing a reference to Client in Address.

{
    id: "client1",
    firstName: "John",
    lastName: "Smith",
    registered: true
}

{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    client: "client1"
    .
    .
    .
}

Pros

Document size is kept in check by placing part of the information in a separate document and keeping only a reference to it

Cons

For the full information, two documents need to be fetched instead of one
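
To make the two-fetch cost concrete, here is a minimal sketch using plain Python dictionaries as stand-ins for the Client and Address collections. The helper name `get_client_with_address` is my own, and a real database driver would replace the dictionary lookups.

```python
# Two in-memory "collections" keyed by document id (stand-ins for real collections).
clients = {
    1: {"id": 1, "firstName": "John", "lastName": "Smith",
        "registered": True, "address": "address1"},
}
addresses = {
    "address1": {"id": "address1", "houseNum": 12, "street": "Broadway Street",
                 "state": "New York", "city": "Brooklyn", "country": "US"},
}


def get_client_with_address(client_id):
    """Fetch a client, then follow its reference to fetch the address."""
    client = clients[client_id]               # first lookup
    address = addresses[client["address"]]    # second lookup, by reference
    # Return a merged copy so the stored documents are left untouched.
    return {**client, "address": address}


full = get_client_with_address(1)
print(full["address"]["city"])
```

The stored Client still holds only the string reference; the join happens in application code, which is the price paid for the smaller document.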

1 to Many

Continuing with the Client and Address example, let’s extend it so a client can have one or more addresses, i.e. Client has many Addresses or many Addresses belong to Client.

Array Of Primitive Values

{
  "id": 1,
  "firstName": "John",
  "lastName": "Smith",
  "registered": true,
  "address": [
    "Broadway Street house no. 12, Brooklyn, New York, US",
    "Harold Street house no. 77, Brooklyn, New York, US"
  ]
}

Pros

The complete information is contained within the document

Cons

Data is duplicated if a field value is shared by other documents

Array Of Embedded/Sub Documents

{
  "id": 1,
  "firstName": "John",
  "lastName": "Smith",
  "registered": true,
  "address": [
    {
      "houseNum": 12,
      "street": "Broadway Street",
      "state": "New York",
      "city": "Brooklyn",
      "country": "US"
    },
    {
      "houseNum": 77,
      "street": "Harold Street",
      "state": "New York",
      "city": "Brooklyn",
      "country": "US"
    }
  ]
}

Pros

The complete information is contained within the document

Cons

Data is duplicated when exactly the same embedded document appears in other documents
The document size becomes unmanageable when the embedded array gets too large, e.g. an Airplane with millions of embedded Part documents
Too many updates are required to add/update/delete the ever-increasing sub documents
Too many simultaneous update operations hit the same document, which slows the operations down and at worst makes the data inconsistent (if the operations are not handled atomically, i.e. the document changes in the gap between find and update)
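
The last point is easier to see in code. Below is a small simulation, assuming an in-memory dictionary as the store and hypothetical `read`, `write` and `atomic_push` helpers. It is not how any particular database works internally, only an illustration of the find/update gap.

```python
import copy

# A single stored document with an embedded array (in-memory stand-in for a collection).
store = {1: {"id": 1, "firstName": "John", "address": []}}


def read(doc_id):
    """Return a private copy, as a client would get after a find."""
    return copy.deepcopy(store[doc_id])


def write(doc):
    """Replace the whole stored document, as a non-atomic update would."""
    store[doc["id"]] = doc


# Two writers read the same version of the document...
a = read(1)
b = read(1)
# ...each appends its own address to its private copy...
a["address"].append({"houseNum": 12, "street": "Broadway Street"})
b["address"].append({"houseNum": 77, "street": "Harold Street"})
# ...and both write back. The second write silently discards the first.
write(a)
write(b)
lost_update_result = read(1)["address"]


def atomic_push(doc_id, address):
    """Modify the stored document in one step, with no find/update gap."""
    store[doc_id]["address"].append(address)


# Start again from an empty array and use the single-step update instead.
store[1]["address"] = []
atomic_push(1, {"houseNum": 12, "street": "Broadway Street"})
atomic_push(1, {"houseNum": 77, "street": "Harold Street"})
atomic_result = read(1)["address"]

print(len(lost_update_result), len(atomic_result))
```

Databases usually offer a single-step array update for this reason (MongoDB’s `$push` operator is one example), but every such update still lands on the same document, so the contention remains.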

Array Of References

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    addresses: ["address1", "address2"]
}

{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .
}

{
    id: "address2",
    houseNum: 77,
    street: "Harold Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .
}

Pros

Document size is kept in check by keeping only references
Only one document is needed to reach all the related information. Example: once the Client document is retrieved, its addresses can be fetched directly through the stored references instead of searching the whole Address collection for documents that belong to the Client

Cons

Even with only references in the array, document size can get out of control if not planned properly. For example, a Video has a million plus Likes (and still increasing) and all of their references are kept in the Video document as an array
Too many updates are required in the document holding the references, because references are constantly added and changed
Although better than embedded documents, this approach may also run into trouble when many simultaneous operations update the references, especially when the operations are not performed atomically
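
As a rough sketch of the second pro above, the references stored in the Client can be resolved with direct id lookups, with no search over the Address collection. The dictionaries stand in for collections and `addresses_of` is a hypothetical helper name.

```python
# In-memory stand-ins for the Client and Address collections, keyed by id.
clients = {
    1: {"id": 1, "firstName": "John", "lastName": "Smith",
        "registered": True, "addresses": ["address1", "address2"]},
}
addresses = {
    "address1": {"id": "address1", "houseNum": 12, "street": "Broadway Street"},
    "address2": {"id": "address2", "houseNum": 77, "street": "Harold Street"},
}


def addresses_of(client_id):
    """Resolve every reference held by the client with direct id lookups."""
    refs = clients[client_id]["addresses"]
    # Skip dangling references (ids whose address document no longer exists).
    return [addresses[ref] for ref in refs if ref in addresses]


streets = [a["street"] for a in addresses_of(1)]
print(streets)
```

The cost is one lookup per reference, so the approach stays cheap only while the array stays small.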

Reference In “Belongs To” Side

{
    id: "client1",
    firstName: "John",
    lastName: "Smith",
    registered: true,
}

{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    client: "client1"
    .
    .
    .
}

{
    id: "address2",
    houseNum: 77,
    street: "Harold Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    client: "client1"
    .
    .
    .
}

Pros

The document on the “one” side doesn’t need to know its belongings and therefore neither needs frequent updates nor grows large

Cons

Since the document holds no references to its belongings, a search of the whole collection is required to fetch them when needed (an index on the reference field solves this problem)
If the document is removed, all its belongings must be removed or updated, or they become stale data of no use, e.g. a deleted Post with thousands of Comments where each Comment holds the Post id
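
Both cons can be sketched with in-memory dictionaries. The function names are my own, and `address3` with its `client2` owner is a made-up extra document added only so the filter has something to exclude. A real database would maintain the index for you.

```python
from collections import defaultdict

# In-memory stand-ins for the collections; each address points back to its client.
clients = {"client1": {"id": "client1", "firstName": "John", "lastName": "Smith"}}
addresses = {
    "address1": {"id": "address1", "street": "Broadway Street", "client": "client1"},
    "address2": {"id": "address2", "street": "Harold Street", "client": "client1"},
    # Hypothetical address of another client, not part of the example above.
    "address3": {"id": "address3", "street": "Elm Street", "client": "client2"},
}


def addresses_by_scan(client_id):
    """Without an index: inspect every address document."""
    return [a for a in addresses.values() if a["client"] == client_id]


def build_client_index():
    """With an index: map each client id to the ids of its addresses."""
    index = defaultdict(list)
    for address_id, address in addresses.items():
        index[address["client"]].append(address_id)
    return index


def delete_client(client_id):
    """Remove the client and cascade to its addresses so none go stale."""
    clients.pop(client_id, None)
    owned = [k for k, a in addresses.items() if a["client"] == client_id]
    for address_id in owned:
        del addresses[address_id]


found = [a["id"] for a in addresses_by_scan("client1")]
index = build_client_index()
delete_client("client1")
remaining = list(addresses)
print(found, index["client1"], remaining)
```

The cascade has to be done by the application here; nothing in the documents themselves prevents an address from outliving its client.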

Many to Many

Array Of Primitive Values

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    address: ["Broadway Street house no. 12, Brooklyn, New York, US", "Harold Street house no. 77, Brooklyn, New York, US"]
}

{
    id: 2,
    firstName: "George",
    lastName: "Smith",
    registered: true,
    address: ["Broadway Street house no. 12, Brooklyn, New York, US"]
}

Pros

Possibly saves two collections, along with many references and their management

Cons

Guaranteed duplication, which means multiple add/update/delete operations whenever an array element changes
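
A small sketch of what that con means in practice: changing one shared address has to touch every client that stores a copy of it. The list stands in for the Client collection and `rename_address` is a hypothetical helper.

```python
OLD = "Broadway Street house no. 12, Brooklyn, New York, US"
NEW = "Broadway Street house no. 12A, Brooklyn, New York, US"  # made-up corrected value

# In-memory stand-in for the Client collection.
clients = [
    {"id": 1, "firstName": "John", "lastName": "Smith",
     "address": [OLD, "Harold Street house no. 77, Brooklyn, New York, US"]},
    {"id": 2, "firstName": "George", "lastName": "Smith",
     "address": [OLD]},
]


def rename_address(old, new):
    """Rewrite every copy of an address; returns how many documents changed."""
    touched = 0
    for client in clients:
        if old in client["address"]:
            client["address"] = [new if value == old else value
                                 for value in client["address"]]
            touched += 1
    return touched


touched = rename_address(OLD, NEW)
print(touched)
```

With a referenced Address document the same change would be a single update.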

Array Of Embedded Documents

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    address: [{
        houseNum: 12,
        street: "Broadway Street",
        state: "New York",
        city: "Brooklyn",
        country: "US"
    },
    {
        houseNum: 77,
        street: "Harold Street",
        state: "New York",
        city: "Brooklyn",
        country: "US"
    }]
}

{
    id: 2,
    firstName: "William",
    lastName: "Smith",
    registered: true,
    address: [{
        houseNum: 12,
        street: "Broadway Street",
        state: "New York",
        city: "Brooklyn",
        country: "US"
    }]
}

Pros

Saves extra collections and reference management

Cons

Multiple add/update/delete operations whenever an embedded array element changes
Embedded documents are duplicated across documents

Array Of References

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    addresses: ["address1", "address2"]
}

{
    id: 2,
    firstName: "George",
    lastName: "Smith",
    registered: true,
    addresses: ["address1"]
}

{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .
}

{
    id: "address2",
    houseNum: 77,
    street: "Harold Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .
}

Cons

Not scalable. When either side of the many to many relationship grows too large, this approach becomes hard to manage. For example, a Post can be liked by potentially millions of Users and a User can like an unlimited number of Posts, so keeping an array of references to liking users in the post, or to liked posts in the user, is infeasible

Connective/Associative Collection In Between

//Client
{
    id: "client1",
    firstName: "John",
    lastName: "Smith",
    registered: true
}

{
    id: "client2",
    firstName: "George",
    lastName: "Smith",
    registered: true
}

//ClientAddress
{
    id: "clientAddress1",
    client: "client1",
    address: "address1"
}

{
    id: "clientAddress2",
    client: "client1",
    address: "address2"
}

{
    id: "clientAddress3",
    client: "client2",
    address: "address1"
}


//Address
{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .
}

{
    id: "address2",
    houseNum: 77,
    street: "Harold Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .
}

Pros

The only scalable design for an unbounded many to many relationship, e.g. a User can like an unlimited number of Posts and a Post can be liked by an unlimited number of Users

Cons

An extra collection and therefore extra get/add/update/delete operations
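
Here is a minimal sketch of how the connective collection is used, with a plain list standing in for ClientAddress. The helper names are hypothetical, and I assume each link document has its own unique id (so the third one is `clientAddress3`). In a real database both the client and address fields would normally be indexed.

```python
# In-memory stand-in for the ClientAddress collection: one small document per link.
client_addresses = [
    {"id": "clientAddress1", "client": "client1", "address": "address1"},
    {"id": "clientAddress2", "client": "client1", "address": "address2"},
    {"id": "clientAddress3", "client": "client2", "address": "address1"},
]


def addresses_of_client(client_id):
    """Follow the links from the Client side."""
    return [link["address"] for link in client_addresses
            if link["client"] == client_id]


def clients_at_address(address_id):
    """Follow the same links from the Address side."""
    return [link["client"] for link in client_addresses
            if link["address"] == address_id]


def unlink(client_id, address_id):
    """Remove one relationship without touching Client or Address documents."""
    client_addresses[:] = [
        link for link in client_addresses
        if not (link["client"] == client_id and link["address"] == address_id)
    ]


from_client = addresses_of_client("client1")
from_address = clients_at_address("address1")
unlink("client1", "address1")
after_unlink = clients_at_address("address1")
print(from_client, from_address, after_unlink)
```

Adding or removing a relationship only inserts or deletes one small link document, so neither the Client nor the Address document grows with the number of relationships.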
