NoSQL Data Modeling: 1 to 1, 1 to Many, Many to Many

Surprisingly, data modeling for NoSQL databases gets far less attention than it does for SQL. 1 to 1, 1 to many and many to many relationships can each be modeled in multiple ways. No particular choice is favored and no normalization is required. The only constraint is the application’s requirements.

Coming from a SQL background, I struggled with modeling Couchbase and MongoDB applications. Though badly put together at the time, the Couchbase documentation was of great help. In this post I’m sharing my understanding of the problem and my way of approaching it, with some examples.

Databases use different ways to group documents. I’ll use MongoDB’s “collection” because it’s familiar.

1 to 1

Let’s take a Client collection and list all the possible ways of modeling a one to one relationship with Address, i.e. Client has Address or Address belongs to Client:

Primitive Values (Trivial Case)

Primitive values are 1 to 1 by default. They make up the basic document. Client has id, firstName, lastName, registered and address. (Or id belongs to Client, address belongs to Client and so on.)

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    address: "Broadway Street house no. 12, Brooklyn, New York, US"
}

Pros

  • The complete information is contained within the document and is easily retrievable without looking into other documents

Cons

  • Duplication will happen if a field value is shared by other documents, for instance address: "Broadway Street house no. 12, Brooklyn, New York, US" appearing in multiple clients

Embedded/Sub Document

Preferred when the embedded document is short and unique or mostly unique, i.e. each client has a unique address except for a few who share a residence. In that case the address will be duplicated, but this should be a rarity and can be ignored.

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    address: {
        houseNum: 12,
        street: "Broadway Street",
        state: "New York",
        city: "Brooklyn",
        country: "US"        
    }

}

Pros

  • The complete information is contained within the document and is easily retrievable without looking into other documents

Cons

  • Duplication when the embedded document is exactly the same in other documents
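To make the trade-off concrete, here is a minimal Python sketch. It assumes a plain in-memory list standing in for the Client collection; the second client and the rename_street helper are hypothetical and only there for illustration. It shows that one read returns the whole address, and that an address shared by two clients has to be changed in both documents.

```python
# In-memory stand-in for a Client collection (hypothetical data, not a real driver).
clients = [
    {"id": 1, "firstName": "John", "lastName": "Smith",
     "address": {"houseNum": 12, "street": "Broadway Street", "city": "Brooklyn"}},
    {"id": 2, "firstName": "George", "lastName": "Smith",
     "address": {"houseNum": 12, "street": "Broadway Street", "city": "Brooklyn"}},
]


def rename_street(docs, old, new):
    """Update every embedded copy of a street name; return how many documents changed."""
    changed = 0
    for doc in docs:
        if doc["address"]["street"] == old:
            doc["address"]["street"] = new
            changed += 1
    return changed


# Pro: one read is enough to get a client's full address.
city = clients[0]["address"]["city"]

# Con: the shared address is stored twice, so one logical change touches two documents.
updated = rename_street(clients, "Broadway Street", "Broadway Avenue")
```

If shared addresses really are rare, the extra writes stay rare too, which is why this cost can usually be ignored.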

Referenced Collection

At times an embedded document, though unique, gets very large. It can still be kept embedded, but there’s also the choice of moving it to a separate collection and referencing it. This way Address is not part of the Client document but is only linked to it via a reference field (address here).

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    address: "address1"

}

{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .        
}

Usually the more important side keeps the reference, which in this case is Client. But we can also link the same collections the other way around by placing a reference to Client in Address.

{
    id: "client1",
    firstName: "John",
    lastName: "Smith",
    registered: true

}

{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    client: "client1",
    .
    .
    .  
}

Pros

  • Document size is kept in check by placing some piece of information in a separate document and only keeping its reference

Cons

  • Fetching the full information takes two document reads instead of one
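The two-read cost can be sketched in a few lines of Python. This assumes in-memory dicts keyed by id in place of real collections, and get_client_with_address is a hypothetical helper name, not any database's API.

```python
# In-memory stand-ins for the Client and Address collections, keyed by id.
clients = {
    "client1": {"id": "client1", "firstName": "John", "lastName": "Smith",
                "registered": True, "address": "address1"},
}
addresses = {
    "address1": {"id": "address1", "houseNum": 12, "street": "Broadway Street",
                 "state": "New York", "city": "Brooklyn", "country": "US"},
}


def get_client_with_address(client_id):
    """Resolve the reference: one lookup for the client, a second for its address."""
    client = dict(clients[client_id])  # lookup 1 (shallow copy, stored document untouched)
    client["address"] = addresses.get(client["address"])  # lookup 2, None if dangling
    return client


full = get_client_with_address("client1")
```

The stored Client document keeps only the id string, so it stays small no matter how large the Address document grows.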

1 to Many

Continuing with the Client and Address example, let’s extend it so that a client can have one or more addresses, i.e. Client has many Addresses or many Addresses belong to Client.

Array Of Primitive Values

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    address: ["Broadway Street house no. 12, Brooklyn, New York, US", "Harold Street house no. 77, Brooklyn, New York, US"]
}

Pros

  • The complete information is contained within the document

Cons

  • Duplication will happen if a field value is shared by other documents

Array Of Embedded/Sub Documents

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    address: [{
        houseNum: 12,
        street: "Broadway Street",
        state: "New York",
        city: "Brooklyn",
        country: "US"        
    },
    {
        houseNum: 77,
        street: "Harold Street",
        state: "New York",
        city: "Brooklyn",
        country: "US"        
    }]

}

Pros

  • The complete information is contained within the document

Cons

  • Duplication when the embedded document is exactly the same in other documents
  • The document size will become unmanageable when the embedded array gets too large, e.g. an Airplane with millions of embedded Part documents
  • Too many updates to the parent document are required to add, update or delete its ever-increasing sub documents

Array Of References

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    addresses: ["address1", "address2"]
}

{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .        
}

{
    id: "address2",
    houseNum: 77,
    street: "Harold Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .        
}

Pros

  • Document size is kept in check by only keeping references
  • The parent document alone tells us where everything is. Example: once the Client document is retrieved, its addresses can be fetched directly by the stored references instead of searching the whole Address collection for documents that belong to the Client

Cons

  • Even with only references in the array, document size can get out of control if poorly planned. For example, a Video has a million plus Likes (and still increasing) and all of their references are kept in the Video document as an array
  • Too many updates to the parent document are required to add, delete or update its ever-increasing references
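Here is a rough Python sketch of both sides of this trade-off. It assumes dicts keyed by id as stand-ins for the collections; the "client1" key, the addresses_of and add_address helpers and the "address3" document are all hypothetical.

```python
# In-memory stand-ins for the Client and Address collections, keyed by id.
clients = {
    "client1": {"id": "client1", "firstName": "John", "lastName": "Smith",
                "addresses": ["address1", "address2"]},
}
addresses = {
    "address1": {"id": "address1", "houseNum": 12, "street": "Broadway Street"},
    "address2": {"id": "address2", "houseNum": 77, "street": "Harold Street"},
}


def addresses_of(client_id):
    """Fetch addresses directly by the ids stored on the client; no collection scan."""
    ids = clients[client_id]["addresses"]
    return [addresses[a] for a in ids if a in addresses]


def add_address(client_id, address):
    """Adding an address costs two writes: the new document and the parent's array."""
    addresses[address["id"]] = address                       # write 1
    clients[client_id]["addresses"].append(address["id"])    # write 2


streets = [a["street"] for a in addresses_of("client1")]
add_address("client1", {"id": "address3", "houseNum": 5, "street": "Example Street"})
```

Every add or delete rewrites the Client document, which is harmless for a handful of addresses and painful for a million Likes.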

Reference In “Belongs To” Side

{
    id: "client1",
    firstName: "John",
    lastName: "Smith",
    registered: true,
}

{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    client: "client1",
    .
    .
    .        
}

{
    id: "address2",
    houseNum: 77,
    street: "Harold Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    client: "client1",
    .
    .
    .        
}

Pros

  • The document on the “one” side doesn’t need to know about its belongings, so it needs neither frequent updates nor a large size.

Cons

  • Since the document holds no references to its belongings, a search of the whole collection is required to fetch them when needed (an index on the reference field solves this problem)
  • If the document is removed, all its belongings must be removed or updated as well, otherwise they become stale, useless data, e.g. a deleted Post with thousands of Comments where each Comment holds the Post id.
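Both cons can be sketched in Python. The assumptions: dicts stand in for the collections, a hand-built dict plays the role a database index on the client field would play, and client2 and address3 are hypothetical additions so that the cleanup has something to leave behind.

```python
from collections import defaultdict

# In-memory stand-ins for the Client and Address collections, keyed by id.
clients = {
    "client1": {"id": "client1", "firstName": "John"},
    "client2": {"id": "client2", "firstName": "George"},
}
addresses = {
    "address1": {"id": "address1", "street": "Broadway Street", "client": "client1"},
    "address2": {"id": "address2", "street": "Harold Street", "client": "client1"},
    "address3": {"id": "address3", "street": "Example Street", "client": "client2"},
}


def build_client_index(address_docs):
    """Group address ids by owning client, as an index on the client field would."""
    index = defaultdict(list)
    for address in address_docs.values():
        index[address["client"]].append(address["id"])
    return index


def delete_client(client_id, index):
    """Remove a client and all addresses that belong to it, so nothing goes stale."""
    clients.pop(client_id, None)
    for address_id in index.pop(client_id, []):
        del addresses[address_id]


by_client = build_client_index(addresses)
owned = sorted(by_client["client1"])  # found without scanning at query time
delete_client("client1", by_client)
```

Without the index, finding a client's addresses means checking the client field of every Address document.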

Many to Many

Array Of Primitive Values

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    address: ["Broadway Street house no. 12, Brooklyn, New York, US", "Harold Street house no. 77, Brooklyn, New York, US"]
}

{
    id: 2,
    firstName: "George",
    lastName: "Smith",
    registered: true,
    address: ["Broadway Street house no. 12, Brooklyn, New York, US"]
}

Pros

  • Possibly saves two extra collections, and with them lots of references and their management

Cons

  • Guaranteed duplication, which means that adding, updating or deleting one shared array element takes multiple operations across documents

Array Of Embedded Documents

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    address: [{
        houseNum: 12,
        street: "Broadway Street",
        state: "New York",
        city: "Brooklyn",
        country: "US"        
    },
    {
        houseNum: 77,
        street: "Harold Street",
        state: "New York",
        city: "Brooklyn",
        country: "US"        
    }]

}

{
    id: 2,
    firstName: "George",
    lastName: "Smith",
    registered: true,
    address: [{
        houseNum: 12,
        street: "Broadway Street",
        state: "New York",
        city: "Brooklyn",
        country: "US"        
    }]

}

Pros

  • Saves the extra collections and the reference management that comes with them

Cons

  • Adding, updating or deleting one shared embedded element takes multiple operations across documents

Array Of References

{
    id: 1,
    firstName: "John",
    lastName: "Smith",
    registered: true,
    addresses: ["address1", "address2"]
}

{
    id: 2,
    firstName: "George",
    lastName: "Smith",
    registered: true,
    addresses: ["address1"]
}

{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .        
}

{
    id: "address2",
    houseNum: 77,
    street: "Harold Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .        
}

Cons

  • Not scalable. When either side of the many to many relationship grows too large, this approach becomes hard to manage. For example, a Post can be liked by potentially millions of Users and a User can like an unlimited number of Posts, so keeping an array of references to liking users in the post, or to liked posts in the user, is infeasible
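A small Python sketch of why the reverse direction hurts. It assumes a list standing in for the Client collection, with references kept only on the client side as in the documents above; clients_at and remove_address are hypothetical helper names.

```python
# In-memory stand-in for the Client collection; references live on the client side only.
clients = [
    {"id": 1, "firstName": "John", "addresses": ["address1", "address2"]},
    {"id": 2, "firstName": "George", "addresses": ["address1"]},
]


def clients_at(address_id):
    """Reverse lookup: every client's reference array has to be inspected."""
    return [c["id"] for c in clients if address_id in c["addresses"]]


def remove_address(address_id):
    """Detach an address from every client that references it; return documents touched."""
    touched = 0
    for c in clients:
        if address_id in c["addresses"]:
            c["addresses"].remove(address_id)
            touched += 1
    return touched


residents = clients_at("address1")
touched = remove_address("address1")
```

With two clients this is trivial. With millions of users liking one post, both the scan and the fan-out of updates grow with the data.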

Connective/Associative Collection In Between

//Client
{
    id: "client1",
    firstName: "John",
    lastName: "Smith",
    registered: true
}

{
    id: "client2",
    firstName: "George",
    lastName: "Smith",
    registered: true
}

//ClientAddress
{
    id: "clientAddress1",
    client: "client1",
    address: "address1"
}

{
    id: "clientAddress2",
    client: "client1",
    address: "address2"
}

{
    id: "clientAddress3",
    client: "client2",
    address: "address1"
}


//Address
{
    id: "address1",
    houseNum: 12,
    street: "Broadway Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .        
}

{
    id: "address2",
    houseNum: 77,
    street: "Harold Street",
    state: "New York",
    city: "Brooklyn",
    country: "US",
    .
    .
    .        
}

Pros

  • The only scalable design for an unbounded many to many relationship, e.g. a User can like an unlimited number of Posts and a Post can be liked by an unlimited number of Users

Cons

  • Extra collection and therefore extra get/add/update/delete operations
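The connecting collection can be sketched in Python as a list of small link documents. The assumptions: an in-memory list stands in for ClientAddress, and the helper names and the length-based id scheme in link are hypothetical simplifications (a real system would want a safer id generator).

```python
# In-memory stand-in for the ClientAddress collection: one small document per relationship.
client_addresses = [
    {"id": "clientAddress1", "client": "client1", "address": "address1"},
    {"id": "clientAddress2", "client": "client1", "address": "address2"},
    {"id": "clientAddress3", "client": "client2", "address": "address1"},
]


def addresses_of(client_id):
    """All address ids linked to a client."""
    return [l["address"] for l in client_addresses if l["client"] == client_id]


def clients_at(address_id):
    """All client ids linked to an address; the same collection answers both directions."""
    return [l["client"] for l in client_addresses if l["address"] == address_id]


def link(client_id, address_id):
    """Add a relationship with one insert; neither the Client nor the Address changes."""
    link_id = f"clientAddress{len(client_addresses) + 1}"  # naive id, illustration only
    client_addresses.append({"id": link_id, "client": client_id, "address": address_id})
    return link_id


johns = addresses_of("client1")
residents = clients_at("address1")
new_id = link("client2", "address2")
```

Each new relationship is one more small document, so neither a Client nor an Address grows as links pile up. In a real database, indexes on the client and address fields would keep both lookups from scanning the whole collection.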

Detailed Examples