Clean Up MongoDB of Old Inactive Users and Their Data With a Node.js Script

Reduce Expensive Database Space By Pruning Never-To-Be-Used-Again Data

Databases are not cheap, especially when your application is just taking off and you are on a tight budget. At that stage, reclaiming the space held by data that will never be used again is a good way to save some money for a while.

For that, you need to define the criteria by which data gets pruned.

Pruning Criteria

The focal point of most apps is the user, and most of the other data in the database is usually related to it. Therefore, you might want to plan the clean-up around the user. Below are the Mongoose schemas of three collections, including user, from a hypothetical application.

Schemas

User

// user.js
const mongoose = require('mongoose');

const UserSchema = new mongoose.Schema({
  email: { type: String, index: true },
  password: { type: String }, // no index: passwords are never looked up by value
  deactivatedOn: { type: Date, index: true },
  active: { type: Boolean, index: true }
  //other fields
});
module.exports = mongoose.model("User", UserSchema); 

The deactivatedOn and active fields alone define our criteria for deleting a user and all of its data.

Assuming your application follows a subscription model, a user gets deactivated (active: false) when the subscription is not upgraded. At that point you should also set deactivatedOn to the current time with Date.now(). We will give the user a grace period (say 20 days) in case they want to return.
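The grace-period check is plain date arithmetic, so it can be sketched without any library. The helpers below are hypothetical (GRACE_DAYS, graceCutoff, pendingDeletionFilter and deactivationUpdate are not part of the application above) and assume the 20-day grace period mentioned earlier. The filter is a hand-written query in the same spirit as the date-query library used later in the script, not a copy of what that library produces.

```javascript
// Hypothetical helpers illustrating the pruning criteria.
const GRACE_DAYS = 20; // assumed grace period, in days
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the instant before which a deactivation counts as expired.
function graceCutoff(now = new Date(), days = GRACE_DAYS) {
  return new Date(now.getTime() - days * MS_PER_DAY);
}

// Builds a MongoDB filter matching users deactivated before the cutoff.
function pendingDeletionFilter(now = new Date(), days = GRACE_DAYS) {
  return { active: false, deactivatedOn: { $lt: graceCutoff(now, days) } };
}

// Fields to set on the user document when a subscription lapses.
function deactivationUpdate(now = new Date()) {
  return { active: false, deactivatedOn: now };
}
```

With the User model above, these could be used as User.updateOne({ _id: userId }, deactivationUpdate()) when the subscription lapses, and User.findOne(pendingDeletionFilter()) when looking for a user to prune.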

Message

// message.js
const mongoose = require('mongoose');

const MessageSchema = new mongoose.Schema({
  user: { type: mongoose.Schema.Types.ObjectId, ref: 'User', index: true }
  //other fields 
});
module.exports = mongoose.model("Message", MessageSchema); 

Project

// project.js
const mongoose = require('mongoose');

const ProjectSchema = new mongoose.Schema({
  user: { type: mongoose.Schema.Types.ObjectId, ref: 'User', index: true }
  //other fields 
});
module.exports = mongoose.model("Project", ProjectSchema); 

The Cleaner Script

If you haven’t already, install mongoose and mongo-date-query (a handy library to form a date interval query for you).

  npm i mongoose mongo-date-query --save

Below is the cleaner script (call it userCleaner.js). Every 5 seconds it looks for one inactive user who was deactivated more than 20 days ago (the grace period). If one is found, all of its messages and projects are deleted, and finally the user itself is removed.

const mongoose = require('mongoose');
const mdq = require('mongo-date-query');
const User = require('./user');
const Message = require('./message');
const Project = require('./project');

const checkInterval = 5000;

mongoose.connect('mongodb://127.0.0.1/webapp');

mongoose.connection.on('connected', checkUsers);

function checkUsers() {
  console.log("*** Cleaner Looking For Users ***");

  let userId;
  User.findOne({ active: false, deactivatedOn: mdq.beforeLastDays(20) })
    .exec()
    .then((user) => {
      if (!user) {
        throw "No User Found"; //this message shows in the catch block below when no user is found
      }
      else {
        userId = user._id;
        console.log(`User with pending deletion found (email: ${user.email})`);
        return Message.deleteMany({ user: userId }); // Model.remove() is gone from recent Mongoose releases
      }
    })
    .then(() => {
      console.log("All user messages removed");
      return Project.deleteMany({ user: userId });
    })
    .then(() => {
      console.log("All user projects removed");
      return User.findByIdAndDelete(userId);
    })
    .then(() => {
      console.log(`User removed from database`);
      setTimeout(checkUsers, checkInterval);
    })
    .catch(e => {
      console.log(e);
      setTimeout(checkUsers, checkInterval);
    })
}
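For comparison, a single cleanup pass can also be written with async/await. This is only a sketch, not a replacement for the script above: purgeOneUser is a hypothetical name, the models are passed in as an argument rather than required, and the cutoff date is assumed to be computed by the caller.

```javascript
// Sketch: one cleanup pass written with async/await.
// `models` is assumed to hold the User, Message and Project Mongoose models;
// `cutoff` is a Date before which a deactivation is considered expired.
async function purgeOneUser(models, cutoff) {
  const { User, Message, Project } = models;

  const user = await User.findOne({
    active: false,
    deactivatedOn: { $lt: cutoff },
  });
  if (!user) return null; // nothing to clean on this pass

  // Delete dependent data first, so a crash mid-way never leaves
  // messages or projects pointing at a user that no longer exists.
  await Message.deleteMany({ user: user._id });
  await Project.deleteMany({ user: user._id });
  await User.deleteOne({ _id: user._id });

  return user._id; // id of the user that was removed
}
```

Returning null instead of throwing when no user is found keeps real errors (a dropped connection, for example) separate from the ordinary "nothing to do" case.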

That’s about it!

Additionally, you might want to consider compacting (repairing) the fragmented database, to compress it and recover disk space.
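One way to do that from Node is MongoDB's compact command, issued per collection. The sketch below is an assumption-laden illustration: compactCollections is a hypothetical helper, db is assumed to be a connected native driver handle (with Mongoose, mongoose.connection.db once the connection is open), and the collection names in the usage note assume Mongoose's default pluralised names. Compaction can be heavy on a live server, so check the documentation for your MongoDB version before running it.

```javascript
// Sketch: run MongoDB's `compact` command on each named collection.
// `db` is assumed to be a connected native driver Db handle.
async function compactCollections(db, collectionNames) {
  const results = {};
  for (const name of collectionNames) {
    // Compact one collection at a time to limit load on the server.
    results[name] = await db.command({ compact: name });
  }
  return results; // raw server replies, keyed by collection name
}
```

For the schemas in this article, a call might look like compactCollections(mongoose.connection.db, ['messages', 'projects', 'users']).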