3 is a magic number
MongoDB (and other NoSQL solutions) are architected from the ground up to scale and be run in a robust cluster. This is worlds-apart from traditional databases like MySQL or Oracle, where replication is an after-thought and sharding has to be performed explicitly at the application level.
10gen is offering free MongoDB web classes right now. There are probably faster ways to come up to speed, but the format (short video, quiz, homework, etc) is perfect for some people. I’ve signed up to make sure I’m not missing anything.
Today I’ll go over the configuration of my MongoDB cluster. As I discovered, provided one reads the documentation, it’s surprisingly easy to get up and running.
MongoDB cluster architecture
If you need redundancy and auto-failover, then Replica Sets are awesome. As you’ve probably gathered from the name, your data is automatically mirrored on every node in your set.
When operating in a Replica Set, a primary node is elected, to which all writes will go. Reads can be performed against any node, although you may need to account for replication lag (the time it takes for secondaries to copy writes from the primary).
In addition to normal set participation, nodes in a Replica Set can be:
- silent: these secondaries will not participate in votes
- hidden: these secondaries can’t be voted for, and won’t handle read-only queries
- arbiters: these don’t mirror data at all (thus no queries), but can still vote
These additional set behaviours can be useful for establishing a majority. An election can only successfully promote a secondary to be the primary if a majority can vote. If there is an issue that prevents communication between one half of your nodes and the other, requiring a majority can prevent the accidental election of multiple primaries.
In a traditional SQL system, every tuple you add to a table makes it slower to read from that table next time. Adding an index can help, but there will always be a point where the performance of the table is too poor due to the large number of tuples you have.
In this situation, you might consider breaking your table into 2: A to M tuples go in one table, and N to Z tuples go in the other. Now you need to modify all your applications so that they know where to retrieve and store certain tuples.
MongoDB handles this automatically for you, so you don’t have to modify your application to support DB shards. Just add another shard, ideally a Replica Set just like your first shard. MongoDB will automatically balance the tuples across your shards, keeping track of which tuples are in which shard. Easy.
MongoDB comes with tools for exporting and importing data, if that’s how you want to perform backups.
There’s also a journal that you can enable. The cool thing about using the journal with AWS EBS is that I can just take a snapshot of the EBS volume and this will capture both the journal (recent changes) and the rest of the stored data. That means I can perform all my backups from the AWS Console, which is sometimes more convenient than bashing commands in via SSH.
With my original EBS configuration (2GB for MongoDB data) I found that I couldn’t enable the journal without also enabling “smallfiles”. By default, the journal needs to pre-allocate files that just won’t fit under 2GB.
MongoDB server components
Database server: mongod
This process performs the queries, stores your JSON documents, etc. It’s the heart every MongoDB installation. If you are just mucking about with a local install, or otherwise don’t need shards, then this is all you need.
Having multiple Database processes running (preferably on different machines) is all you need to configure a Replica Set. The minimum recommended nodes in a Replica Set is 3. If one goes down, you’d still have 2 to form the minimum majority required to vote.
Configuration server: mongod –configsvr
This process is required for sharding. It keeps track of which shards hold which tuples. Query performance will drop significantly if this process goes down, so it’s highly recommended that you have 3. The documentation says to either run with 1 or 3, I’m not quite sure why 2 is bad…
As its name would suggest, your application can speak directly to a Router, and is completely isolated from the cluster arrangement behind the Router. All Routers are aware of the Configuration servers, and therefore how many shards you have and which shards are pertinent for any particular request.
Having identified the appropriate shard(s), the Router will forward all writes to the primary replica, whilst balancing reads across the entire Replica Set.
Attempt A: how hard can this be?
I picked the 64-bit binary (non-static) download at MongoDB.org. The statically-linked version is for people with old Linux installations with weird issues. The official guidance is that you try out the non-static version first.
I started off with just one EC2 instance, and configured one of each process:
- a Configuration server
- a Router process, aware of the Configuration server
- a Database server running as a single shard, and as a single node in a Replica Set
It got a little confusing with permissions, there’s an “admin” collection on the Configuration server that is different to the “admin” collection on the Database.
At this time, there were log messages indicating that I didn’t have enough room…
Resizing an AWS EBS volume
Besides the small issue of making the journal fit on my 2GB data volume, I also found that 2GB is not enough space to store the data when configured in a Replica Set.
Here’s how I expanded my 2GB data volume to 4GB:
- via SSH, stopped the MongoDB processes and unmounted the data volume
- logged in to the AWS Console
- detached the EBS volume from the EC2 instance
- created a snapshot of the volume
- created a new 4GB EBS volume, restoring data from the snapshot
- deleted the old EBS volume
- attached the 4GB EBS volume to the EC2 instance
- logged back into the instance via SSH
- used cfdisk to delete the 2GB partition and create a 4GB partition in its place (only needed because I created a partition, so wiser partitionless installs need not do this)
- used e2fschk to check the file-system (which is still only 2GB)
- used e2fsresize to expand the file-system to fill the partition
- did another e2fschk just to make sure
- mounted the volume
- yay! everything is just like before, only with more space!
Attempt B: and then there were 3
So, after getting a bit confused, I decided to start over, but not before actually reading the getting started guides a little bit more thoroughly.
Also, as the recommended production deployment involved 3 machines, I figured that there was no point putting it off, and that I was start with 3 instances from the very beginning this time. Besides actually understanding everything better this time, I do think starting off with the minimum 3 was a big part of this attempt’s success.
My original EC2 instance was in California (US West 1) A, so I made my others in California B and Oregon (US West 2) A. It’d take unexpected downtime at 3 data warehouses in order to knock out my DB. Pretty neat.
Unfortunately, it’s not a point-and-click affair to clone across AWS regions. Whilst my configuration isn’t especially tricky, it did mean that I ended up doing everything at least twice (once in California, and once in Oregon).
AWS Security Groups
According to the security documentation, all MongoDB processes need to be able to communicate with each other, however (for a sharded cluster like mine) clients only need to be able to communicate with the Routers.
Configuration and Routers
Right off the bat, I ran 1 Configuration process on each EC2 instance. I also started 1 Router process on each EC2 instance, with the Routers being told about all 3 Configuration servers. By default, the Routers bind to port 27017 and the Configuration servers bind to port 27019.
These Configuration servers basically learn of each other only via the Router processes. The recommendation is to address them with domain names so that they can be replaced without having to re-configure the Routers.
Databases and the Replica Set
I ran 1 Database process on each instance, in both sharding and replica modes. At this point, I ignore the shard part, and just focus on getting the Replica Set to work. Shard mode servers bind to port 27018 by default, otherwise the Database process would normally bind to 27017 (clashing with the Router processes).
When you run a process in replica mode, you have to tell it the name of your Replica Set. I ended up just sticking with the boring “replSet0”. I do wish I’d come up with something cooler.
I logged in via SSH to one of my EC2 instances, and started a MongoDB Shell (the interactive command-line client) pointed at localhost:27018 (the Database process). Within the Shell, I initialised the Replica Set and added the other two instances (addressed by their domain names) to the Replica Set.
When I checked the status of the Replica Set (still in the Shell), something looked fishy: the other 2 instances were listed via their domain names, however the current instance was listed as “localhost”.
The problem with “localhost”
I started a MongoDB Shell pointing at localhost:27017 (the Router process). I tried to add my Replica Set as a shard to the currently empty cluster governed by my Configuration servers.
This kept giving an error, and I eventually realised that the other nodes in the Replica Set were trying to contact the first via “localhost”, which meant the Replica Set was not successfully formed yet, although it did seem to be from the perspective of the first Database process.
I needed to remove the erroneous “localhost” entry from the Replica Set. My problem was:
- I need to remove the current primary from the Replica Set
- removing nodes from the Replica Set can only be performed from the primary
Starting a Shell pointing at the primary Database process, I triggered its “stepDown” function, causing the Replica Set to elect a new primary. Then I opened a Shell for the new primary Database process, removed the “localhost” entry and added the original primary back, taking care to address it via its domain name.
I guess the TL;DR version is that you should not start your MongoDB Shell pointing at “localhost” if you are about to configure a Replica Set.
I was able to add the Replica Set to my cluster without any further issues. Yay!
I created administrator accounts in the “admin” database in the Replica Set. Then I authenticated as the administrator and created 2 more databases:
- one for my web application
- one for cookies / server-side sessions
I also created read-only and read-write accounts for both of these databases, just in case.
As I won’t need to be adding new databases for the immediate future, all administration tasks (like creating collections, etc) can be performed by the read-write account for the appropriate database.
Now that I’ve covered the cloud infrastructure, and installing Node.JS and MongoDB, I can start actually covering design and implementation concerns. The next article will touch on GitHub, using Google OAuth 2.0 for Login, and I will also muck around with the MongoDB Monitoring Service.