AWS Elastic Compute Cloud

early adventures / mis-steps with EC2

AWS EC2 is pretty neat. For pure Node.JS hosting, it certainly would have been cheaper to use something like Nodejitsu. However, as I plan to run a MongoDB cluster later, I need to be able to install anything I want. EC2 gives me a Linux or Windows virtual machine that I can do anything I like with. Amazon provides the remote connection credentials and other details and then gets out of the way.

This post covers my experience setting up my very own EC2 instance just the way I want it. Note that this isn’t the first time I’ve played around with EC2, but it is the first time I’ve actually been motivated to try and do something productive with it.

AWS EC2

I provisioned a new Micro-sized (the cheapest and slowest) instance using the latest version of Amazon’s own Linux distribution (“when in Rome”), but there are quite a few other machine images, both free and for sale, in the Marketplace.

Elastic Block Store (EBS) is the virtualised storage technology AWS provides for use with EC2. EBS volumes behave just like any block device, meaning you can initialise them with basically any traditional file-system format you like. To the OS on EC2, an EBS volume looks just like any old hard drive.

This is probably true for all EC2 sizes, but Micros come with a root volume that can’t be made any smaller than 8GB. That’s a little annoying, as the default configuration of Amazon Linux takes up less than 2GB. EBS storage only costs $0.11 per GB per month in US West 1, but every little bit adds up.

Secure Shell for Chrome

Google published a neat SSH client to the Chrome Web Store. All the secure stuff is managed in Native Client code. You can even point it at private key files for use with Public Key Authentication. You might poo-poo this if you work exclusively in UNIX-derivatives, but Google Secure Shell is way cooler than PuTTY on Windows.

I added my SSH public key to the ~/.ssh/authorized_keys file on my EC2 instance (labeled clearly so I can delete it if this computer is ever compromised).
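For anyone following along, here is a rough sketch of what "labeled clearly" can look like in practice. The function names, the label and the key path are all made up for illustration, and it assumes the label is a single word with no spaces.

```shell
# Sketch only: add_labelled_key and remove_labelled_key are hypothetical helpers.

# Append a public key to authorized_keys, replacing its comment with a label.
add_labelled_key() {
    key_file="$1"                # path to a public key file (e.g. id_rsa.pub)
    label="$2"                   # single word identifying the client machine
    ssh_dir="${3:-$HOME/.ssh}"   # defaults to the current user's ~/.ssh

    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    # A public key line is "type base64 [comment]": keep the first two
    # fields and use the label as the comment.
    awk -v label="$label" '{ print $1, $2, label }' "$key_file" \
        >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Drop every key whose comment matches the label (e.g. after a compromise).
remove_labelled_key() {
    label="$1"
    auth_file="${2:-$HOME/.ssh}/authorized_keys"
    scratch="$(mktemp)"
    # Write back with cat so the file keeps its existing permissions.
    awk -v label="$label" '$3 != label' "$auth_file" > "$scratch" \
        && cat "$scratch" > "$auth_file"
    rm -f "$scratch"
}

# Example usage (not run here):
#   add_labelled_key ~/chrome-laptop.pub chrome-laptop
#   remove_labelled_key chrome-laptop
```

The label ends up as the third field on the key’s line, so it is easy to spot in the file and easy to match when it is time to revoke.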

EBS volume for MongoDB data

Most folks will recommend that the directory used for storing a database should be on a separate volume to the OS. That way the DB can fill up without causing the OS to die. Depending on the configuration, there may also be a performance benefit to doing so.

For this reason, I’ve created an extra 2GB EBS volume for MongoDB’s data directory, and attached it to the EC2 instance. I will later discover this isn’t big enough, so if you are following along at home, try to give MongoDB at least 4GB.

EBS volume for project configuration

Last time I mucked about with EC2, I tried compiling Node.JS on it from source. This took nearly an hour on a Micro instance (remember how slow and cheap they are?). For this reason, I thought it would be a good idea to put software and configurations specific to my project on a separate EBS volume.

I thought this would be the most convenient way of duplicating my project’s configuration (including slow-to-install packages) between nodes in my cluster: I’d simply take a snapshot and then restore a cloned volume from it.

Snapshots ended up being region-locked (although you can shuffle data around globally using S3) so having a snapshot didn’t really end up making future deployments easier. It does make backups easier, but it just doesn’t seem worth it in the end.

My currently in-use configuration has an extra 2GB EBS volume just for anything I have to compile from source, plus the repository for my project, configuration files, etc. If I was starting from scratch, I think I would just install all this on the initial 8GB volume instead.

partitions on EBS volumes

Just out of habit, I used cfdisk to create a partition on each EBS volume. Then I initialised these partitions with the ext4 file-system. This is unnecessary: even the default 8GB EBS volume for Amazon Linux comes with the file-system spanning the entire volume, without any partition layer in the middle.

Having the extra layer becomes annoying later, especially if you (like me) discover you need to resize a volume. Instead of being a 2-step process, it becomes a 3-step process with some scary partition stuff. I’ll get into that later.
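Here is a minimal sketch of the partition-free approach, which is roughly what I would do next time. The device name and mount point are assumptions (check what your instance actually calls the attached volume), and the script only prints the commands unless you set DRY_RUN=0.

```shell
# Sketch only: DEVICE and MOUNT_POINT are hypothetical values.
DEVICE="${DEVICE:-/dev/xvdf}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/mongodb-data}"
DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of running them

# Print a command, and only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Put ext4 straight onto the whole device: no cfdisk, no partition table.
format_and_mount() {
    run sudo mkfs -t ext4 "$DEVICE"
    run sudo mkdir -p "$MOUNT_POINT"
    run sudo mount "$DEVICE" "$MOUNT_POINT"
}

# Once the underlying volume has been made bigger, growing the file-system
# is one command, because there is no partition to resize first.
grow_filesystem() {
    run sudo resize2fs "$DEVICE"
}

format_and_mount
grow_filesystem
```

With a partition in the way, there would be an extra step between enlarging the volume and running resize2fs: enlarging the partition itself.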

Elastic Load Balancing

AWS ELB is not free to use, but it is extremely cheap compared to setting up yet another EC2 Micro instance with something like nginx or some other high-performance proxy. It’s also very easy to use, and very reliable.

I created an ELB, and noted its external host name.

Back over in Route53, I created a Record Set that points my domain name at the ELB’s external host name. While here, I also set up a sub-domain name for direct access to the EC2 Micro instance (for convenience). The external host names generated by AWS are quite long, so you’ll probably want a shorter or more meaningful short-cut to any resource you want quick access to.

Back over in EC2, I dropped the EC2 Micro instance into the ELB and set up a Health Check. ELB regularly makes a request to each instance in its pool, and only forwards client requests to instances that are known to be working. This is pretty neat. When I add multiple instances to the ELB later, client requests will be spread evenly over the operational nodes automatically.

The other neat thing about using ELB as a proxy is that you can forward requests on one port to a different port number on your worker instances. This is handy if you don’t want server software on the worker instances to require root privileges (UNIX systems typically reserve port numbers under 1024 for root; it’s a legacy trust thing).
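I did all of this through the web console, but the port mapping and health check are easier to see written down. The sketch below assumes the AWS command line tools (the `aws elb` commands), which is not what I actually used. The load balancer name, availability zone, instance port and health check numbers are all made up. It only prints the commands unless you set DRY_RUN=0.

```shell
# Sketch only: every value below is a hypothetical placeholder.
LB_NAME="${LB_NAME:-my-site-lb}"
ZONE="${ZONE:-us-west-1a}"
INSTANCE_PORT="${INSTANCE_PORT:-8080}"   # unprivileged port on the workers
DRY_RUN="${DRY_RUN:-1}"                  # default: print commands only

# Print a command, and only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

create_listener_and_health_check() {
    # Clients talk to port 80 on the ELB; the ELB forwards to INSTANCE_PORT,
    # so the server process on the instance does not need root.
    run aws elb create-load-balancer \
        --load-balancer-name "$LB_NAME" \
        --availability-zones "$ZONE" \
        --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=$INSTANCE_PORT"

    # The ELB requests "/" on each instance and only routes traffic to
    # instances that keep answering.
    run aws elb configure-health-check \
        --load-balancer-name "$LB_NAME" \
        --health-check "Target=HTTP:$INSTANCE_PORT/,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2"
}

create_listener_and_health_check
```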

HTTPS with ELB

I created a self-signed certificate for SSL and uploaded it to the ELB.
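If you haven’t made a self-signed certificate before, something like the following should do it with OpenSSL. This is a sketch rather than the exact command I ran: the output directory, the example.com common name, the key size and the one-year lifetime are all placeholder choices.

```shell
# Sketch only: CERT_DIR and COMMON_NAME are hypothetical placeholders.
CERT_DIR="${CERT_DIR:-./elb-cert}"
COMMON_NAME="${COMMON_NAME:-example.com}"
mkdir -p "$CERT_DIR"

# -x509 emits a self-signed certificate instead of a signing request;
# -nodes leaves the private key without a passphrase.
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
    -subj "/CN=$COMMON_NAME" \
    -keyout "$CERT_DIR/private-key.pem" \
    -out "$CERT_DIR/certificate.pem"

# Show the subject and validity dates of what was just generated.
openssl x509 -in "$CERT_DIR/certificate.pem" -noout -subject -dates
```

The two PEM files are what gets pasted or uploaded into the ELB listener configuration. Browsers will complain about a self-signed certificate, which is fine for testing.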

Clients may now connect to my ELB using either raw HTTP or encrypted HTTP-over-SSL. However, all traffic between ELB and my worker EC2 instances is conducted over local interfaces in plain-text: the SSL ends at ELB.

I do wonder about how easy it might be for someone else using AWS to snoop on the traffic between an ELB and an EC2 instance. I think this is mitigated completely if using a VPC, but for now this is a risk I’ll have to accept. I’m not storing passwords or accepting passwords with my site, so there’s not much to snoop on in this regard.

Node.JS via NVM

Node.JS is not in the Amazon Linux repository. I previously found a custom repository for Fedora (a similar flavour of Linux) but they had to rename the “node” executable in order to prevent clashes with a legacy package, so it felt yucky.

The best bet was to just take matters into my own hands and compile Node.JS from source. Before doing this, you do need to:

yum install gcc-c++ (check first and install the version you like, I picked 4.7)
set the CC and CXX environment variables if necessary (I did this in /etc/profile.d/gcc.sh)
yum install openssl-devel

I installed NVM to /opt/nvm (after mucking about with directory permissions and ownership), and created an /etc/profile.d/nvm.sh file to load it into my shell profile. After causing the profile to be reloaded (I find it’s easiest to just log out and back in), I installed Node.JS with:

nvm install 0.8.11
nvm alias latest 0.8.11
add “nvm use latest” to /etc/profile.d/nvm.sh for later convenience
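For reference, the profile script ends up looking roughly like this. It’s a sketch reconstructed from the steps above rather than a copy of my actual file. Exporting NVM_DIR and the guard around the load are my own additions, and it assumes a "latest" alias has already been pointed at an installed version.

```shell
# /etc/profile.d/nvm.sh (sketch)
export NVM_DIR=/opt/nvm

# Only load NVM if it is actually installed where we expect it.
if [ -s "$NVM_DIR/nvm.sh" ]; then
    . "$NVM_DIR/nvm.sh"
    # Select the version behind the "latest" alias; stay quiet at login
    # and don't break the shell if the alias is missing.
    nvm use latest >/dev/null 2>&1 || true
fi
```

Because the file lives in /etc/profile.d, every login shell on the instance picks it up, which is why logging out and back in is enough to get node on the PATH.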

GNU Screen

By the way, as compiling Node.JS can take a while, I was in a bit of a pickle. I didn’t want to have to leave my SSH connection open the whole time, but I still needed a sure-fire way to have the compile continue even after I logged out. I’ve had mixed results with the ampersand-approach.

Anyhow, GNU Screen is already installed by default in Amazon Linux, and it does the job.

What I ended up doing was:

screen
nvm install 0.8.11
Ctrl-A D

This started the compile inside a Screen session, and detached the Screen session.

Later, I came back and restored the Screen session with the following:

screen -ls
screen -r ID

I’ve heard tmux is much cooler than Screen (and Screen hasn’t been maintained for years, anyway), so I’ll give it a look at some point in future.

next post…

In my next post, I’ll cover setting up a MongoDB cluster. Exciting!

Ron - 20 Oct 2012