This is the second post of a series of articles about the stack powering the TextMe service.
Mid-2010 our user base was growing at a rapid pace, and we started to think about our options in terms of backend infrastructure. Back then, every service ran on the same server. Even though this server was relatively powerful, having everything on the same box was becoming an issue (aka Gigantic Point of Failure) in terms of scalability and flexibility.
We decided to migrate the whole infrastructure to Amazon Web Services. AWS is a famous and very versatile stack of cloud services providing most (all?) of the components needed for a modern web service (storage, VMs, queue service, auto scaling, relational and NoSQL databases …). Most of these services are accessible through REST APIs provided by Amazon and wrapped by a lot of third-party libraries (such as Boto for Python, for example).
We started to migrate some components over to AWS to give it a try and to learn the various APIs provided by Amazon.
The first obvious part of the system we decided to host on AWS was the storage of user profile pictures and message attachments. Initially, when a picture was uploaded via TextMe, it was stored directly on the server. The change was simple: we now forward the picture to AWS and store it in an S3 bucket. In the other direction, picture downloads are now served directly from S3.
Really simple and easy, thanks to the various libraries available to access the S3 API.
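The two paths described above could be sketched like this with Python and boto3 (a modern library that post-dates this migration; the bucket name, key scheme, and function names are all made up for illustration):

```python
import mimetypes

def avatar_key(user_id, filename):
    """Build the S3 object key for a user's picture.
    (Hypothetical naming scheme, for illustration only.)"""
    return "avatars/%s/%s" % (user_id, filename)

def upload_avatar(user_id, filename, data, bucket="textme-media"):
    """Forward an uploaded picture to S3 instead of local disk.
    Assumes AWS credentials are configured in the environment."""
    import boto3  # imported lazily so the key helper stays standalone
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket,
        Key=avatar_key(user_id, filename),
        Body=data,
        ContentType=mimetypes.guess_type(filename)[0] or "application/octet-stream",
    )

def avatar_download_url(user_id, filename, bucket="textme-media"):
    """Return a time-limited URL so the download is served directly by S3,
    never touching our own servers."""
    import boto3
    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": avatar_key(user_id, filename)},
        ExpiresIn=3600,  # link stays valid for one hour
    )
```

The presigned URL is what lets the app redirect downloads straight to S3 while keeping the bucket private.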
The second thing we moved over to AWS was our push notification queue. At that point our notification system was powered by a bunch of workers reading messages from MemcacheQ. AWS provides a service named Simple Queue Service (SQS) that could easily replace MemcacheQ. We just had to have our workers read messages from an SQS queue (and write them into SQS from our PHP backend) and we were good to go! Again, pretty simple stuff, like changing a couple of lines of code.
// Before: push the notification payload into MemcacheQ
$queue = new Memcache;
$queue->set($queue_name, json_encode($json_struct), $flag, $expiration);

// After: push the same JSON payload into an SQS queue
$sqs = new SQS($AWS_AccessKeyID, $AWS_SecretAccessKey);
$response = $sqs->sendMessage($sqs_uri, json_encode($json_struct));
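On the worker side, the read loop could be sketched in Python with boto3 (again a modern stand-in for the library we used at the time; the queue URL and the handler are placeholders):

```python
import json

def decode_notification(body):
    """Messages are written by the PHP backend as JSON strings."""
    return json.loads(body)

def poll_queue(queue_url, handle):
    """Long-poll an SQS queue and delete each message once handled.
    Assumes AWS credentials are configured in the environment."""
    import boto3  # lazy import: decode_notification stays standalone
    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,  # long polling avoids busy-waiting
        )
        for msg in resp.get("Messages", []):
            handle(decode_notification(msg["Body"]))
            # SQS only removes a message once the consumer confirms it
            sqs.delete_message(
                QueueUrl=queue_url,
                ReceiptHandle=msg["ReceiptHandle"],
            )
```

One behavioral difference from MemcacheQ worth noting: SQS messages reappear on the queue if a worker crashes before deleting them, which is exactly what you want for push notifications.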
AWS provides a lot of different services, and we weren’t ready to use all of them, but we were eager to try the NoSQL database service called SimpleDB to log additional data, mainly for reporting purposes (signup metadata, message delivery records, notification logs). Thanks to the various libraries available, it was really easy to add this to our system.
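A rough sketch of that kind of logging with the classic boto library (SimpleDB is a legacy service with no boto3 client; the domain name and record schema here are invented for illustration):

```python
import time

def signup_record(user_id, platform):
    """Flatten signup metadata into SimpleDB's string-only attributes.
    (Hypothetical schema, for illustration.)"""
    return {
        "user_id": str(user_id),
        "platform": platform,
        "signup_ts": str(int(time.time())),
    }

def log_signup(user_id, platform, domain_name="signup_logs"):
    """Write one signup record to a SimpleDB domain.
    Assumes AWS credentials are configured in the environment."""
    import boto  # classic boto (v2); lazy import keeps signup_record standalone
    sdb = boto.connect_sdb()
    domain = sdb.create_domain(domain_name)  # idempotent if it already exists
    domain.put_attributes("signup-%s" % user_id, signup_record(user_id, platform))
```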
Let’s get serious
These changes were just the tip of the iceberg: we still had to migrate our web and database servers.
The first part of the heavy lifting was adding scalability to the HTTP side.
On AWS, you can rely on the following components to achieve scalability (elasticity?):
- Elastic Load Balancer - ELB
- Auto-scaling rules
- EC2 virtual machines
Basically, the ELB distributes incoming traffic across a pool of EC2 virtual machines, while the auto-scaling rules grow or shrink that pool automatically (starting or shutting down VMs) according to various metrics such as latency or load.
E.g.: when the load on the platform reaches X, a new VM is started to keep the load under X, and when the load drops below Y a VM is stopped.
This mechanism makes it easier to scale any HTTP(S)-based platform, providing both elasticity and cost control. HTTPS sessions can even be terminated on the ELB to save CPU on the EC2 VMs.
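A rule like the one above is expressed as a scaling policy tied to a CloudWatch alarm. Here is a hedged boto3 sketch (a modern API that post-dates this setup; the group name, policy name, and thresholds are invented):

```python
def scale_out_alarm_name(group_name):
    """Derive the CloudWatch alarm name from the auto-scaling group name.
    (Hypothetical naming convention.)"""
    return "%s-high-cpu" % group_name

def create_scale_out_rule(group_name="textme-web", cpu_threshold=70.0):
    """Start one extra EC2 VM whenever average CPU stays above the
    threshold for two 5-minute periods. Assumes AWS credentials are
    configured in the environment."""
    import boto3  # lazy import keeps the naming helper standalone
    autoscaling = boto3.client("autoscaling")
    cloudwatch = boto3.client("cloudwatch")

    policy = autoscaling.put_scaling_policy(
        AutoScalingGroupName=group_name,
        PolicyName="scale-out-on-cpu",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,   # add one VM
        Cooldown=300,          # wait 5 minutes before scaling again
    )

    cloudwatch.put_metric_alarm(
        AlarmName=scale_out_alarm_name(group_name),
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "AutoScalingGroupName", "Value": group_name}],
        Statistic="Average",
        Period=300,
        EvaluationPeriods=2,   # two consecutive 5-minute periods
        Threshold=cpu_threshold,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[policy["PolicyARN"]],
    )
```

A symmetric scale-in rule (ScalingAdjustment=-1 on a low-CPU alarm) handles the "drops below Y" side.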
RAM is cheap
The last part of our infrastructure we needed to take care of was the MySQL database storing user accounts and messages. AWS provides a managed SQL server called RDS (Relational Database Service), powered by their custom flavor of MySQL. You can use the service as a regular MySQL server; RDS provides point-in-time restoration going back several days, and creating a read-only replica is just a click away. Sounds too good to be true? Well, so far, we have been pretty pleased with this service.
As always with cloud services based on virtual machines, you have to size your machine/server wisely, because I/O is really expensive in terms of performance, and you have to provision enough RAM to achieve the required level of performance. We ended up with an m1.xlarge instance with 15 GB of RAM.
During this migration, we had to transfer all the data from our (then) production database server to our brand new RDS instance.
We took the opportunity to optimize our data structures a little and to switch from the MyISAM to the InnoDB storage engine (mainly to avoid table-level locking).
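The engine switch itself boils down to one ALTER statement per table; a small hypothetical helper (the table names and DB-API connection are assumptions, not from our actual migration script):

```python
def innodb_conversion_sql(tables):
    """Generate the ALTER statements that switch tables to InnoDB.
    InnoDB locks rows rather than whole tables, which is the point
    of the migration described above."""
    return ["ALTER TABLE %s ENGINE=InnoDB;" % t for t in tables]

def convert_tables(connection, tables):
    """Run the conversion over any DB-API MySQL connection."""
    cursor = connection.cursor()
    for statement in innodb_conversion_sql(tables):
        # Each ALTER rebuilds the whole table, hence the downtime window
        cursor.execute(statement)
    connection.commit()
```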
After 2 hours of downtime, on March 9th, we were ready to flip the switch and let our users enjoy the brand new TextMe backend.
In our next post, we will cover some of the new things we are working on to improve the TextMe service.