Kalexo - how project teams connect

How we use Amazon Web Services
Written by Hannes Marais   
Thursday, 10 September 2009 18:28

Kalexo uses Amazon Web Services (AWS) extensively to run our service. If you don’t know, AWS is a cloud infrastructure provided by Amazon (yes, the online bookstore) that essentially provides compute power and storage in a pay-as-you-go fashion. AWS is really interesting for startups like us who do not want to be bothered with managing their own data center, at least in the early stages while the company is growing. In fact, AWS is a great enabler for many software startups because of the really low cost to get started. It used to be that some of your venture funding was gobbled up building your own data center, but nowadays your “funding” comes from Amazon and the really amazing infrastructure service they provide.

What is also interesting is that AWS keeps getting better. You need a lot of tools to run a data center, and there are all these infrastructure pieces that everyone needs. If it is something a lot of people need, you can be pretty sure Amazon either provides it already or will in the future. Sometimes we have wondered whether Kalexo should build a new piece of infrastructure now or just wait until Amazon provides the service for us.

Kalexo Teamwork users tend to eat up a lot of online storage. Not only do we back up all the files on Amazon S3, we also store all the millions of image tiles from the documents you zoom and fly through on S3. To save on bandwidth costs, we have caching machines in front of S3 that act as reverse proxies and provide an extra level of security. If there is one thing I could wish for, it is that Amazon would offer flat pricing for bandwidth like co-location services do.
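
To make the caching idea concrete, here is a minimal Python sketch of a read-through cache with least-recently-used eviction sitting in front of a slow, billed store. It is an illustration only, not the actual Kalexo proxy: the `TileCache` class, the `fake_origin` function, and the tile key format are all hypothetical, and a real deployment would more likely use an off-the-shelf reverse proxy than hand-written code.

```python
from collections import OrderedDict


class TileCache:
    """Read-through LRU cache; hypothetical stand-in for a caching proxy."""

    def __init__(self, fetch_from_origin, capacity=1000):
        self._fetch = fetch_from_origin   # called only on a cache miss
        self._capacity = capacity         # maximum number of cached tiles
        self._tiles = OrderedDict()       # key -> tile bytes, oldest first
        self.origin_requests = 0          # how often we paid for origin traffic

    def get(self, key):
        if key in self._tiles:
            self._tiles.move_to_end(key)  # mark as most recently used
            return self._tiles[key]
        self.origin_requests += 1
        tile = self._fetch(key)
        self._tiles[key] = tile
        if len(self._tiles) > self._capacity:
            self._tiles.popitem(last=False)  # evict the least recently used tile
        return tile


def fake_origin(key):
    # Stand-in for a GET against the storage service; returns dummy bytes.
    return ("tile:" + key).encode()


cache = TileCache(fake_origin, capacity=2)
for key in ["doc1/0/0", "doc1/0/1", "doc1/0/0", "doc1/0/0"]:
    cache.get(key)
print(cache.origin_requests)  # 2 origin fetches for 4 requests
```

Every request answered from the cache is one that never reaches the origin, which is where the bandwidth saving comes from. Clients also only ever talk to the cache, never to the store directly.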

We also use Amazon EC2 instances for compute power. Each Kalexo Teamwork user has at least one persistent network connection to a server which acts as a virtual message router. The routers continually pass messages between workspace members and synchronize project data in real time. We also use EC2 instances to process all the documents that people add to their workspaces. A master hands out documents to a bank of slave machines, each of which analyzes the document. One step is to convert each document into a set of image tiles, which are streamed to users as they fly through their “document landscape”. The number of slaves depends on the instantaneous document load: we use Amazon CloudWatch to scale the number of instances up or down to adjust to the load. Amazon Simple Queue Service (SQS) is used to queue processing jobs to the master. We did initially use SQS as a communication system between masters and slaves, but eventually created our own connector, which gave us more control, especially by providing real-time updates of job progress from the slaves.
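
As a rough illustration of load-based scaling, the Python sketch below turns a backlog of pending documents into a target number of worker instances. The actual scaling rule is not described here, so everything in it is an assumption: the `desired_workers` function, the figure of five documents per worker, and the minimum and maximum instance counts are hypothetical values chosen for the example.

```python
import math


def desired_workers(pending_docs, docs_per_worker=5, min_workers=1, max_workers=20):
    """Return how many worker instances to run for the current backlog.

    All thresholds are hypothetical; a real rule would be tuned to measured load.
    """
    if pending_docs < 0:
        raise ValueError("pending_docs must be non-negative")
    needed = math.ceil(pending_docs / docs_per_worker)
    # Clamp so we never scale to zero and never exceed the budgeted maximum.
    return max(min_workers, min(max_workers, needed))


for backlog in (0, 12, 500):
    print(backlog, desired_workers(backlog))
# prints:
# 0 1
# 12 3
# 500 20
```

The clamp matters in both directions: the floor keeps one worker warm for the next upload, and the ceiling keeps a sudden burst of documents from running up the bill.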

We also have a lot of metadata about accounts, users, and projects that needs to be highly available. We use Amazon SimpleDB to store all this information. We also use SimpleDB to store event logs so we can get an idea of what is happening on our network in real time. SimpleDB is great for this type of thing as long as you don’t expect a quick response, since the service is not as fast as a local database. We use memcached to cache a lot of the information centrally. Another wish would be for AWS to provide a memcached service.
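
The pattern of putting a fast cache in front of a slower database is often called cache-aside, and a small Python sketch may help show how it works. This is a simulation under stated assumptions, not our production code: plain dictionaries stand in for both the database and memcached, and the `CacheAside` class, the 60-second time-to-live, and the `user:42` key are hypothetical.

```python
import time


class CacheAside:
    """Cache-aside lookups with a time-to-live; dicts stand in for real services."""

    def __init__(self, slow_store, ttl_seconds=60.0, clock=time.monotonic):
        self._store = slow_store      # stand-in for the slow, durable database
        self._ttl = ttl_seconds
        self._clock = clock           # injectable so expiry can be tested
        self._cache = {}              # key -> (expires_at, value); stand-in for memcached
        self.store_reads = 0          # counts trips to the slow store

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]           # fresh cache hit, no database round trip
        self.store_reads += 1
        value = self._store[key]      # slow path: read from the database
        self._cache[key] = (self._clock() + self._ttl, value)
        return value

    def put(self, key, value):
        self._store[key] = value      # write to the database first
        self._cache.pop(key, None)    # then invalidate the cached copy


accounts = {"user:42": {"plan": "team"}}
lookup = CacheAside(accounts, ttl_seconds=60.0)
lookup.get("user:42")
lookup.get("user:42")
print(lookup.store_reads)  # 1: the second read came from the cache
lookup.put("user:42", {"plan": "pro"})
print(lookup.get("user:42"))  # {'plan': 'pro'}
```

Invalidating on write rather than updating the cached value keeps the database as the single source of truth, at the cost of one extra slow read after each change.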

We use Amazon Elastic Block Store (EBS) to store a few MySQL databases. These databases hold transaction logs for workspace recovery and billing system information. Finally, we use Amazon CloudFront to serve documentation and demonstration videos.

So in Kalexo’s case, the question is: which AWS services don’t we use?

We do not use Amazon Flexible Payments Service (FPS), having opted instead for another payment processor. We also don’t use Amazon Elastic MapReduce, but it could be something we eventually adopt for processing documents.

So essentially Kalexo uses nearly all the infrastructure services that Amazon provides. Our monthly bills are typically quite low; the biggest expense is having many EC2 instances running all the time. AWS offers reduced pricing for these via something they call reserved instances, which can cut your bills by 50% in some cases.
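
To see why reserved instances pay off for machines that run all the time, here is a small Python sketch of the arithmetic. The prices in it are hypothetical and chosen only to make the numbers easy to follow; they are not actual AWS rates, and `yearly_cost` and `break_even_hours` are names made up for this example. The model assumes a one-time upfront fee in exchange for a lower hourly rate.

```python
HOURS_PER_YEAR = 24 * 365


def yearly_cost(hourly_rate, upfront_fee=0.0, hours=HOURS_PER_YEAR):
    """Total cost of one instance that runs for `hours` during a year."""
    return upfront_fee + hourly_rate * hours


def break_even_hours(on_demand_rate, reserved_rate, upfront_fee):
    """Hours of use after which the reserved instance becomes cheaper."""
    return upfront_fee / (on_demand_rate - reserved_rate)


# Hypothetical prices (not real AWS rates), in dollars.
on_demand = yearly_cost(hourly_rate=0.10)
reserved = yearly_cost(hourly_rate=0.03, upfront_fee=300.0)
savings = 1 - reserved / on_demand

print(f"on-demand: ${on_demand:.2f} per year")
print(f"reserved:  ${reserved:.2f} per year")
print(f"savings:   {savings:.0%}")
print(f"break-even after about {break_even_hours(0.10, 0.03, 300.0):.0f} hours")
```

With these made-up numbers the reservation only wins if the instance runs for roughly half the year or more, which is why it suits always-on machines and not the workers that come and go with the load.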

Our biggest overall infrastructure expense is the machines we lease from ServerBeach for web site service, caching servers, and so on. We are big fans of this company too, and we have some very high-powered servers running in several data centers in the US.

I hope you enjoyed this little tour of how we use Amazon Web Services. If you have questions about our infrastructure, please contact me at hannes AT kalexo.com for details.

 