Thrudb on EC2: A step-by-step guide
Thrudb, an open source document storage suite, provides a compelling system for building cheap, scalable document storage. It lends itself nicely to running on Amazon EC2, backed by Amazon S3, with indexing provided by Lucene.
Four services make up the Thrudb suite: Thrudoc for document storage, Thrucene for indexing, Thruqueue for persistent message queuing, and Throxy for load balancing. Best of all, these services are exposed via Thrift, an open source cross-language communication library, which means Thrudb can be used natively from C++, Java, Python, PHP, and Ruby.
Probably the biggest hurdle to using Thrudb is getting all of the dependencies in order and building it. To help with that, the folks at AideRSS have published Amazon EC2 AMIs, built on CentOS, with Thrudb already installed and ready to use. If you’re looking to get going quickly, those AMIs are a great place to start.
But the AideRSS folks don’t explain how they built the AMIs, so while the images are fantastic for getting started, using them won’t bring you much closer to understanding what it actually takes to get Thrudb up and running.
Below are step-by-step instructions for going from zero to a running Thrudb for less than a dollar.