The first iteration of the ZipZap backend ran on a single LAMP (P for Python) server with a single instance of MySQL. This configuration worked fine for the relatively modest role that the database played in the system. However, with the upcoming emphasis on social media, the ZipZap backend had to be redesigned from the ground up as a database cluster. Specifically, we defined the following requirements:
1. Capacity should scale essentially linearly just by adding servers. This is the holy grail requirement.
2. Parallel writes should be possible to improve write throughput.
3. Read replicas should be available to improve read throughput.
4. The server for a given resource should be easily identifiable by the resource key.
Our general approach was as follows:
- Consider MySQL Cluster. The promise of auto-sharding is magical. And while Zillow seems to be quite successful with this approach, our experience on a different project was nearly disastrous. While things may have changed over the last year or so, we judged this option to be too immature for a full-blown social media system.
- Turn to the invaluable website highscalability.com. This is a tremendous resource where database architects from some of the most data-intensive companies (e.g. Tumblr) describe how they approached the problem of scalability.
Our solution defines an array of write nodes and a two-dimensional array of read nodes. When a resource is uploaded, it is randomly assigned to a write node. This balances writes over the available servers and improves write throughput, fulfilling requirement (2) above. Each write node is then replicated n times to an array of read nodes. Reads are randomly assigned to one of these read nodes, balancing incoming read requests and fulfilling requirement (3) above.
When a particular write node fills up, it is removed from the list of available write nodes, but it remains in the list of available read nodes. When space has to be added, new servers are added for the new write node and its requisite read replicas. Once these machines are added to the system configuration, they start receiving write and read requests as described above. Because there is no coordination overhead save for the configuration files and key storage, this fulfills our holy grail requirement (1).
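The routing scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production code: the class name `Cluster`, its method names, and the host names are all hypothetical, and the sketch assumes that the index of a resource's write node can be recovered from its key.

```python
import random


class Cluster:
    """Hypothetical routing table: an array of write nodes, each with a row of read replicas."""

    def __init__(self):
        self.write_nodes = []  # write_nodes[i] is the host of write node i
        self.read_nodes = []   # read_nodes[i] is the list of replica hosts for write node i
        self.open_nodes = []   # indices of write nodes that still accept new resources

    def add_node(self, write_host, replica_hosts):
        """Register a new write node and its read replicas; return its index."""
        self.write_nodes.append(write_host)
        self.read_nodes.append(list(replica_hosts))
        index = len(self.write_nodes) - 1
        self.open_nodes.append(index)
        return index

    def assign_write(self, rng=random):
        """Pick a random write node with free space for a newly uploaded resource."""
        if not self.open_nodes:
            raise RuntimeError("no write node has free capacity")
        index = rng.choice(self.open_nodes)
        return index, self.write_nodes[index]

    def pick_read(self, index, rng=random):
        """Pick a random read replica of the write node that holds the resource."""
        return rng.choice(self.read_nodes[index])

    def mark_full(self, index):
        """Stop routing new writes to a full node; its replicas keep serving reads."""
        if index in self.open_nodes:
            self.open_nodes.remove(index)


if __name__ == "__main__":
    cluster = Cluster()
    cluster.add_node("write-0", ["read-0a", "read-0b"])
    cluster.add_node("write-1", ["read-1a", "read-1b"])
    index, host = cluster.assign_write()
    print("write goes to", host, "and reads go to", cluster.pick_read(index))
```

Adding capacity in this sketch is a single `add_node` call, which mirrors the idea that new servers only need to appear in the configuration before they start taking traffic.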
Of course, all of these read and write nodes have to be synchronized through proper key design. That will be the topic of another post.