
Sunday, June 21, 2009

Internet scale databases

After going to the Amazon Web Services event at Mariner Stadium, I've been doing quite a bit of ruminating on building highly scalable web sites.

I suppose that part of this comes from my roots in webmastering (back when that was a singular title!).

So, I've decided to do some analysis of various technologies that contribute to building highly scalable web sites. Today's analysis is on databases (and whether you should even use one!).

The default: use a relational database.

There is a whole raft of issues with this, not the least of which is write performance. Eventually, a single database cannot be scaled (no matter how much money you throw at it) to support the write volumes required by high-volume web applications. There are a few solutions:

Database sharding:

This can take many forms, but there are several basic versions:

Table partitioning: moving tables to separate databases
Range partitioning: e.g. putting users 1-100,000 on server A, users 100,001-200,000 on server B, etc.
Directory partitioning: put a directory service in front of the databases, so retrieving data takes two steps: first ask where this user's info is stored, then ask that server for the data.

More in-depth analysis here.

Of these, I think the last one is the most interesting, especially if it is architected in from the beginning. It would be hard to retrofit, but its biggest advantage is that repartitioning remains quite feasible.
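To make the directory approach concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in: sqlite3 plays the part of the real databases, and the table and helper names are invented for illustration.

import sqlite3  # stand-in for whatever relational databases you actually run

# The directory itself is tiny: one row per user mapping an email to a shard name.
directory = sqlite3.connect(":memory:")
directory.execute("CREATE TABLE user_shard (email TEXT PRIMARY KEY, shard TEXT)")

# A pretend pool of shard connections, keyed by shard name.
shards = {
    "shard_a": sqlite3.connect(":memory:"),
    "shard_b": sqlite3.connect(":memory:"),
}
for db in shards.values():
    db.execute("CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT)")

def create_user(email, name, shard_name):
    # Record where the user lives, then write the row to that shard.
    directory.execute("INSERT INTO user_shard VALUES (?, ?)", (email, shard_name))
    shards[shard_name].execute("INSERT INTO users VALUES (?, ?)", (email, name))

def get_user(email):
    # Question 1: which server holds this user's data?
    row = directory.execute(
        "SELECT shard FROM user_shard WHERE email = ?", (email,)).fetchone()
    if row is None:
        return None
    # Question 2: ask that server for the data.
    return shards[row[0]].execute(
        "SELECT email, name FROM users WHERE email = ?", (email,)).fetchone()

create_user("alice@example.com", "Alice", "shard_a")
print(get_user("alice@example.com"))  # ('alice@example.com', 'Alice')

Repartitioning then amounts to copying a user's rows to a new shard and updating that user's single directory entry, which is why this scheme keeps the door open.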

Caching:

Memcached has become de rigueur in building high-scalability applications like web sites. Its ability to speed up websites has been well documented, and I think it is a fantastic solution not only for improving read performance on databases, but also for caching the results of expensive operations.
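As a concrete illustration, here is a minimal cache-aside read path in Python, assuming the python-memcached client; load_user_from_db is a hypothetical stand-in for the real (expensive) database query.

import json

import memcache  # the python-memcached client

mc = memcache.Client(["127.0.0.1:11211"])

def load_user_from_db(email):
    # Hypothetical stand-in for the expensive database query (or any
    # expensive operation whose result is worth caching).
    return {"email": email, "name": "Alice"}

def get_user(email):
    key = "user:" + email
    cached = mc.get(key)
    if cached is not None:
        return json.loads(cached)            # cache hit: no database round trip
    user = load_user_from_db(email)          # cache miss: do the expensive work
    mc.set(key, json.dumps(user), time=300)  # and remember it for five minutes
    return user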

I do have one nit, however, and that is what happens when a memcached server fails: there is no redundancy (which is acceptable, and can be planned for), and the decision about where a given key lives depends on the number of server instances that are running, so adding or removing a server reshuffles where most keys map. It would be better if the pool could scale up and down gracefully.
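Consistent hashing is the usual answer to that nit. The toy ring below (plain Python, not any particular client library) shows the idea: each server owns many points on a hash ring, a key goes to the next point around the ring, and adding or removing a server only moves the keys in its neighborhood.

import bisect
import hashlib

class HashRing(object):
    # Toy consistent-hash ring, for illustration only.
    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self.ring = {}           # point on the ring -> server
        self.sorted_points = []  # the ring's points, kept sorted
        for server in servers:
            self.add(server)

    def _hash(self, value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        # Give each server several points so keys spread evenly.
        for i in range(self.replicas):
            point = self._hash("%s-%d" % (server, i))
            self.ring[point] = server
            bisect.insort(self.sorted_points, point)

    def remove(self, server):
        # Only the keys that mapped to this server's points move elsewhere.
        for i in range(self.replicas):
            point = self._hash("%s-%d" % (server, i))
            del self.ring[point]
            self.sorted_points.remove(point)

    def get_server(self, key):
        # A key belongs to the first server point at or after its hash,
        # wrapping around the ring if necessary.
        point = self._hash(key)
        idx = bisect.bisect(self.sorted_points, point) % len(self.sorted_points)
        return self.ring[self.sorted_points[idx]]

ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
print(ring.get_server("user:alice@example.com"))

Ketama-style memcached clients implement essentially this scheme, which is what makes growing and shrinking the pool tolerable.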

Google's Bigtable, Hypertable and Amazon SimpleDB:

The only one of these that is available as open source appears to be Hypertable. The idea here is that you forgo the whole relational model for a simple data model (think giant spreadsheet) with much more flexibility in what you can stuff into a column; a single column can store multiple attributes, for example. Underneath this simple exterior is a highly distributed datastore, designed to run across many machines and handle huge throughput.
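To show what that "giant spreadsheet" looks like, here is an illustrative sketch of the data model in plain Python. It mimics the shape of the Bigtable/Hypertable model (sparse rows, column families and qualifiers, timestamped versions) rather than any real client API.

import time

table = {}  # row key -> {(family, qualifier): [(timestamp, value), ...]}

def put(row_key, family, qualifier, value):
    cell = table.setdefault(row_key, {}).setdefault((family, qualifier), [])
    cell.insert(0, (time.time(), value))  # newest version first

def get(row_key, family, qualifier):
    versions = table.get(row_key, {}).get((family, qualifier), [])
    return versions[0][1] if versions else None

# A row can have as many qualifiers under a family as it likes, and different
# rows can have completely different columns; that is where the flexibility
# in what you can stuff into a column comes from.
put("com.example/user/alice", "contact", "email", "alice@example.com")
put("com.example/user/alice", "orders", "2009-06-21", "order-1234")
print(get("com.example/user/alice", "contact", "email"))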

And now for the crazy idea:

If, for example, you only needed to store customer records, and each of those records was identified by a natural key (an email address), then you might not need a database at all (or at least not for this purpose).

What if you combined Google's Protocol Buffers with something like Danga Interactive's MogileFS? Every object is stored as a protocol buffer for fast serialization/deserialization, and MogileFS takes care of storing the serialized file, handling replication, redundancy, and so on. This could also work on Amazon's Simple Storage Service (S3).
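A rough sketch of what this could look like, in Python. The Customer message, the generated customer_pb2 module, and the store_blob/fetch_blob helpers are all hypothetical; the helpers stand in for a MogileFS client or S3 PUT/GET and would need to be replaced with the real thing.

# Assumes a hypothetical customer.proto, compiled with protoc into customer_pb2:
#
#   message Customer {
#     required string email = 1;
#     optional string name  = 2;
#   }

import customer_pb2

_blobs = {}  # in-memory stand-in for MogileFS or S3

def store_blob(key, data):
    _blobs[key] = data

def fetch_blob(key):
    return _blobs.get(key)

def save_customer(email, name):
    # Serialize the record as a protocol buffer and store it under its natural key.
    customer = customer_pb2.Customer()
    customer.email = email
    customer.name = name
    store_blob("customer/" + email, customer.SerializeToString())

def load_customer(email):
    data = fetch_blob("customer/" + email)
    if data is None:
        return None
    customer = customer_pb2.Customer()
    customer.ParseFromString(data)
    return customer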

I realize that this does not handle things like iteration, but what if that isn't in the use case? Alternatively, what if the architecture is something like Memcached -> Protobuf + MogileFS -> Sharded DB? You could have the best of all worlds: fast access through Memcached; on a miss, retrieve the record from MogileFS (and push it back into Memcached); and if you need iteration or an alternate lookup, you still have the relational DB in a scalable, sharded form.
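Sketched out, that tiered read path could look like the following. The three backends here are trivial in-memory stand-ins so the flow is self-contained; in practice you would swap in real memcached, MogileFS/S3, and sharded-database clients.

cache = {}       # stands in for memcached
blob_store = {}  # stands in for MogileFS or S3
sharded_db = {"alice@example.com": {"email": "alice@example.com", "name": "Alice"}}

def get_customer(email):
    # 1. Fastest path: the cache.
    record = cache.get(email)
    if record is not None:
        return record

    # 2. Next: the replicated blob store.
    record = blob_store.get(email)

    # 3. Last resort (and where iteration and alternate lookups still live):
    #    the sharded relational database.
    if record is None:
        record = sharded_db.get(email)

    # Repopulate the cache on the way out so the next read is fast.
    if record is not None:
        cache[email] = record
    return record

print(get_customer("alice@example.com"))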