What can we learn from the Google File System?
I found this via my twitter herd and I've been thinking about it for days. Kirk McKusick interviews Sean Quinlan about the Google File System. Fascinating stuff.
We're very fortunate to have storage scalability challenges ourselves at 'Undisclosed' (formerly OthersOnline). We're amassing mountains of modest chunks of information via a set of many many hundreds of millions of keys. We've evolved thusly:- MySQL table with single key and many values, one row per value.
- MySQL key/value table with one value per key
- memcachedb - memached backed by Berkeley DB
- Nginx + Webdav system
- Our own Sam Tingleff's valkyrie - Consistent Hashing + Tokyo Tyrant
What is X for some random hardware in the cloud? How do we find it? What if X changes as the disk gets full? What kind of mapping function do we send the application key into to return a storage key to get the best amount of sequential disk access on write and best memory caching of chunks on read? What about fragmentation?It does seem as though the newish adage is very true.. web 2.0 is mostly about exposing old-school unix commands and system calls over HTTP. I keep thinking this must have been solved a dozen times before. cycbuf feature of usenet servers?


