Ask HN: "Simple" sharding tutorials using MySQL or PostgreSQL?
32 points by sanj on June 26, 2008 | hide | past | favorite | 20 comments
Hiya.

I'd appreciate pointers to tutorials on "simple" sharding using MySQL or PostgreSQL.

What I mean by "simple" is:

- Tables are designed so each one has a field that can be used to identify the correct partition to use (in my case, user_id)

- Data/Queries are organized so that the vast majority of manipulation happens on a single partition

- Rebalancing shards is an offline procedure

I'd love to push the details of the disks/servers/processes that handle the queries down into the DB layer and not worry about it at the app (in my case, Rails) layer: give me a single entry point that can be shared across all of my app servers. We can play DNS games if need be to distribute that load.

Bonus points if a tutorial talks about how to set up replication in this context too.

It seems like many people would be interested in this.



Most of the methods people use for sharding/partitioning end up in the application layer, because open-source DBs just aren't fancy enough yet to handle this.

Mysql-proxy (http://forge.mysql.com/wiki/MySQL_Proxy) looks promising, but it's still in an alpha state. You could try writing a proxy script in Lua -- there's already some master/slave stuff done in this area.

Mysql 5.1 supports partitioning, but not really in the way that you and I mean.

The problem is that sharding breaks some fundamental SQL axioms -- for example, doing SELECTs no longer returns all rows (depending on how the table is partitioned).

Almost everyone I know who does this rolls their own layer. As subwindow mentioned, if you're using Rails, you have to futz with establish_connection due to ActiveRecord's assumptions about one DB connection per class. It's possible -- we've done it -- but it requires lots of connections to the DB and it can be messy. If you're on Rails, it might be easier to scale vertically first before going horizontal.
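To make the establish_connection futzing concrete, here's a minimal sketch of the routing side. The shard specs and the modulo scheme are hypothetical, just to show the shape; in Rails each spec would be paired with an abstract ActiveRecord::Base subclass that calls establish_connection (shown only in comments, since that part needs a real database.yml).

```ruby
# Hypothetical shard specs -- in Rails these would be entries in
# config/database.yml rather than inline hashes.
SHARDS = [
  { host: "db1.example.com", database: "app_shard_1" },
  { host: "db2.example.com", database: "app_shard_2" },
]

# Route a user_id to its shard's connection spec (simple modulo here;
# a lookup table works too, as discussed elsewhere in this thread).
def shard_for(user_id)
  SHARDS[user_id % SHARDS.size]
end

# In ActiveRecord you'd then pick a class rather than a spec, e.g.:
#
#   class Shard1Base < ActiveRecord::Base
#     self.abstract_class = true
#     establish_connection :shard_1   # named entry in database.yml
#   end
#
# with one abstract base (and connection pool) per shard, and your
# models subclassing the base that shard_for selects.
```

The messiness the parent describes comes from ActiveRecord assuming one connection per class, so every shard needs its own class hierarchy.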

Replication doesn't need to change in this context -- the Mysql documentation should be all you need to get started.


When I've planned out sharded infrastructures, the database usually wasn't that big of a concern. The web framework or system architecture is usually the pain point. And with Rails, you have to abuse establish_connection if you're going to have each web head read from multiple shards.

The easiest route I've gone when setting up a sharded infrastructure is to use subdomains and a 1-1 Web:DB setup. Have each subdomain go (either thru a reverse proxy or hardware load balancer) to a different (sharded) webhead. Each web head talks to two databases- the common database and its sharded DB. With this you'll probably want a "common" web head to handle home page traffic and authentication (after they are authenticated they'd get dished off to their shard).
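A tiny sketch of the post-login handoff in that setup (the subdomain map and helper are hypothetical, just to illustrate the idea of dishing authenticated users off to their shard's web head):

```ruby
# Hypothetical mapping from a user's group to its shard subdomain.
# The load balancer / reverse proxy maps each subdomain to a
# dedicated web head, which talks to its own sharded DB.
SHARD_SUBDOMAIN = { "acme" => "s1", "globex" => "s2" }

# After the common web head authenticates a user, redirect them here.
def post_login_url(group, path = "/")
  "https://#{SHARD_SUBDOMAIN.fetch(group)}.example.com#{path}"
end
```

The common web head only needs the mapping table, not any shard data.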

Salesforce.com was my inspiration for this method, and it is probably reasonably common. It probably also has a name, but I do not know what it is.


I've seen folks approach things this way -- Dr. Nic's done some of it in an exploratory manner: http://tinyurl.com/36twmo

But I don't think it is the best approach. I'd much rather push ALL of the magic down into the database layer and not have the app worry about it at all. I want ONE layer of magic and I'd prefer it to be in the DB, where it seems it "should" be.


I'd rather have the layer of magic be in the part of the code I understand best so I can fix it when things go wrong.


I'm pretty confident I can learn what I need about DB configs, so I'm more concerned about it being in the "right" place from a complexity and efficiency standpoint.


Why doesn't anyone ever talk about distributed caches? Is it really necessary to hit the database for all data? Perhaps I'm confused...


I disagree with you on many points there. I'm not sure what you work on, but the bottleneck in my experience is almost always the database.

Doing a 1:1 web-to-DB setup is not a good idea, because more than likely you'll end up underutilizing the web servers as they outpace the DB (unless your DB has one three-column table in it, of course). Even if it's the other way around, you're still handcuffing whichever server is faster/under less load. Every time, say, a DB server is overloaded, instead of just adding another DB server, you have to add TWO servers.

Also, this ends up being sort of sloppy because the subdomain for your website changes per user once they've logged in. What happens when one user pastes a link to something he saw for another user? You have to figure out how to handle that.

To the OP: I have recently been looking around for similar things and, as expected, this is a very specific problem that hasn't been tackled (or certainly tackled well) by that many websites. In any case, there is no cookbook solution for it. It is probably a good thing, since your needs in this case will likely be very specific. You are most likely going to have to go it alone.

I have yet to implement this in practice, but my approach would be to have a single DB master that just holds the shard-allocation table, followed by the additional masters. Make sure to cache the shit out of that table, ideally in your app server's local cache, so you don't have to do a lookup on every call (though I don't know if RoR has something equivalent to PHP's APC). Figure out what you're going to shard by (user id, etc.). Then write some sort of algorithm that distributes new content among the existing servers. This will depend on how much content there is per user: if you are sharding something that is database-intense and each user has many of them, tying users to a single shard could lead to that shard getting screwed by a few super-active users, so you may want to shard at a more fine-grained level than that. At first, you will probably just assign new shards based on how many shards are on the servers you currently have (as you add an empty server, all new shards go to it), but over time I would recommend implementing something that distributes based on actual DB server load, or some other statistic that better reflects how much work each DB server is doing.
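The lookup-table approach described above can be sketched like this. The class and the fewest-users heuristic are hypothetical; a production version would use real load statistics and would cache shard_for lookups hard, as noted.

```ruby
# Hypothetical sketch: a central "directory" (the shard-allocation
# table on the single master) maps each user_id to a shard, and new
# users go to whichever shard currently holds the fewest users.
class ShardDirectory
  def initialize(shard_names)
    @assignments = {}                               # user_id => shard
    @counts = shard_names.map { |s| [s, 0] }.to_h   # shard => user count
  end

  # Assign a new user to the emptiest shard. (Stand-in for a smarter
  # algorithm that weighs actual DB load rather than raw counts.)
  def assign(user_id)
    shard = @counts.min_by { |_, count| count }.first
    @assignments[user_id] = shard
    @counts[shard] += 1
    shard
  end

  # Look up an existing user's shard -- the call you'd cache hard,
  # since it would otherwise hit the central master on every request.
  def shard_for(user_id)
    @assignments.fetch(user_id)
  end
end
```

Adding an empty server is then just adding a name with a zero count, and all new assignments flow to it until it catches up.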

As your DB grows (in terms of adding tables), I'd make sure to keep a script around that makes moving a shard easy. Basically something where you can put up a "Your account is under maintenance" page for one user and then just run a script that updates the central allocation database and moves all the data associated with that shard accordingly.

I think this goes without saying, but replicate every master both for reads and for failures. You also want memcached in there somewhere, I would assume, but that's another story.

As I said, I have yet to actually start implementing this, but this has several advantages. Database servers are easy to add without manipulating your app (especially if your sharding algorithm is good), you can shard according to load, you aren't handcuffing your app servers to a single database server, etc.

Hope this helps. Cheers.

Edit: you might find some things that will help here: http://en.oreilly.com/mysql2008/public/schedule/proceedings I can't remember which talks they were, but I saw quite a few people show layouts of approximately how their architecture worked. Not all of them were great, but it might give you some ideas. Look through the memcached slides for sure.


Who says the sharded webheads and databases have to be on different boxes?

As for the linking problem: I had assumed (wrongly, perhaps) that the application in question is one that is friendly to sharding- an application where little or nothing is shared amongst discrete groups of users (companies, teams, etc) and things are sharded at the group level and not the user level. If the application can't handle that, then sharding is probably not the best database scaling option.


The 2nd edition of High Performance MySQL might interest you: http://www.amazon.com/dp/0596101716

It covers, among tons of other stuff, some new features in MySQL 5.1 that make partitioning easier.


Sharding is icky and hard because it strokes relational databases against the grain. It also, incidentally, goes against the conventions of most web frameworks, including Rails. Since there's no database that does it for you, you really have to design for it from the beginning. Clay Shirky's scalability book, with anecdotes from Flickr's early development, is a must-read.

The Skype people do transparent sharding with PostgreSQL (the highscalability.com guy blogged about it here: http://highscalability.com/skype-plans-postgresql-scale-1-bi...). They accomplish this using their own PL/Proxy plugin (http://developer.skype.com/SkypeGarage/DbProjects/PlProxy). The only way they can do this is by _never_ doing SQL in the client. Instead, all SQL is wrapped in server-side functions, aka stored procedures (written in PL/Python). The magic in PL/Proxy is to enable the execution of these functions based on a hash. So for example, a function get_user_email(username) is implemented as:

  create function get_user_email(username text) returns text as $$
      cluster 'userdb';
      run on hashtext(username);
  $$ language plproxy;

...which results in the real function being executed on some other server.

There are pros and cons to this approach. Could they not have implemented this in the application? Yeah. The additional latency of doing one extra remote call (even if it can be pipelined) can't be good.

Another way of automating sharding is to use a middle layer like PgPool, GridSQL or Continuent uni/cluster. The first two are open-source projects, the latter is commercial; PgPool is written in C, the last two are Java (although GridSQL has C bindings).

PgPool is a proxy that can also do replication, partitioning, load-balancing and parallelization -- and you can pretty much pick which ones you want. PgPool intercepts all SQL statements, inspects them, possibly rewrites them, and sends them to one of several backend PostgreSQL servers. For example, it can route all updates to multiple PostgreSQL servers, ensuring that they're identical (until one of them goes offline, at which point you have to pull it manually back into the pool). It can also partition data, by checking inserts and routing them to the right box, and you can write the partitioning functions in PL/pgSQL. And it can also route queries semi-intelligently, so that a query that is known to only touch a single partition will only go to that partition. And it can parallelize them, so that when you do "select * from foo", it'll run the query on all servers and then combine the results.

GridSQL covers the same ground as PgPool, but seems a little more advanced. It has a complete SQL parser that is supposed to be able to do query routing and rewriting more efficiently. From what I can see, they are emphasizing performance and parallel queries above anything else.

I haven't tried uni/cluster, but it's similar in scope and features.

The main problem with these products is that they themselves become a bottleneck. I don't think you can scale them horizontally by just piling on more proxies -- the proxies themselves are gatekeepers. So you're just trading one bottleneck for another.

After implementing a couple of large, popular, bottom-heavy, hard-to-scale Rails apps, I am now of the opinion that well-designed apps should never talk directly to a database; doing so makes the database layer a bottleneck, which is particularly bad with relational databases, since they cannot scale very far horizontally.

Instead, you should have the application serve an internal API that can be broken up and multiplied and moved. For example, consider the get_user_email function from the example above. Consider the Rails way:

User.find(params[:id]).email

You are already tied to a very specific code path -- find the object, read the email. To shard this thing, you have to override the find method, as well as any update methods, and you will end up with a graceless patch on top of ActiveRecord. What if we made the use case -- getting the email address of a user -- explicit, as a real API?

User.get_email(params[:id])

Similarly, you would "sculpt" dedicated methods to handle more complicated queries. Consider:

User.get_latest_comments(params[:id], :in_forum => params[:forum_id])

Now we have isolated ourselves from the database, and we have an API that can actually be sharded. In fact, what we have is an abstraction. Abstractions are useful. Sure, the implementation can use ActiveRecord or whatever, but it's no longer dependent on anything except the input. It's not dependent on database model details like which table holds the email address -- the abstraction separates the concern of the application from the data API.

If we have a single app database and it doesn't scale, we can move all the users over to a different server with its own database and make the API remote, based on a REST or dRb interface. If this user server becomes too slow, we can just add another user server -- "just add another box" is the core tenet of any scale-out strategy.
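A minimal sketch of this explicit-API shape. The module, the LocalStore stand-in, and the swappable backend are all hypothetical; the point is that callers only ever see User-style API methods, while the implementation behind them can go from local ActiveRecord to a remote REST/dRb client without callers changing.

```ruby
module UserAPI
  # Stand-in local backend: a real one would wrap ActiveRecord, and a
  # hypothetical RemoteStore could wrap a REST or dRb client exposing
  # the same get_email interface.
  class LocalStore
    def initialize(rows)
      @rows = rows    # user_id => email; stand-in for a real table
    end

    def get_email(user_id)
      @rows.fetch(user_id)
    end
  end

  class << self
    attr_accessor :backend   # swap the implementation at deploy time

    # The one entry point callers use; where the data actually lives
    # (which table, which shard, which box) is hidden behind it.
    def get_email(user_id)
      backend.get_email(user_id)
    end
  end
end
```

Moving users to their own server then means swapping the backend object, not hunting down every `User.find(...).email` call site.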

So if I was building a new app today, that's what I would do.


One minor correction to your very helpful post: the scalability book you are referring to is "Building Scalable Web Sites" by Cal Henderson (not Clay Shirky).

http://oreilly.com/catalog/9780596102357/


Of course. Thanks.


I don't know about Rails (its DB library is ActiveRecord, if I recall names correctly), but SQLAlchemy, which is Python's big ORM, has support in it for sharding and similar partitioning, etc.

When we built sketchcast.com sharding was on the design list and a concern - there were some fantastic articles at http://highscalability.com/ that we used as a basis for some of our design.


Not exactly sure if this can help you but I guess you could implement the logic using dbslayer.

http://github.com/harrisj/activerecord-dbslayer-adapter/tree...


It seems like I'm not the only one searching.

Here's something from my own archives:

http://www.postgresql.org/docs/8.1/interactive/ddl-partition...


PostgreSQL table partitioning has nothing to do with shards. They don't let you split a database across multiple machines. You can use them to split a single table across multiple disks (using tablespaces), but that's not the same thing.


Google search: sharding site:highscalability.com

http://tinyurl.com/5znhvu


highscalability.com is a great site that provides conceptual level information about sharding.

But it doesn't provide anything approaching a cookbook or tutorial.


You probably want this 1986 paper: http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf


No, I understand how/why I want to do sharding and its implications. I'm looking for something much more tactical and practical.



