Hiya.
I'd appreciate pointers to tutorials on "simple" sharding using MySQL or PostgreSQL.
What I mean by "simple" is:
- Tables are designed so each one has a field that can be used to identify the correct partition to use (in my case, user_id)
- Data/Queries are organized so that the vast majority of manipulation happens on a single partition
- Rebalancing shards is an offline procedure
I'd love to push the details of disks/servers/processes that handle the queries down into the DB layer and not worry about it at the app (in my case, Rails) layer: give me a single entry point that can shared across all of my app servers. We can play DNS games if need be to distribute that load.
Bonus points if a tutorial talks about how to set up replication in this context too.
It seems like many people would be interested in this.
Mysql-proxy (http://forge.mysql.com/wiki/MySQL_Proxy) looks promising, but it's still in an alpha state. You could try writing a proxy script in Lua -- there's already some master /slave stuff done in this area.
Mysql 5.1 supports partitioning, but not really in the way that you and I mean.
The problem is that sharding breaks some fundamental SQL axioms -- for example, doing SELECTs no longer returns all rows (depending on how the table is partitioned).
Almost everyone I know who does this rolls their own layer. As subwindow mentioned, if you're using Rails, you have to futz with establish_connection due to ActiveRecord's assumptions about one DB connection per class. It's possible -- we've done it -- but it requires lots of connections to the DB and it can be messy. If you're on Rails, it might be easier to scale vertically first before going horizontal.
Replication doesn't need to change in this context -- the Mysql documentation should be all you need to get started.