
I worked on a SaaS project whose customers are companies in an old-fashioned industry. A common requirement was that their data shouldn't be stored in the same database tables as any other customer's. They were fine with it being co-located on the same cloud servers, using the same credentials... as long as their data wasn't in the same tables. So I used Postgres templates to make a database per customer.
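(For illustration, a minimal sketch of the template-database approach; the template and customer names are made up:)

```sql
-- One-time setup: a "template" database holding the empty schema.
-- Run your schema migrations against this database only.
CREATE DATABASE tenant_template;

-- Per-customer provisioning: clone the template.
-- Postgres copies the template's schema and contents into the new database.
CREATE DATABASE customer_acme TEMPLATE tenant_template;
```

One caveat: the clone only works while nobody else is connected to the template database, so provisioning has to be serialized around migrations.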

Didn't make much sense to me either but whatever pays the bills I guess.



There are good reasons for that. At an old company of mine, over the couple of years I was around, there were multiple (~5) instances of clients asking for custom reports or data dumps and being sent data from other clients (as well as their own), because a tenant id was missed in some joins, or something similar. Yes, sloppy practices all around allowed that to happen. That said, it would have been much less likely to happen if data segregation were enforced at the database level (using either RLS, or a separate database/schema/tables per client).
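(A sketch of the RLS option mentioned above; the table, column, and setting names are hypothetical:)

```sql
-- Every tenant-scoped table carries a tenant_id column.
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

-- Only rows matching the current tenant are visible, whether or not
-- a given query remembers to filter on tenant_id.
CREATE POLICY tenant_isolation ON invoices
    USING (tenant_id = current_setting('app.tenant_id')::int);
```

The application sets `SET app.tenant_id = '42'` at the start of each session, and a join that forgets the tenant id then returns nothing foreign instead of leaking it. (Note that RLS is not applied to superusers or the table owner unless you also use FORCE ROW LEVEL SECURITY.)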


It's really hard to join customer A's addresses table with customer B's invoices when they are different databases.

There's no reason to join across customers in this situation and it saves you from making that type of mistake.

Colocating this on the same database and using the same users/credentials just makes it easier operationally. It's about isolating the data.


Just don't fall into the trap of creating a database schema per customer. Having different databases is better (as TXID is not shared)


Can you elucidate or provide some reference about this? I’m not familiar with any problems in this space (it’s not clear to me what problems a shared TXID could cause), and schemas seemed neat.


That all depends on how many customers you expect to have in a single database, and how many tables each one needs. EDIT: also, transaction volume.

Ok, so let's imagine your application requires 100 database tables.

If you have 1 customer, you have 100 tables. Every transaction that consumes an XID in this database (including updates) will increase age(datfrozenxid) by one. You now have 100 tables you need to vacuum.

If you only had one schema, that's no problem at all. Even with the defaults and a high number of writes, the default 3 autovacuum workers will quickly deal with it, even though each of them can only vacuum one table at a time.
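(For reference, the defaults involved, as they'd appear in postgresql.conf:)

```
autovacuum = on
autovacuum_max_workers = 3              # shared across all databases in the cluster
autovacuum_freeze_max_age = 200000000   # forced anti-wraparound vacuum kicks in here
```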

Now your business is growing. 1 customer becomes 10, 100... 10000?

At 10k customers, you now have 1 million tables (maybe more, as TOAST tables are counted). Every single transaction increases the 'age' of every table by one. It doesn't matter that 99% of all tables have low XID ages; the overall DB "age" is pinned by the table with the highest one.

The default 3 autovacuum workers will definitely NOT cut it at this point (even with PG launching anti-wraparound autovacuums). You can augment by "manually" running vacuum, but in that case the overall database age will only decrease once it's done vacuuming all 1 million tables, so start early (before you reach the ~2.1 billion mark).

Alternatively, you could write a script that will sort tables by age and vacuum individually. This may not help as much as it seems, as the distribution will depend on how much work your autovacuum workers have been able to do so far. Not to mention, even getting the list of tables will be slow at that point.
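(The per-table age query such a script would be built around looks something like this:)

```sql
-- Oldest tables first; these are the ones holding back datfrozenxid.
-- Feed the output to VACUUM (FREEZE) one table at a time.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')   -- plain tables, matviews, TOAST
ORDER BY age(c.relfrozenxid) DESC
LIMIT 100;
```

As noted, with a million rows in pg_class even this SELECT stops being cheap.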

So now you have to drastically increase the number of workers (and other settings, like work_mem) – which may also mean higher costs – and you'll still have to watch this like a hawk (or rather, your monitoring system has to). There's no clear indication that the workers are falling behind; you can only get an approximation by trending TXID age.
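(Trending TXID age, as described, usually comes down to wiring a query like this into your monitoring system:)

```sql
-- Per-database XID age. Alert well before autovacuum_freeze_max_age
-- (default 200M), and treat approaching ~2.1B as an emergency.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```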

You can make this work, but it is a pain. Even more so if you haven't prepared for it. For a handful of customers, or a small number of tables, or a small number of transactions, this may not matter. Our production systems (~200 tables, excluding TOAST) started to fall behind after a few thousand customers. We had at least one outage because we didn't even track this at the time and the database shut down. Hindsight is 20/20, but nobody had remembered to add this to the monitoring system.

Another, unrelated problem with multiple schemas: database migrations become much more complex, since you have to upgrade every schema. This is both a blessing and a curse. Database dumps also tend to take forever.
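(The "upgrade every schema" step tends to end up as a loop; a sketch in PL/pgSQL, assuming a hypothetical tenant_* naming convention and an invoices table in each schema:)

```sql
-- Apply one migration step to every tenant schema in a single pass.
DO $$
DECLARE
    s text;
BEGIN
    FOR s IN SELECT nspname FROM pg_namespace WHERE nspname LIKE 'tenant_%'
    LOOP
        EXECUTE format(
            'ALTER TABLE %I.invoices ADD COLUMN IF NOT EXISTS due_date date', s);
    END LOOP;
END $$;
```

The curse part: one bad schema aborts the whole transaction halfway through, and with thousands of schemas a "quick" migration can hold locks for a long time.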



