Our great database migration

morgante · on July 1, 2024

> bundling an 80MB+ SQLite file to our codebase slowed down the entire Github repository and hindered us from considering more robust hosting platforms

This seems like a decent reason to stop committing the database to GitHub, but not a reason to move off SQLite.

If you have a small, read-only workload, SQLite is very hard to beat. You can embed it ~everywhere without any network latency.

I'm not sure why they wouldn't just switch to uploading it to S3. Heck, if you really want a vendor involved that's basically what https://turso.tech/ has productized.

dcmatt · on July 1, 2024

"Overall, this migration proved to be a massive success" but their metrics shows this migration resulted in, on average, slower response times. Wouldn't this suggest the migration was not successful. Postgres can be insanely fast, and given the volume of data this post suggests, it baffles me that the performance is so bad.

hobs · on July 1, 2024

For 80MB of data sqlite should also be insanely fast, 1s doesn't seem like a DB problem.

agent001 · on July 1, 2024

80MB is actually large enough to impact git performance according to Github. Imagine every single developer on the team having repo performance impacts bc of the db and how that compounds over time

https://docs.github.com/en/repositories/working-with-files/m...

zie · on July 1, 2024

And yet you didn't read what you are linking to. Also, the OP was already doing LFS, which fixes the issue Github is talking about.

So that's not the issue, I don't know what the OP's problems were with an 80MB file, but that is not the issue.

TheAnkurTyagi · on July 1, 2024

fair point about the slower response times But Neon's not just any Postgres. They've got that whole serverless angle, which can be a game-changer for scaling and costs. Maybe the team's still figuring out how to optimize for that setup? Plus, with Neon, they probably ditched that whole SQLite-in-git headache.

dcmatt · on July 1, 2024

Game changer for scaling and costs? It’s an 80 MB database that needs to get duplicated by n developers and a test and production instance. They could get away with a $200 beelink under a desk. I see no reason to prematurely scale. I would focus on making things fast and creating a “duplicate database” button so devs can work without interrupting prod

cmnzs · on July 1, 2024

What a bizarre article… performance ended up being worse, how can that be considered a resounding success? Doesn’t seem like it’s a slam dunk case for using neon

TheAnkurTyagi · on July 1, 2024

easy to slam others but I don't see what's bizarre about the article. It highlights a successful customer story where Neon (https://neon.tech) played a significant role in improving their db management with its db branching feature. Yes, caching and pooling helped, but Neon made their testing and development much faster and efficient and it's not easy to write such detailed articles. why not any constructive feedback which can help them?

agent001 · on July 1, 2024

what was bizarre in this article. I found it's a simple customer success story about Neon but I see the improvement came from caching and connection pooling but Neon looks promising because of its database branching feature, which made testing and development faster looks like

Neon(https://github.com/neondatabase/neon) seems to have helped them with testing and development. Being able to iterate and validate changes is a great benefit.

meiraleal · on July 1, 2024

Hey agent001, don't be in such a hurry to prove that this article is about marketing neon database. Your green nickname doesn't help much.

simonw · on July 1, 2024

Lots of comments about the drop in performance. No matter how well you tune network PostgreSQL it's going to have trouble coming close to the performance you can get from a read-only 80MB SQLite file.

They didn't make this change for performance reasons.

mgh95 · on July 1, 2024

From the article:

> 4. Performance

> How can we ensure there are no performance regressions, and even performance improvements? When we launched Shepherd we promised to respond to every submission with an indication in 24 hours or less, which directly relates to the performance of our platform.

I actually run a (small, bootstrapped) startup in a very related space (we operate as an agency and MSB, and have direct tie-ins to a much broader range of insurance and financial products) and this article just feels like going the wrong direction. We host on prem (colo) because we knew the DB would be a latency bottleneck and clone operations need to be fast for compliance purposes.

This just feels like the wrong solution for the problem.

threecheese · on July 1, 2024

I had a similar reaction; one of their constraints was “Reduce server memory”, which tells me their Serverless vendor charges more, which is a constraint they don’t need to have. Using a colo server, memory cost is rarely an important factor.

mgh95 · on July 1, 2024

Pretty much. You can run your business basically saying things like "we will take PITR not less than every seven (7) days" and then just run it on off hours with `pg_dumpall` on each database and the memory/storage costs are basically nil.

This entire blog post reflects a lack of awareness of how much these problems can flat out be ignored with different approaches.

TheAnkurTyagi · on July 1, 2024

IMO, think of it like switching from a bicycle to a car. The bicycle might be faster in certain conditions, but the car offers more features and flexibility for different terrains. Similarly, Neon’s database branching feature is like adding all-terrain capability.

mgh95 · on July 2, 2024

I very much disagree with this. It is possible (and we do this in production) to branch a pg database without requiring these tools using `pg_dump` or `pg_dumpall` which uses the ability to branch a database. These can be integrated with either standard ORM tools or SQL scripts to achieve the same result with far better performance and lower cost. This very much appears to be a knowledge issue.

hobobaggins · on July 1, 2024

If most queries take ~ 1s on a relatively small 80MB dataset, then it sounds to me like they really needed to run EXPLAIN on their most complex queries and then tune their indexes to match.

They could have probably stayed with SQLite, in fact, because most likely it's a serious indexing problem, and then found a better way to distribute the 80MB file rather than committing it to Github. (Although there are worse ideas, esp with LFS)

skeeter2020 · on July 1, 2024

I don't see any mention of the data size or volume of transactions? Also, your API response times were worse after you finished and optimized, and that's a success? or you're comparing historical SQLite vs new PostgreSQL? I kinda see this more as a rewrite than a database migration (which I'm going through now from SQL Server to PostgreSQL)

fbdab103 · on July 1, 2024

The success was all of the resumes that came out with a few more buzzwords.

willsmith72 · on July 1, 2024

> 79.15% of our pricing operations averaged 1 second or less response time

These numbers are thrown out there like they're supposed to be impressive. They must be doing some really complex stuff to justify that. For a web server to have a p79 of 1 second is generally terrible.

> 79.01% to average 2 seconds or less

And after the migration it gets FAR worse.

I get that it's a finance product, but from what they wrote it doesn't seem like a large dataset. How is this the best performance they're getting?

Also a migration where your p79 (p-anything) doubled is a gigantic failure in my books.

I guess latency really mustn't be critical to their product

chrisandchris · on July 1, 2024

> Ensure database is in same region as application server

People tend to forget that using The Cloud (tm) still means that there's copper between a database server and an application server and physics still exist.

willsmith72 · on July 1, 2024

I would narrow that down to people preaching about The Edge.

Storage and compute colocation almost always wins out by far.

In saying that, I love Cloudflare workers for certain stuff

shrubble · on July 1, 2024

If it is a read-only database, I don't fully understand where all the latency is coming from. Is it complex SQL queries?

justinclift · on July 1, 2024

It really sounds like their chosen solution (Neon?) is doing something awfully inefficient when processing the transactions. :(

kwillets · on July 1, 2024

I'm guessing from spreadsheets riddled with per-cell SQL fetches.

ed_elliott_asc · on July 1, 2024

This post is 100% marketing “oh we had so few customers SQLite was great but now we need Postgres” ignore it

agent001 · on July 1, 2024

disagree- It's 100% a happy customer success story and Neon seems to be fit for them.

kwillets · on July 1, 2024

The latency before/after histograms unfortunately use different scales, but it appears that eg the under-200ms bucket is only a few percentage points smaller after the change, maybe 38 before and 33 after.

What I'm curious about is whether Neon can run pg locally on the app server. The company's SaaS model doesn't seem to support that, but it looks technically doable, particularly with a read-only workload.

nikita · on July 1, 2024

(Neon CEO) We could consider that. Right now we are exploring to actually run the app in the same or next to the micro-VM that runs Postgres.

https://neon.tech/docs/introduction/architecture-overview

tarasglek · on July 1, 2024

I think turso in memory replicas might work here. They have some code to capture pg cdc to do that.

Likewise using something like pglite to run a pure pg in memory replica would be very cool

cpursley · on July 1, 2024

If starting off with Elixir and Postgres from the get-go, all this could have been avoided - including the async pains. Said another way: don’t write you backend in JS and just use Postgres.

jhgg · on July 1, 2024

How exactly would writing this in elixir have avoided async pains? Can you elaborate.

apithowaway · on July 1, 2024

Where is the cto or senior technical leader in this? The team seems to be trying hard and keeping the lights on, but honestly there are several red flags here. I’m especially skeptical about the painful and complex manual process that is now 1-click. I want to hope they succeed, but this sounds awfully naive.

banish-m4 · on July 1, 2024

PSA: If you're running a business and some databases store vital customer or financial data, consider EnterpriseDB (EDB). It funds Postgres and can be used almost like Oracle DBMS. And definitely send encrypted differential backups to Tarsnap for really important data.

mattashii · on July 1, 2024

> EDB

... or any of the other companies that provide support and actively contribute to the PostgreSQL project, such as Cybertec, Data Egret, Crunchy Data, AWS, ...

I wouldn't choose any PostgreSQL supplier for their similarity to Oracle DBMS (why would you want to buy into that ecosystem?); as long as it behaves just as vanilla PostgreSQL it is good enough for me.

banish-m4 · on July 10, 2024

The marketing of EDB is as a drop-in replacement for Oracle.

hipadev23 · on July 1, 2024

Shepherd raised $13.5M earlier this year. Imagine being an investor in this company and seeing this post. They seriously wrote a lengthy post publicizing their struggles with an 80MB database and running some queries. The entire technical team at this company needs to be jettisoned.

These are the sort of technical struggles a high school student learning programming encounters. Not a well-funded series A startup. This is absolutely bonkers.

pantsforbirds · on July 1, 2024

I wonder if DuckDB with parquet storage on S3 (or equivalent) would have been a nice drop-in replacement. Plus DuckDB probably would have done quite well in the ETL pipeline.

zitterbewegung · on July 1, 2024

Not to be negative but it seems like many tech posts like this are thinly veiled hiring / recruitment blog posts .

pm2222 · on July 1, 2024

Does the sqlite java lib bundle support for many platforms which jacks up the app size?

lmz · on July 1, 2024

Yes [1], but it sounds like they committed the sqlite file into Github as well?

[1]: https://github.com/xerial/sqlite-jdbc?tab=readme-ov-file#how...

pocketarc · on July 1, 2024

> Furthermore, bundling an 80MB+ SQLite file to our codebase slowed down the entire Github repository and hindered us from considering more robust hosting platforms.

It's... an 80MB database. It couldn't be smaller. There are local apps that have DBs bigger than that. There is no scale issue here.

And... it's committed to GitHub instead of just living somewhere. And they switched to Neon.

To me, this screams "we don't know backend and we refuse to learn".

To their credit, I will say this: They clearly were in a situation like: "we have no backend, we have nowhere to store a DB, but we need to store this data, what do we do?" and someone came up with "store it in git and that way it's deployed and available to the app". That's... clever. Even if terrible.

BHSPitMonkey · on July 1, 2024

> It's... an 80MB database. It couldn't be smaller. There are local apps that have DBs bigger than that. There is no scale issue here.

It depends. If that 80MB binary file in git is updated/replaced often, you likely have a problem (every 100 changes/replacements might grow the repo by as much as 8GB).

__float · on July 1, 2024

They said it was stored with Git LFS, so it's just a pointer in the repo, not all 80 MB.

gfody · on July 1, 2024

80mb is nothing, I think it's a no brainer to keep it in git but they should've commited the .dump instead of the .db

agent001 · on July 1, 2024

Neon actually solved a bunch of problems here. Sure, 80MB might not seem huge, but in a git repo? Also, 80MB is actually large enough to impact git performance according to Github(https://docs.github.com/en/repositories/working-with-files/m...). That can be a pain, especially if you're updating it often. Neon let them ditch the whole 'DB-in-git' approach. No more slowing down the repo or worrying about it ballooning with every update. Plus, it probably made deployment and scaling way smoother. Sometimes the 'clever' solution isn't the best long-term, you know? Kudos to them for recognizing that and making the switch.

candiddevmike · on July 1, 2024

> That's... clever

It's tech debt

mbreese · on July 1, 2024

It’s tech debt, but it’s cheap debt. 80Mb isn’t much to have to worry about. There are many ways to use a DB of that size that will be “good enough”. So long as the queries were returning fast enough and the dev overhead wasn’t too much (LFS was a good choice), I’m not sure I’d have worried much.

There is tech debt that will eat away at you, and then there is this… something that you’ll want to replace someday, but it can wait until you are ready to tackle it properly. And you may not even know what data access patterns you need right away. For a new company with a pretty new way of working with insurance, this seems like a good trade off to me. Plus, it seems clear that they weren’t confident in their backend engineering yet, so this gave them time to figure things out.

zie · on July 1, 2024

It's more complicated and slower but it's still a "success". LOL.

sgt101 · on July 1, 2024

You. Were. Running. An. Insurance. Company. On. SQLite?

What?

What possessed them?

samtho · on July 1, 2024

The database in question was 80mb, which is very small. It’s also fast and backups are as simple as copying the file.

Not advocating for this particular use case, but we’ve all seen more egregious abuses of SQLite.

agent001 · on July 1, 2024

lol according to github 80 MB is not small - https://docs.github.com/en/repositories/working-with-files/m...

Zambyte · on July 1, 2024

Git is designed for text files. An 80 MB text file is huge. An 80 MB binary database is tiny.

sgt101 · on July 1, 2024

I depends what you are putting into it... I think it's completely feasible to get 100000 customer records into 80 MB... and then some

lomase · on July 1, 2024

Git is a tool to diff code. Github host git repositories.

You are using Github to store data, not code.

simonw · on July 1, 2024

Sounds like a perfectly cromulent solution to me. It's an upgrade from Excel!

lucb1e · on July 1, 2024

Check the new workflow diagram. If I read it correctly, they still edit the database as a spreadsheet and convert that into sql. The diagram is near this text:

> we reduced the process down to a 1- click solution. We built a dashboard in our underwriting platform that actuaries can visit and request factor changes. With a single click, an asynchronous job kicks off, automating the steps of Google Sheet extraction > parsing > codegen > SQL script generation > PR creation. The only engineering involvement now required is a PR review.

roblh · on July 1, 2024

This is completely unhinged, right? Am I crazy? All that work just to make absolutely 100% sure their staff never have to try to learn anything new or modify their processes in any way shape or form.

sgt101 · on July 1, 2024

utterly utterly unhinged.

It's basically a design that someone both very naive and suffering late stage Rabies would come up with.

Exoristos · on July 1, 2024

True in many ways, yet Excel is considerably more robust.

simonw · on July 1, 2024

Can you back that up? SQLite has an extremely strong reputation for robustness.

AtlasBarfed · on July 1, 2024

Well snapshotting/backup isn't hard