Prompting the question: why isn't the whole disk just a SQLite database?

Johnny555 · on June 14, 2017

Because SQLite isn't meant as a general-purpose filesystem. It can do some things better (like, apparently, small blob storage), but not everything.

For example, SQLite vacuuming to reclaim the space left by deleted data can be slow. On large SQLite databases, we've found it much faster to rewrite the entire file than to vacuum it.
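A rough sketch of the two options, using Python's standard sqlite3 module. The table name, row counts, and the function name are made up for illustration, and "rewrite the entire file" is assumed here to mean copying the surviving rows into a fresh database. It only compares file sizes, not timings; which one is faster on a large database is something you would have to measure yourself.

```python
import os
import sqlite3
import tempfile


def compare_vacuum_and_rewrite(rows=200, keep=20, blob_size=4096):
    """Return (size_before, size_after_vacuum, size_of_rewritten_copy) in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        old_path = os.path.join(tmp, "old.db")
        new_path = os.path.join(tmp, "new.db")

        # isolation_level=None puts the connection in autocommit mode, so
        # VACUUM and ATTACH/DETACH never run inside an implicit transaction.
        con = sqlite3.connect(old_path, isolation_level=None)
        try:
            con.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")
            con.execute("BEGIN")
            con.executemany(
                "INSERT INTO blobs (data) VALUES (?)",
                ((os.urandom(blob_size),) for _ in range(rows)),
            )
            con.execute("COMMIT")

            # Deleting rows frees pages inside the file, but with the default
            # settings the file itself does not shrink.
            con.execute("DELETE FROM blobs WHERE id > ?", (keep,))
            size_before = os.path.getsize(old_path)

            # Option A: copy only the live rows into a brand-new file.
            con.execute("ATTACH DATABASE ? AS fresh", (new_path,))
            con.execute("CREATE TABLE fresh.blobs (id INTEGER PRIMARY KEY, data BLOB)")
            con.execute("INSERT INTO fresh.blobs SELECT id, data FROM main.blobs")
            con.execute("DETACH DATABASE fresh")
            size_rewritten = os.path.getsize(new_path)

            # Option B: compact the existing file in place.
            con.execute("VACUUM")
            size_after_vacuum = os.path.getsize(old_path)
        finally:
            con.close()
        return size_before, size_after_vacuum, size_rewritten
```

Both options end up with a file much smaller than the original; the difference the comment is about is how long each takes once the database is large.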

It also has some scalability limits: it uses locking to allow only a single writer at a time (with short-duration locks), which only scales up to a point.
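The single-writer behaviour can be seen with two connections to the same file. This is a minimal sketch with Python's sqlite3 module, assuming the default rollback-journal mode; the table and function names are hypothetical. `timeout=0` is used so the second writer fails immediately instead of waiting for the lock.

```python
import os
import sqlite3
import tempfile


def second_writer_is_rejected():
    """Return (rejected, row_count) for two connections writing to one file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # timeout=0 makes a blocked connection fail at once instead of waiting.
        first = sqlite3.connect(path, isolation_level=None, timeout=0)
        second = sqlite3.connect(path, isolation_level=None, timeout=0)
        try:
            first.execute("CREATE TABLE t (x INTEGER)")
            first.execute("BEGIN IMMEDIATE")       # first writer takes the write lock
            first.execute("INSERT INTO t VALUES (1)")
            try:
                second.execute("BEGIN IMMEDIATE")  # second writer asks for it too
                second.execute("ROLLBACK")
                rejected = False
            except sqlite3.OperationalError:       # typically "database is locked"
                rejected = True
            first.execute("COMMIT")                # lock is released here

            # Once the first writer is done, the second one can proceed.
            second.execute("BEGIN IMMEDIATE")
            second.execute("INSERT INTO t VALUES (2)")
            second.execute("COMMIT")
            row_count = second.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        finally:
            first.close()
            second.close()
        return rejected, row_count
```

In a real application you would normally leave a non-zero timeout so writers queue up briefly rather than fail.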

There are a lot of file system characteristics that differ from a database file format. SQLite could probably be made more filesystem-like, but then it would diverge from being a fast and lightweight database format.

Microsoft tried the database as a filesystem once: https://en.wikipedia.org/wiki/WinFS (though this goes beyond just being a container)

klodolph · on June 14, 2017

The parent comment is a bit misleading, or completely wrong. SQLite also has to allocate blocks in the database for anything you store. Some of the structures and techniques that SQLite uses for doing this are very similar to the way a filesystem does it.

Instead, think about it this way. With a filesystem, the database is managed by the kernel. Every time you want to read a file, you might do four system calls: open, fstat, read, close. Or you might do mmap instead of read, but you're still doing four context switches, at least in typical cases. Switching to the kernel and back has a cost. Normally this cost is small, but if you make a lot of system calls you'll notice the costs piling up. The kernel also has to check that you have permission to read the file.
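For reference, the four-call sequence looks roughly like this when written against Python's thin `os` wrappers around the system calls. The function name is made up, and the sketch assumes a regular file small enough to come back from a single read.

```python
import os


def read_file_with_four_calls(path):
    """Read a whole file using open, fstat, read, close."""
    # O_BINARY only exists on Windows; elsewhere this adds nothing.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags)            # 1: open
    try:
        size = os.fstat(fd).st_size      # 2: fstat, to learn how much to read
        data = os.read(fd, size)         # 3: read (may return fewer bytes than asked)
    finally:
        os.close(fd)                     # 4: close
    return data
```

Every one of those calls crosses into the kernel and back, and the sequence is repeated for each file you read.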

With SQLite, the database is inside your application. When you read a row, there's a chance that the row is already in your application's memory. This means no context switches back and forth between application and kernel.

Additionally, when you read a row, the entire database page is read into memory, which includes other rows too. The kernel won't do anything like that with your application--it won't give you img1.png and img2.png if you just ask for img1.png. Maybe they'll both be in the kernel's page cache, but you still have to open and read the file.
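To make the page point concrete, here is a small sketch with Python's sqlite3 module. The `images` table and the function name are hypothetical. It stores a handful of small blobs and then asks SQLite how many pages the database uses; with small rows, many of them land on the same page, so reading one brings its neighbours into the application's page cache as well.

```python
import os
import sqlite3
import tempfile


def small_rows_share_pages(rows=10, payload_size=50):
    """Store several small blobs, then return (page_size, page_count, one_blob)."""
    with tempfile.TemporaryDirectory() as tmp:
        con = sqlite3.connect(os.path.join(tmp, "images.db"))
        try:
            con.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB)")
            with con:  # one transaction for all inserts
                con.executemany(
                    "INSERT INTO images (id, data) VALUES (?, ?)",
                    ((i, bytes([i]) * payload_size) for i in range(1, rows + 1)),
                )
            page_size = con.execute("PRAGMA page_size").fetchone()[0]
            page_count = con.execute("PRAGMA page_count").fetchone()[0]
            # Fetching row 1 reads its whole page; other rows stored on that
            # page come along into SQLite's in-process page cache.
            one_blob = con.execute(
                "SELECT data FROM images WHERE id = 1"
            ).fetchone()[0]
        finally:
            con.close()
        return page_size, page_count, one_blob
```

The page count comes out well below the number of rows, which is the point: there are far fewer units of I/O than there are "files".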

Chris2048 · on June 14, 2017

Maybe it could be. There are a few different VFSes for SQLite, and one of them ("one_file") directly writes to memory (for embedded devices):

https://sqlite.org/vfs.html

http://www.sqlite.org/src/doc/trunk/src/test_onefile.c

Maybe this could be turned into a VFS driver (FUSE?).

BinaryIdiot · on June 14, 2017

Microsoft tried doing this, and there are file systems that have tried or are trying. It just opens up multiple cans of worms, but it's a pretty interesting idea.

jstimpfle · on June 14, 2017

You got me wrong. I explained why appending to a CSV is faster than inserting into a "real" database.

jstimpfle · on June 14, 2017

Now I'm confused myself. The parent's context was: why appending to a CSV is faster than opening a file and appending to that. There was no mention of a database.