> We standardized on Clickhouse for everything. (With its own set of surprising ...

aforwardslash · on Aug 8, 2023

Not the parent, but I have some ClickHouse experience. ClickHouse is surprisingly easy to deploy and set up, talks both the MySQL and PostgreSQL wire protocols (so you can query stuff with your existing relational tools), the query language is SQL (including joins with external data sources, such as S3 files, external relational databases and other ClickHouse tables), and it is ACID on specific operations. It assumes your dataset is (mostly) append-only, and inserts work well when done in batches. It is also blazingly fast, and very compact when using the MergeTree family of storage engines.
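To make the "insert in batches" point concrete, here is a minimal sketch of a client-side buffer in Python. Everything in it is hypothetical (the `BatchBuffer` name, the `flush_fn` callback, the `max_rows` default); it only shows the pattern of collecting rows and handing them over in one go, not any particular ClickHouse client API.

```python
from typing import Callable, List, Tuple

Row = Tuple  # hypothetical row type: one tuple per row


class BatchBuffer:
    """Collects rows and passes them to `flush_fn` one batch at a time."""

    def __init__(self, flush_fn: Callable[[List[Row]], None], max_rows: int = 10_000):
        if max_rows < 1:
            raise ValueError("max_rows must be at least 1")
        self._flush_fn = flush_fn  # assumed to perform a single bulk INSERT
        self._max_rows = max_rows
        self._rows: List[Row] = []

    def add(self, row: Row) -> None:
        """Buffer one row; flush automatically once the batch is full."""
        self._rows.append(row)
        if len(self._rows) >= self._max_rows:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as one batch; no-op when empty."""
        if not self._rows:
            return
        batch, self._rows = self._rows, []
        self._flush_fn(batch)
```

In a real setup `flush_fn` would wrap whatever bulk-insert call your client library offers, and you would probably also flush on a timer so a slow trickle of rows does not sit in memory forever.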

Development is very active, and some features are experimental. One of the common mistakes is to use the latest releases in production environments - you will certainly find odd bugs in specific usage scenarios. Stay away from the bleeding edge and you're fine. Clustering (table replication and sharding of queries) is also a can of worms by itself, and requires good knowledge of your workload and your data structure to understand all the tradeoffs. Thing is, when designing from scratch, you can often design in such a way that you don't need (clustered) table replication or sharding - again, this also has a learning curve, for both devs and devops.

You can easily spin it up on a VM or on your laptop, load a dataset and see for yourself how powerful ClickHouse can be. Honestly, the data compression alone is good enough to save a s**load of money on storage in an enterprise, compared to most solutions. Couple this with tiered storage - your hot data is e.g. on SSD, your historical data is stored on S3, and rotation is done automatically - plus automated ingestion from Kafka, and you have a data warehousing system at a fraction of the price of many common alternatives.
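For the tiered storage part, a rough sketch of what the table definition can look like, built as a string in Python so it is easy to parametrize. The table layout, the `hot_to_cold` policy name and the `cold` volume name are all made up for illustration; the policy and its volumes (e.g. a local SSD disk and an S3-backed disk) are assumed to already exist in the server's storage configuration.

```python
def tiered_table_ddl(
    table: str,
    hot_days: int,
    cold_volume: str = "cold",      # hypothetical volume name
    policy: str = "hot_to_cold",    # hypothetical storage policy name
) -> str:
    """Build a MergeTree DDL that moves parts older than `hot_days` to `cold_volume`."""
    if hot_days < 1:
        raise ValueError("hot_days must be at least 1")
    return (
        # Illustrative two-column schema, not taken from any real deployment.
        f"CREATE TABLE {table} (ts DateTime, payload String)\n"
        "ENGINE = MergeTree\n"
        "ORDER BY ts\n"
        # The TTL move rule is what makes the rotation automatic.
        f"TTL ts + INTERVAL {hot_days} DAY TO VOLUME '{cold_volume}'\n"
        f"SETTINGS storage_policy = '{policy}'"
    )
```

Calling `tiered_table_ddl("events", 30)` gives a statement that keeps roughly the last 30 days on the policy's first volume and lets ClickHouse move older parts to the `cold` one in the background.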