Hacker News

We've recently had to move away from redis for persistent data storage at work too - opting instead to write a service layer on top of cassandra for storing data.

Redis was tremendous on our journey up to this point - but one of its shortcomings is that it isn't as easy to scale up as cassandra if the system wasn't designed to scale on redis from the start (ours wasn't). Instead of re-architecting for a redis-cluster setup, we moved the component to a clustered microservice written in go that sits as a memory cache & write buffer in front of cassandra for hot, highly mutated data.

Would anyone be interested in a blog post about our struggles & journey?



>instead of re-architecting for a redis-cluster setup, we moved the component to a clustered microservice written in go that sits as a memory cache & write buffer in front of cassandra for hot, highly mutated data.

Somehow, setting up a Redis cluster and doing whatever you have to do to distribute/shard your keys effectively (which afaik is not much) sounds a little more efficient than writing a clustered microservice in Go with a Cassandra backend from scratch. Redis clustering is actually quite easy.
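For anyone who hasn't looked at it: key distribution in Redis Cluster is deterministic. Every key maps to one of 16384 hash slots via CRC16, and hash tags (`{...}`) let you pin related keys to the same slot for multi-key operations. A minimal sketch of the slot calculation - this reimplements the algorithm from the Redis Cluster spec by hand, purely for illustration; any real client library does this for you:

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements CRC16-CCITT (XMODEM), the variant the Redis
// Cluster spec uses for key hashing.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// hashSlot maps a key to one of the 16384 cluster slots. If the key
// contains a non-empty {hash tag}, only the tag is hashed, so e.g.
// user:{42}:profile and user:{42}:sessions land on the same slot.
func hashSlot(key string) uint16 {
	if open := strings.Index(key, "{"); open != -1 {
		if end := strings.Index(key[open+1:], "}"); end > 0 {
			key = key[open+1 : open+1+end]
		}
	}
	return crc16([]byte(key)) % 16384
}

func main() {
	fmt.Println(hashSlot("foo")) // 12182, same as CLUSTER KEYSLOT foo
	fmt.Println(hashSlot("user:{42}:profile") == hashSlot("user:{42}:sessions")) // true
}
```

The practical upshot is that "sharding your keys" mostly means choosing hash tags so that keys you mutate together share a slot; the cluster handles the rest.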

Forgive me if I seem grumpy. My recent experiences have caused the "We had a minor issue, so we redid everything in a Totally Cool Super-Neato New Stack That Integrates All The Hiring Manager's Favorite Buzzwords!" perspective to become a bit grating.

Redis is one of the few new pieces of infrastructure over the last 10 years that's truly deserving of its position.


My post above describes the main reason for moving off redis - the fact that data for inactive users doesn't need to stay in memory perpetually. :P


Cool. I look forward to the post that reveals the unique properties of Cassandra that ended up making it the most practical data store for your use case.

I understand that Cassandra et al. exist to solve real problems that someone out there has experienced, and I seek to throw no shade on the great engineers who make these fine products. I am, however, somewhat dubious that these niche products are applicable in the vast majority of cases where they're deployed. I strongly believe, and I think the data would bear this out, that when it gets down to brass tacks, most people are integrating such specialized tools into generic products to either a) make life at the office more exciting; b) beef up resume points for their next job application cycle; or c) both.

Someone in our company wrote a blog post pretending to justify the move to a niche datastore. He's very proud of it and makes several spurious, nonsensical justifications in it. The truth is that MySQL would've been many times more practical along every axis except the one this guy cares most about, which involves his personal career ambitions.

This move was partially under the radar so objections couldn't be raised, and full backups were never properly arranged. It cost the company a lot of money, not only in time and infrastructure, but also in the recovery process that had to be undertaken by real data experts (or the nearest we had at the time, at least) when the cluster was destroyed by one of his careless scripts. :)

Second nightmare, currently ongoing: shifting everything to docker/k8s, which, to pick just one example from a very long laundry list of complaints, only gained support for directly addressing app servers behind a load balancer last month, as a beta feature. (In k8s nomenclature: "Version 1.5 has a beta StatefulSets feature to make Pods uniquely addressable from inside the cluster! Don't forget to make a Headless Service and Persistent Volume." Exhausted yet? Just wait.)
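For anyone who hasn't had the pleasure, the incantation looks roughly like this (names, labels, and ports here are made up; `apps/v1beta1` is where StatefulSets live as of 1.5). The headless Service - `clusterIP: None` - is the piece that gives each Pod a stable DNS name like `app-0.app.default.svc.cluster.local`:

```yaml
# Headless Service: no cluster IP, exists only so the StatefulSet's
# Pods get stable, individually addressable DNS records.
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  clusterIP: None
  selector:
    app: app
  ports:
  - port: 8080
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: app
spec:
  serviceName: app   # must reference the headless Service above
  replicas: 3
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: example/app:latest   # hypothetical image
        ports:
        - containerPort: 8080
```

Two objects and a naming convention, just to be able to talk to `app-0` instead of whichever Pod the proxy feels like handing you.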

Why are we switching to something that lacks such basic functionality (we're like 3 versions behind, so we can't use it)? If I told you, I'd have to kill you, but it sure makes our resumes pretty.

I'm all for learning, experimentation, and doing things for fun. We are on Hacker News after all. I guess I've just developed a taste for a stable production ethos that, to co-opt a scriptural term, is not "blown about by every wind of [tech fad]". I crave a company that makes its decisions based on a significant and real cost-benefit analysis that shows substantial unique benefits and sufficient maturity to a tech before jumping on the bandwagon. I guess I just want some sanity.

As it stands, people just pretend that these justifications exist by making up some mumbo-jumbo about "dude JavaScript on the backend is like really event-driven, brah!"


I'd be interested in seeing that. Also, exactly what bottlenecks were you running into: CPU, I/O, or memory?


Memory & CPU. We knew we'd run into both eventually without adding additional redis instances (memory was growing much faster than CPU usage in this specific scenario).

When it was initially built, it was basically a bunch of redis lua scripts handling the data updates, running against redis configured in master/slave mode managed by sentinels.

Given the nature of the data, only the data for active users would be hot - but data for inactive users would stick around in memory needlessly. Our new system keeps only the hot set of users in memory. We also built it to transparently migrate users from redis to cassandra when they were accessed.


yes



