Hacker News

Kafka is closer to a persistent WAL than to a message queue. If your work doesn't need a WAL, it's almost certainly overkill and you will hate it. If your work does need a WAL, it'll be your favorite tool ever.
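To make the distinction concrete, here's a toy sketch in plain Python (nothing Kafka-specific; the `Queue`/`Log` names are just illustrative): a queue destroys messages on delivery, while a log keeps everything and lets each reader keep its own position.

```python
from collections import deque

class Queue:
    """Classic MQ semantics: delivery is destructive."""
    def __init__(self):
        self._items = deque()
    def publish(self, msg):
        self._items.append(msg)
    def consume(self):
        return self._items.popleft()  # gone once delivered/acked

class Log:
    """WAL/Kafka-style semantics: an append-only log. Each consumer
    tracks its own offset and can re-read from anywhere at will."""
    def __init__(self):
        self._entries = []
    def append(self, msg):
        self._entries.append(msg)
    def read(self, offset):
        return self._entries[offset]

q = Queue()
q.publish("a")
assert q.consume() == "a"      # a second consume would fail: data is gone

log = Log()
log.append("a")
assert log.read(0) == "a"
assert log.read(0) == "a"      # replayable: reading doesn't destroy
```

If your consumers ever need to rewind (reprocessing, new downstream services, audits), the log model pays off; if they never will, you're carrying its costs for nothing.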


For those like me who aren't used to that abbreviation, it's short for Write-ahead Logging [0].

[0] https://en.wikipedia.org/wiki/Write-ahead_logging


Why? It's quite easy to use Kafka as a message queue without even thinking about the write-ahead-log semantics. It's there if you need it, but Kafka scales down to being a plain message broker fairly well, in my opinion.


Because operationalizing Kafka is difficult relative to the other pubsub/MQ tools, both from an infrastructure perspective (Scala/Java, ZooKeeper, durable disk management, lots of moving parts) and from a learning and code perspective (offset/pointer tracking, partition management, delegation, etc.).
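The "pointer tracking" burden can be sketched in a few lines (a toy model, not the real client API): each consumer group keeps a committed offset per partition and resumes from there after a crash or rebalance, and getting commit timing wrong is entirely your problem.

```python
# Toy of the offset bookkeeping Kafka pushes onto the client:
# (group, partition) -> next offset to read
committed = {}

def poll(log, group, partition):
    """Return the next unread message and its offset, or None."""
    off = committed.get((group, partition), 0)
    if off >= len(log[partition]):
        return None
    return log[partition][off], off

def commit(group, partition, offset):
    """Mark everything up to and including `offset` as processed."""
    committed[(group, partition)] = offset + 1

partitions = {0: ["m0", "m1"], 1: ["m2"]}
msg, off = poll(partitions, "billing", 0)
commit("billing", 0, off)
# Crash *between* processing and commit and you re-read "m0" next time:
# at-least-once delivery, and deduplication is on you.
```

A plain broker with per-message acks hides all of this bookkeeping from you; that's the scale-down cost being described.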

So if you don't have it operationalized and your use case is simple, it makes the most sense to use a simpler tool (RabbitMQ/AMQP, cloud pub/sub, NSQ, etc., perhaps even Redis).


1) scala/java ... is that fundamentally difficult?

2) zookeeper is being eliminated as a dependency from kafka

3) durable disk management ... I mean, it's data, and it goes on a disk.

Look, do you want a distributed fault-tolerant system that doesn't run on specialized / expensive hardware? Well, sorry, those systems are hard. I get this a lot for Cassandra.

You either have the stones for it as a technical org to run software like that, or you pay SaaS overhead for it. A Go binary is not going to magically solve this.

EVEN IF you go SaaS, you still need monitoring and a host of other aspects (perf testing, metrics, etc) to keep abreast of your overall system.

And what's with pretending that S3 doesn't have ingress/egress charges? Last I checked those were more or less in line with EBS networking charges and inter-region costs, but I haven't looked in about a year.

And if this basically ties you to AWS, then why not just ... pay for AWS managed Kafka from Confluent?

The big fake sell from this is that it magically makes Kafka easy because it ... uses Go and uses S3. From my experience, those and "disk management" aren't the big headaches with Kafka and Cassandra masterless distributed systems. They are maybe 5% of the headaches or less.


> 1) scala/java ... is that fundamentally difficult?

It's certainly at least somewhat more difficult, as you have a highly configurable VM in between that forces you to learn Java-isms to manage it (you can't just lean on your Unix skills).

> 3) durable disk management ... I mean, it's data, and it goes on a disk.

Most MQs don't store things on disk beyond flushing memory so they can recover from a crash; in most cases the data is cleared as soon as the message is acked or expires.
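The difference in storage lifetime is worth spelling out (a toy model, not any real broker's API): in a typical MQ the ack decides when data dies, while in a Kafka-style log only the retention policy does, regardless of whether anyone processed it.

```python
class AckQueue:
    """Typical broker: storage is tied to delivery state.
    Once acked, the message is deleted, regardless of age."""
    def __init__(self):
        self.pending = {}
        self._next_id = 0
    def publish(self, msg):
        self.pending[self._next_id] = msg
        self._next_id += 1
    def ack(self, msg_id):
        del self.pending[msg_id]

class RetentionLog:
    """Kafka-style: entries live until a retention limit expires,
    whether or not any consumer has processed them."""
    def __init__(self, retention_s):
        self.entries = []
        self.retention_s = retention_s
    def append(self, msg, now):
        self.entries.append((now, msg))
    def expire(self, now):
        self.entries = [(t, m) for t, m in self.entries
                        if now - t < self.retention_s]

q = AckQueue()
q.publish("job")
q.ack(0)                        # gone immediately

log = RetentionLog(retention_s=60)
log.append("event", now=0)
log.expire(now=30)              # still there: retention, not acks, decides
```

That's why durable disk sizing is a first-class operational concern for Kafka in a way it usually isn't for an MQ.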

Look, I'm not saying not to use Kafka, I'm just pointing out the evaluation criteria. There are certainly better options if you just want a MQ, especially if you want to support MQ patterns like fanout.

The reality is that if you're doing <20k TPS on an MQ (most are) and don't need replay/persistence, then ./redis-server will suffice, and operationally it will be much, much easier.


But... Go is GC'd as well. Most JVM gripes are about the knobs on its GC, but Go is still a fundamentally GC'd language, so you'd have issues with that too.

So... Go was the rewrite? Scylla at least rewrote Cassandra in C++ with some nice close-to-the-hardware improvements. Rust? OK. C++? OK. Avoid the GC pauses and get thread-per-core and userspace networking to bypass syscall boundaries.

And look, this thing is not going to steal the market share of Kafka. Kafka will continue to get supported, patched, and whenever the next API version of AWS comes out (it needs one), will this get updated for that?

Yeah, Kafka is "enterprisey" because ... it's java? Well no, Kafka is scalable, flexibly deployable (there's a reason big companies like the JVM), has a company behind it, is tunable, has support options, can be SaaS'd, has a knowledge database (REEEAAALLLLY important for distributed systems).

All those SQLite/RocksDB projects that slapped a raft protocol on top of them are in the same boat compared to Scylla or Cassandra or Dynamo. Distributed systems are HARD and need a mindshare of really smart experienced people that sustain them over time. Because when Kafka/Cassandra type systems get properly implemented, they are important systems moving / storing / processing a ton of data. I've seen hundred node Cassandra systems, those things aren't supposed to go down, ever. They are million dollar a year (maybe month) systems.

The big administration lifts in them like moving clouds, upgrading a cluster, recovering from region losses or intercontinental network outages are known quantities. Is some Go binary adhoc rewrite going to have all that? Documented with many people that know how to do it?


If I could get away with a vendor cloud queue I wouldn't move to Kafka for the hell of it, but when I've needed higher-volume data shipping I've never found the infra as hard as people make it out to be. Unless you're doing insane volumes in single clusters, most of the pieces around it can work OK in default mode for a surprisingly long time.

You can footgun yourself on cost with cross-AZ traffic, like the blog here talks about (though that doesn't feel like the right level to solve it at for most cases anyway). And any time you're doing events or streaming data at all, you're going to run into some really interesting semantic problems compared to traditional services, but also new capabilities that are rarely even attempted in that world, like replaying failed messages from hours ago. So it's good to know exactly what you're getting into. That said, I've spent far less time fighting ZK than Kafka, and far less time fighting either than getting the application semantics right.

I imagine a lot of pain comes from "I want events, I know nothing about events, I don't know how to select a tool, now I'm learning both the tool and the semantics of events and queues on the fly and making painful decisions along the way," which I've seen at several places (and helped avoid at some of the later ones, after learning some hard, not-well-discussed-online lessons). I think the space just lets you do so many more things, so figuring out what's best for YOU is way more difficult the first time you, as a traditional backend online-service developer, start asking questions like "but what if we reprocess the stuff we otherwise would've just black-hole-500'd during that outage after all" and then have to deal with things like ordering and time in all their glory.


Besides the operational concerns mentioned in the sibling comment, Kafka is simply not a great queue. You can't work-steal, you can't easily retry out-of-order, you can't size retention based on "is it processed yet", and you may need to manually implement DLQ behavior.
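The DLQ point in particular: Kafka has no built-in dead-letter queue, so the retry-then-park loop has to be hand-rolled in the consumer. A toy sketch of that pattern (illustrative names; in a real consumer the DLQ append would be a produce to a side topic):

```python
def consume_with_dlq(messages, handler, max_attempts=3):
    """Hand-rolled dead-letter behavior: retry each message up to
    max_attempts, then park permanent failures instead of blocking
    the partition behind a poison message."""
    dlq = []
    for msg in messages:
        for _attempt in range(max_attempts):
            try:
                handler(msg)
                break                # processed successfully
            except Exception:
                continue             # retry in place
        else:
            dlq.append(msg)          # would be producer.send("orders.dlq", msg)
    return dlq

def handler(msg):
    if msg == "bad":
        raise ValueError("can't process")

dead = consume_with_dlq(["ok", "bad", "ok"], handler)
```

Note that because a partition is consumed in order, even this in-place retry stalls everything behind the failing message until it's parked, which is exactly the "can't easily retry out-of-order" problem.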

If you already have Kafka for other (more WAL-y, or maybe older log-shippy) reasons it can be an OK queue, especially if you've got a team comfortable using Kafka as a WAL that can work around most of the downsides of using it as a queue. But I wouldn't take it as a first choice.


Additionally, you can't easily increase/decrease consumer counts such that all consumers quickly get assigned roughly equivalent workloads.
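The reason is that partitions, not messages, are the unit of parallelism: group assignment gives each partition to exactly one consumer, so consumers beyond the partition count just sit idle. A toy round-robin assignment shows it (illustrative, not the real rebalance protocol):

```python
def assign(partitions, consumers):
    """Each partition goes to exactly one consumer in the group;
    a consumer may own several partitions, but never shares one."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 3 partitions, 4 consumers: the 4th consumer gets nothing,
# so adding it buys zero extra throughput.
a = assign([0, 1, 2], ["c1", "c2", "c3", "c4"])
```

Contrast with a classic work queue, where every added consumer immediately competes for the next available message.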

It will be interesting to watch progress on KIP-932 as the Kafka community thinks about adding message queue behavior: https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A...


Great point. The basic semantics are very different too. In MQs you partition/namespace/channel (whatever you want to call it) based on how data flows in your application (e.g. fanout). In Kafka you're tied more to a persistence model, so you end up with fat linear topics and the "filtering"/flow management happens on the consumer's side.
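In other words (a toy sketch, not any client's API): where an MQ broker routes each message to the bindings that want it, a Kafka-style consumer reads the whole fat topic and discards what it doesn't care about.

```python
# One fat linear topic containing every event type.
events = [
    {"type": "order",  "id": 1},
    {"type": "refund", "id": 2},
    {"type": "order",  "id": 3},
]

def kafka_style_consumer(topic, wanted_type):
    """Kafka-style flow: read everything, filter on the consumer side.
    An MQ would instead have routed only matching messages here."""
    return [e for e in topic if e["type"] == wanted_type]

orders = kafka_style_consumer(events, "order")
```

That consumer-side filtering is cheap in code but means every consumer pays the read bandwidth for the whole topic, which shapes how you design topics in the first place.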



