I've spoken to some people who say they need real-time stream processing off of MQTT, backed by a Spark cluster... to store ~10 messages/second. I've reduced infra costs and improved perf by just turning it into an API with a retry on the client, storing straight into a database.
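For anyone curious what "an API with a retry on the client" can look like, here is a rough sketch rather than what I actually ran. Everything in it is made up for illustration: the `messages` table, the `store_message` and `send_with_retry` names, the choice of SQLite, and the idea of a client-generated `message_id` so a retried request can't insert the same row twice.

```python
import sqlite3
import time


def init_db(conn):
    # Hypothetical schema: message_id is generated by the client, so a
    # retried request maps to the same primary key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "message_id TEXT PRIMARY KEY, "
        "device_id TEXT NOT NULL, "
        "payload TEXT NOT NULL)"
    )


def store_message(conn, message_id, device_id, payload):
    # API side: INSERT OR IGNORE makes the write idempotent, so a
    # duplicate delivery of the same message_id is silently dropped.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO messages VALUES (?, ?, ?)",
            (message_id, device_id, payload),
        )


def send_with_retry(send, attempts=5, base_delay=0.1, sleep=time.sleep):
    # Client side: call send() until it succeeds, doubling the delay
    # after each failure, and re-raise once attempts are exhausted.
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

In a real setup `send` would wrap an HTTP POST to the API; the point is only that retry plus an idempotent insert covers the delivery guarantee at this message rate.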
After I left, I hear he's rebuilding the Apache Spark data pipeline stuff "so we can scale".
(The problem they were solving was embarrassingly parallel and could have been sharded like crazy to make it way cheaper to host.)
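To make that concrete: assuming each device's messages can be processed independently (my reading of "embarrassingly parallel" here), splitting the work can be as simple as a stable hash on a key. The `shard_for` name and the use of a device ID as the key are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    # Use a stable hash (not Python's per-process randomized hash())
    # so the same key always lands on the same shard across restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Each worker or small database then only handles keys for its own shard number, with no coordination between them.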