To pile onto the Splunk "love" going on here. Splunk is one of those systems that's too "powerful" for small use-cases, but too expensive for the ones it's really designed for.
Anecdote: I once worked with a client that really wanted to get Splunk, but produced so much network traffic that the discounted annual cost was more than the entire budget for the rest of the organization combined. That's staff, the building, equipment, power, water, everything... the estimated Splunk cost was more than all of that.
They went with a combination of ELK and a small team of dedicated developers writing automation and analytics against Spark and some enterprise SQL database. Still expensive, still cheaper than Splunk.
That's what I was wondering about when it comes to this acquisition. Can Cisco make Splunk even more expensive? I have faith they can. I know that for many folks, Splunk tops the leaderboard when it comes to spend.
Splunk bought SignalFx a while ago, and they're leaning in hard on the observability craze and piggybacking on OpenTelemetry. I wasn't heavily involved in the migration to Splunk Observability Cloud about a year ago, but it was a half-baked shit show, and ultimately they dumped it in favor of DataDog, IIUC (I had since changed jobs but kept in touch with ex-colleagues).
Worked at a medium-size enterprise and was trying to get some detailed performance metrics on a legacy tech stack that didn't have a drop-in APM solution. This was in the age of Graphite, which was great for aggregating metrics cheaply but not for getting detail.
Splunk was used by a much larger product (easily 10x our scale) for monitoring events so there was no red tape to start using it.
After launching the detailed instrumentation (1 structured log event per HTTP request with a breakout of database/service activity) I was able to gain all of the insight needed and build a simple user/url lookup dashboard page to help other engineers see what was going on. We went from being mostly blind to almost full visibility in less than two weeks.
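For context, the event was roughly one flat record per request with nested timing breakouts. A minimal sketch of what that kind of structured log event might look like (the field names here are illustrative, not the actual schema we used):

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("request_audit")

def log_request_event(request, timings):
    # One structured event per HTTP request; field names are made up.
    event = {
        "user": request["user"],
        "url": request["url"],
        "status": request["status"],
        "duration_ms": timings["total_ms"],
        # Breakout of where the time went, so a dashboard can show
        # per-user / per-URL database and downstream-service activity.
        "db": {"calls": timings["db_calls"], "ms": timings["db_ms"]},
        "services": {"calls": timings["svc_calls"], "ms": timings["svc_ms"]},
    }
    logger.info(json.dumps(event))
```

One JSON blob per request is what made the user/URL lookup dashboard trivial to build on top of it.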
The downside was that we increased our billable Splunk usage by 50%, since we were capturing so much more data per log event than the other product, which was just consuming standard IIS/Apache logs.
That type of flexibility was totally worth it. Due to some acquisition shenanigans we broke off from that group and wound up on the ELK stack, which didn't perform quite as well but was still usable with the same data. These days we could have just built an OpenTelemetry library.
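If we were doing it today, that per-request breakout would map naturally onto OpenTelemetry spans. A rough sketch with the Python SDK (the console exporter and attribute names are just placeholders, not how we actually wired it):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Console exporter for illustration only; a real setup would export OTLP
# to whatever backend (Splunk Observability, ELK, etc.) you actually use.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("request-instrumentation")

def handle_request(user, url):
    with tracer.start_as_current_span("http.request") as span:
        span.set_attribute("app.user", user)  # placeholder attribute names
        span.set_attribute("http.url", url)
        with tracer.start_as_current_span("db.query"):
            pass  # database activity shows up as a child span
        with tracer.start_as_current_span("downstream.call"):
            pass  # downstream service activity, same idea
```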
Comcast would drop all the error logs for every cable box in the country into Splunk. I then queried this to figure out the error code counts in a given period. It's really the only thing that can handle the volume.
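The query itself is just a count-by over a time window; outside of Splunk, the same aggregation looks roughly like this (assuming the events have already been parsed into records with a timestamp and an error code, both hypothetical field names):

```python
from collections import Counter

def error_code_counts(events, start, end):
    # events: iterable of dicts like {"ts": <datetime>, "error_code": "E1234"}.
    # Count each error code seen within [start, end).
    counts = Counter(e["error_code"] for e in events if start <= e["ts"] < end)
    return counts.most_common()
```

The hard part was never the aggregation, it was that "events" here is every cable box in the country.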
I remember this talk about pricing strategy by one of their employees at a conference many years back (2017) - https://www.heavybit.com/library/video/value-based-pricing-s.... What I took away from that talk was that pricing can be unintuitive, both for the people setting it and the people buying it.
The only "unintuitive" part was developers saying the product needed to be $250/yr when the product person made it $2,500/yr which ended up being the right choice
Developers being absolutely terrible at pricing is not unintuitive (I'm a developer)
My experience back at Netflix, too. Elasticsearch (we didn't use the L or K) plus a query engine on S3 with a catalog was more versatile and way cheaper than Splunk. Nowadays we have a slew of performant OLAP stores that can be used for log analysis as well, which further renders Splunk unnecessary.
My experience at a big fintech I won't name: we had our own highly engineered in-house metrics system staffed by a big team. Custom pipeline, integrations in multiple languages, high resolution, custom aggregation and rollups. It was nice.
We also had in-house logging, exception tracing, alerting, service discovery, metrics dashboards, etc. It was all actually pretty good. All engineered by xooglers.
Someone (not to name names) got bitten by the "anti-weirdware" bug and started shifting us off of all our custom-built solutions. Every team got hit with major distractions from their roadmaps for each of these changes. None of the headcount dedicated to staffing the internal systems was freed up - they had to run the new integrations.
The decision was made one day to migrate all of our observability stuff over to SignalFx. Observability wasn't our "core competency" and our systems were "weirdware".
We had to rewrite our instrumentation, all of our reporting dashboards, and all of our alerting DSLs changed. They were not replaced 1:1 for every system and metric, so we emerged in a much worse, much less visible situation across the board. Outages happened or went unreported.
Splunk acquired SignalFx and dramatically raised prices. We scrambled to do the migration process yet again, impacting roadmaps and leading to more outages.
Leadership was changed.
There's something to be said about NIH, but when you've written systems that are already working, inexpensive, and easy to maintain, you shouldn't throw them out because you're worried analytics isn't your "core competency". Yes - it is your core competency, because you're selling uptime to your customers.
Agreed. Costs plummet when you use S3 as the storage medium for these massive log data sets. I think S3 is much faster to query than most people realize. Just have to be smart about how you organize things.
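"Being smart about how you organize things" mostly means partitioning objects by the fields you filter on, so a query only has to touch a narrow prefix instead of scanning the whole bucket. A minimal sketch with boto3 (the bucket name and key layout are made up):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-log-archive"  # hypothetical bucket

def keys_for_hour(service, day, hour):
    # Keys laid out as service/dt=YYYY-MM-DD/hour=HH/part-*.json.gz,
    # so scoping a query to one hour means listing one small prefix.
    prefix = f"{service}/dt={day}/hour={hour:02d}/"
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

The same layout is what lets a catalog/query engine (Athena, Trino, etc.) prune partitions instead of reading everything.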
One solution is sampling, by just enabling it for some hosts/partitions (if you're producing 100M entries a day, you could probably grab 1/100 of those for parsing).
Another solution is pre-processing (serial dupes are not forwarded).
Another solution is heavily reduced logging (ERR or higher only on prod hosts).
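A rough sketch of what those three ideas look like as a pre-processing filter in front of the forwarder (thresholds, level names, and per-event rather than per-host sampling are all arbitrary choices for brevity):

```python
import random

def preprocess(events, sample_rate=0.01, min_level="ERROR"):
    """Yield only the events worth forwarding to the (billable) indexer."""
    levels = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3, "FATAL": 4}
    last_msg = None
    for e in events:
        # Heavily reduced logging: drop anything below the threshold.
        if levels.get(e["level"], 0) < levels[min_level]:
            continue
        # Pre-processing: drop serial duplicates of the same message.
        if e["msg"] == last_msg:
            continue
        last_msg = e["msg"]
        # Sampling: keep roughly 1 in 100 of what's left.
        if random.random() >= sample_rate:
            continue
        yield e
```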
They mean that doing the processing Splunk does is expensive, so there simply needs to be less data going into the system (via the pre-processing steps I mentioned above) in order to keep costs sane.
With that said, Splunk should offer such a pre-processing product (maybe it does?), which would probably increase their moat even though it would reduce revenue somewhat in the near term.
Splunk is honestly kind of the mainframe of SIEM. If you need it, you need it and can probably afford it and they know that. Can you do the job with something else for cheaper? Probably, but not as good and not as easy.
You can't really make an informed decision without knowing how much data they were moving. For it to be that expensive, you'd need to be moving a ludicrous amount of data, and you can always parse data down to the required fields before indexing, which saves on licensing costs.
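Parsing down to required fields is just a whitelist pass over each event before it hits the indexer; something like this (the field list is hypothetical), since license usage is driven by the bytes you actually index:

```python
REQUIRED_FIELDS = ("timestamp", "host", "status", "error_code")  # hypothetical

def trim_event(event: dict) -> dict:
    # Forward only the fields the dashboards and alerts actually need;
    # everything else never reaches the indexer, so it's never billed.
    return {k: event[k] for k in REQUIRED_FIELDS if k in event}
```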
In 20 years of doing SIEM and SIEM-like solutions, I've yet to find an engagement that said "Oh, yes... our volumes are XX and YY". Mostly it's a /shrug and a less-than-educated guess.
There's even reluctance to turn things on and _watch_ for 10 minutes, an activity that would immediately give you a much better idea of volume. Folks just don't like doing it.
Then you get the cases where setting up a redundant log source is just unwise. DNS logging was two orders of magnitude greater than everything else the SIEM was handling. And email was about the same size.
There are similar problems with effectively modeling weather or finding the very smallest of things: there isn't enough compute power, or even energy, in the universe.
Splunk was so expensive we could not use it to monitor the servers we used for weather modeling. Seriously. The log files generated were at times too voluminous, and you frequently blew through your bandwidth cap.
Great product, but once you factor in the cost, it has essentially no utility for high-volume environments.
I've had the same experience, in that I love Splunk and their tooling is so easy and powerful. But I can't afford to put data in it, especially long-term data that requires reproducibility for many years.
I'm always happy when I can use the sources we do have in Splunk, but I get sad that I can't do that with everything else.
Its cloud pricing is funny because the product is so much more powerful with massive amounts of data, yet they charge based on storage. Our on-prem instance wasn't just simpler to price; we could also throttle resources to allow for really high volumes of data with relatively slow query and analysis.
This was the sweet spot for the ELK stack, really. You could get the main functionality Splunk had, self-manage it (or, more recently, run it out of a cloud), and scale it to whatever you wanted.
It mostly just works. Back when I was actively using it, it was IIRC the most stable part of the stack. It only went down when the daily quota was exceeded. When it ran out of disk, nothing broke; it showed a message in the UI. When space was added, it just started going again like nothing had happened. This was something like 2018?