I don't think it's so clear cut. They have to pay to compress it. If the data the customer stores is short lived it may not be worth it to them. They don't know if the customer already compressed it so they might be wasting their CPU. They also have to pay to decompress it on every access. They allow you to slice an arbitrary byte range out of an object which is technically harder to implement on a compressed file. They charge by the GB and are not exactly super cheap so if the customer wants to store a big fat file of easily compressible zeros then whatever, they got their money.
It might make more sense on their "deep archive" product, where the customer has to commit to a minimum retention period and also pays a retrieval charge that scales with the amount of data recovered (hence paying for the CPU to decompress).
Moreover, zstd's quite unusual '--adapt' option enables it to "dynamically adapt compression level to perceived I/O conditions". Works for me (albeit the manpage states that "it can remain stuck at low speed when combined with multiple worker threads").
It reacts to the input buffer state (in bad I/O conditions it starves).
On Linux PSI ( /proc/pressure/io ) probably provides more accurate information (the code already uses /proc/cpuinfo ).
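For illustration, the PSI numbers are easy to consume; a minimal sketch (hypothetical helper, zstd does not actually read PSI today) that pulls the avg10 field out of a PSI-format line such as the "some" line of /proc/pressure/io:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse the avg10 field from a PSI-style line, e.g. the "some" line of
 * /proc/pressure/io. Hypothetical helper -- zstd does not read PSI
 * today. Returns the value, or -1.0 if the field is missing. */
static double psi_avg10(const char *line)
{
    const char *p = strstr(line, "avg10=");
    double v;
    if (p == NULL || sscanf(p + 6, "%lf", &v) != 1) return -1.0;
    return v;
}
```

In a real tool you would read the line from /proc/pressure/io first and treat a high avg10 as "I/O is the bottleneck, spend more CPU on compression".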
Detail: in the fileio.c module there are lines such as:
if (oldIPos == inBuff.pos) inputBlocked++;   /* input buffer is full and can't take any more : input speed is faster than consumption rate */
if ( (inputBlocked > inputPresented / 8) ... )   /* input is waiting often, because input buffer is full : compression or output too slow */
This impacts a 'speedChange' variable.
Its potential values (an enum) are 'noChange', 'slower', and 'faster'.
They are processed rather simply:
if (speedChange == slower) {
    (...)
    compressionLevel ++;
}
if (speedChange == faster) {
    (...)
    compressionLevel --;
}
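Stitched together, the feedback loop amounts to roughly the following (a simplified, self-contained sketch; the real logic in fileio.c uses more signals than this):

```c
#include <assert.h>

typedef enum { noChange, slower, faster } speedChange_e;

/* Simplified --adapt decision: if the input buffer is blocked too
 * often, compression is the bottleneck, so drop a level (go faster);
 * if compression is starving for input, there is headroom, so raise a
 * level (go slower but compress better). Sketch only, not zstd code. */
static int adaptLevel(int compressionLevel,
                      unsigned inputBlocked, unsigned inputPresented,
                      int compressionStarved)
{
    speedChange_e speedChange = noChange;
    if (inputBlocked > inputPresented / 8) speedChange = faster;
    else if (compressionStarved)           speedChange = slower;

    if (speedChange == slower) compressionLevel++;
    if (speedChange == faster) compressionLevel--;
    return compressionLevel;
}
```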
I think it is clear cut, mostly because I do not think compression compromises any of those features, all while making the user experience better.
For any storage system like this you usually have a few bottlenecks. IO and Network are the obvious ones, followed by tiering (cache, fast io, slow io, ...) and at the very end CPU.
Now let's say network is your bottleneck. If you can send the data to the client in compressed form then you get the compression ratio as additional bandwidth, and the user gets the data quicker! So compression on the network path is a clear win.
But the common bottleneck is often IO: a high-end SSD with 1M IOPS at 4KB would _theoretically_ serve 4GB/s, roughly a 40GBit link. That's without any redundancy or other overhead.
Again compression to the storage layer would decrease the total amount of IOs, thus making sure a customer gets data quicker.
Ok, let's say both are not the issue. The fastest compression algorithms compete with memcpy. So if you need just one copy of your data, you might well end up faster by compressing it.
Especially fast compression algorithms (zstd, lz4, snappy, lzo, ...) are worth the CPU cost with virtually no downsides. The problem is finding the right sweet spot that reduces the current bottleneck without creating a CPU bottleneck, but zstd offers the greatest flexibility there, too.
Oh for range requests.... Those large objects are likely split anyway, for easier error recovery (imagine 100MB into a 1GB transfer you notice that the file data was corrupted - not good). Once you work on blocks it's easy to do somewhat efficient range requests again.
What I've personally implemented is trial compression with heuristics (you eagerly compress chunks, and if enough chunks don't compress, stop trying). It does require low level input/output control and per-chunk/block compression.
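The give-up heuristic can be sketched in a few lines (an illustration of the idea, not the actual implementation; the chunk sizes, the "must shrink to roughly 7/8" threshold, and the failure streak limit are all made-up parameters):

```c
#include <assert.h>
#include <stddef.h>

/* Trial compression with a give-up heuristic (sketch). A chunk counts
 * as compressible if it shrank to roughly 7/8 of its original size or
 * less; after `maxFails` consecutive incompressible chunks we stop
 * spending CPU on trial compression entirely. Returns how many chunks
 * were trial-compressed before giving up. */
static size_t trialCompress(const size_t origSize[], const size_t compSize[],
                            size_t nChunks, size_t maxFails)
{
    size_t fails = 0, tried = 0;
    for (size_t i = 0; i < nChunks; i++) {
        tried++;
        if (compSize[i] <= origSize[i] - origSize[i] / 8) {
            fails = 0;              /* chunk compressed well: keep trying */
        } else if (++fails >= maxFails) {
            break;                  /* too many duds in a row: give up */
        }
    }
    return tried;
}
```

The streak counter is the key design choice: one incompressible chunk (e.g. an embedded JPEG) shouldn't disable compression for the rest of a mostly compressible stream.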
That said, a surprising number of video sources use sparse file type setups, and I've gotten pretty good compression (up to 60%) using LZ4 with NVR files from some brands.
This is not how such storage systems work. If the whole storage system is for you and you only store compressed video files on it then perhaps.
But we are talking about multi-tenant systems here. You will get better performance and/or better prices if the system finds compressible data for other customers.
All the benefits hold, even for you with non-compressible data, as long as there is an overall benefit.
Hitting a bottleneck less often this way will reduce your tail latencies, too.
It is just that _your files_ won't contribute to these improvements. But you get all the benefits as well.
Having a big CPU (which you need to get many PCIe lanes) and then not using it is waste.
It would be waste to _not_ use the CPU time.
Granted, the CPU power consumption would drop with power management enabled, but those are only marginal gains, as the CPU is likely busy anyway and the total duration of busy time might drop (race to sleep).
I have used this inverted logic to great effect. E.g. when expanding data center capacity I was usually swapping old hardware for new hardware once it arrived. It is way better to have the older generations as spares and use the new capacity at the most utilized places.
Video files already have entropy coding applied to them, and thus any compression gains from reapplying a generic entropy coder like zstd are unlikely.
Hardware compression IS available in Graviton 2:
"1Tbit/s of compression accelerators
• 2xlarge and larger instances will have a compression device
• DPDK and Linux kernel drivers will be available ahead of GA
• Data compression at up to 15GB/s and decompression at up to 11GB/s"
How much would someone pay for this? I have a half-written zstd core, but I doubt the market for $100-150 FPGA-based compression accelerators is all that large.
Probably not worth it as a FPGA solution or even in general as an add-on card (the overhead of dealing with such extra hardware means that the threshold for "worth it" is very high).
I would expect this to become part of newer generations of CPUs once it becomes popular.
I'd never heard of this! The full name is QuickAssist and it does encryption too. They advertise 100Gb/s symmetric crypto, 70Gb/s compression (or roughly 100x faster than ZStandard on a single CPU). Seems to retail for about $650 for a card.
QAT is actually likely to end up inside new server CPUs from Intel - at least according to the advertising material. Also, it is in their new SmartNICs. At least somebody is using it.
It has been done before, if you offload gzip you turn your PCI bus into a choke point. Most of the time you do better keeping it on the main processor.
I think a lot of datacenter SSDs already come with in-drive hardware compression these days, since it not only increases speed but also longevity. So it would actually save money anyway.
It's not great, but it's also not unheard of. Tape capacities are often quoted at double the actual storage space, with fine print that says "assuming 50% compression". Also, if compression makes IOPs faster or reduces wear on SSDs, people might not complain so much.
SSDs could still do it for speed and to write less pages, making the drive last longer, and simply report the uncompressed space as used. They already do all sorts of tricks on pages such as moving them logically, having more internally than they report to use as pages wear out, and so on.
Given that, it's good sense to compress if at all possible simply to make the drive live longer.
And guess what - I just googled, tons of hits, and this has been done for a long time :)
So it makes sense, is done, and is important for modern SSD behavior.
I'm surprised (and shocked) that letting unencrypted data hit the disk is still common enough to make such optimizations worth it.
Even if you just stick the key in the server's TPM without any sealing, an encrypted disk makes it much easier to deal with e.g. drive returns (for warranty or fault analysis) or disposal.
I explained why I think it does provide a benefit even in a datacenter:
"Even if you just stick the key in the server's TPM without any sealing, an encrypted disk makes it much easier to deal with e.g. drive returns (for warranty or fault analysis) or disposal."
Would you resell usable drives that you no longer want to use (e.g. because they're too small or your needs changed more towards SSDs) if unencrypted data was written to them at some point?
How much effort would you put into making sure that no broken drive that can't be wiped leaves the datacenter without shredding?
Any process you put in place will have gaps - e.g. through human error or malicious acts - and this provides pretty solid protection against that.
Unfortunately that's going to really depend. For example, if your threat model is "hard drive gets stolen" there's no point. If your threat model is "attacker can access my database" encrypting the data at the DB level does make sense. But it obviously breaks compression.
And unfortunately compression and encryption are seemingly at odds fundamentally :c
Yes they do. Intel writes this for example (not all drives use it, but some definitely do):
>Data compression via encoding algorithms enables a solid state drive (SSD) to write less data, which in turn yields higher write bandwidth. With a significant amount of data being compressible, performance benefits can be substantial.
What do you mean? Even without compression, you have half the problem. Obviously, at scale, it’s all just statistics and planning on trends instead of individual files.
> They charge by the GB and are not exactly super cheap so if the customer wants to store a big fat file of easily compressible zeros then whatever, they got their money.
Maybe they charge at the uncompressed rate but store it compressed? Then they got even more money!
This was true a few years back, nowadays it’s cheaper and faster to compress the data at rest as the bottleneck is frequently IO and storage space. Both, in terms of capacity and cost.
Only temporarily with SSDs. With spinning rust, it also often paid off to compress data. We'd store large treebanks compressed, because decompression was much faster than disk reads.
Presumably it's the tradeoff of CPU overhead versus disk and bandwidth (larger files take longer to copy into memory, which also costs energy, and take more bandwidth to shunt around Amazon's own network).
It doesn't really have to impact performance. The index is generated easily as a side-effect of compression. And the index is only needed if you need to seek.
But if you do need to seek, which is really common in data warehouse workloads for example, then unless you keep the index in RAM you have to do an extra IO on every seek just to read the index.
there's always going to be some metadata for the file that needs to be looked up before you can start seeking (ACLs, sector/extent/cluster location, etc)
yeah that isn't free either, it adds significant bloat to your metadata. with most enterprise customers encrypting and/or compressing data before putting it into s3, it doesn't seem like there would be much benefit. s3 really isn't the right layer to implement compression. filesystems aren't either. it's better to leave it up to the application.
> yeah that isn't free either, it adds significant bloat to your metadata
yeah, 4 bytes for every megabyte
> s3 really isn't the right layer to implement compression. filesystems aren't either. it's better to leave it up to the application.
yeah, I'm sure you're right and Amazon have absolutely no idea what they're doing and like to spend unnecessary CPU cycles doing pointless work and add "significant bloat" to their metadata
... or, you're wrong (like in every previous comment in this chain)
this tweet is not talking about compressing customer data in s3, i seriously doubt that aws compresses customer data in s3 for all the reasons i've already listed. i am right and amazon does know what they're doing, which is why they don't compress customer data in s3.
4 bytes per megabyte becomes significant at scale when you have to keep it in ram, which you have to do if you want to avoid the extra IO.
You only need a single part to calculate a specific offset, assuming you have part sizes stored in metadata already (a good idea).
Each part can be max 5GiB as per S3 spec. 5120 * 4 = 20KiB.
Even if you unpack to 8*2 bytes in memory when decoding, you are still not talking a huge amount of memory.
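The offset arithmetic is straightforward; here is a sketch (a hypothetical helper for illustration, not MinIO's actual code) of locating the part that contains a given absolute byte offset, using the per-part sizes stored in metadata:

```c
#include <assert.h>
#include <stddef.h>

/* Locate the part containing absolute byte `off`, given per-part sizes
 * from object metadata (parts can be up to 5GiB each per the S3 spec).
 * Writes the offset within that part to *inPart and returns the part
 * index. Sketch only; assumes `off` is within the object. */
static size_t findPart(const unsigned long long partSize[], size_t nParts,
                       unsigned long long off, unsigned long long *inPart)
{
    size_t i = 0;
    while (i < nParts && off >= partSize[i]) {
        off -= partSize[i];     /* skip whole parts before the target */
        i++;
    }
    *inPart = off;
    return i;
}
```

Once you know the part, only that part's per-block index has to be consulted, which is what keeps the lookup cheap.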
The on-disk space is ~0.0004% as blibble calculated, and should easily be offset by the compression achieved. In MinIO we don't store indexes for files < 8MiB, so for small files there is no overhead.
If the added metadata is a problem for whatever system you are looking at, then that is a characteristic of that system and not a general problem.
if it's not in ram, you have to do an extra IO to look it up. i don't think you understand how precious metadata space is in a large scale storage system. if you pollute the metadata cache with useless junk like this, you can't cache as many things, your hit rate goes down, and you have to do more IO operations to service each request on average. name one popular distributed file system or object store that compresses everything by default like you are claiming. you won't be able to, because none of them do it, because it's better to leave it to the application.
> They charge by the GB and are not exactly super cheap so if the customer wants to store a big fat file of easily compressible zeros then whatever, they got their money.
That's not a good argument: they could lower their costs with compression, still charge the same, and make more profit.
Why do they have to either compress it all or not? They could be smart about it: split the files into pieces (just like some network file systems/backups do) and, if those blocks are untouched for a while, compress what's compressible and leave the rest as-is during active use.
I think the point of the different storage tiers of AWS S3 is to get customers to classify their own data, then AWS can pick the right mix of hardware, software, and compute that satisfies AWS’s requirements for availability and COGS.
If the difference between standard S3 and S3 Glacier was just slower disk, then rate limiting the customer would suffice.
But if there’s a significant amount of compute thrown at data de-duplication, compression, and indexing, then it starts to clarify why there’s a pricing penalty for using Glacier with the same access patterns as one would use on standard storage.
> They charge by the GB and are not exactly super cheap so if the customer wants to store a big fat file of easily compressible zeros then whatever, they got their money.
And what if they can charge you the uncompressed size and only actually store much smaller compressed files behind the scenes? That seems like having your cake and eating it too.
I think an FPGA can probably compress quite a bit faster than a general purpose CPU, and an ASIC on a card which is network on one side and storage on the other could compress at line speed easily.
If this is true, there is possibly a side channel one could run against object storage to determine if someone else in the content-addressable store has the same files.
Like when it was easy to file share on dropbox by having the correct hashes. A GUID could summon a 1GB file.
A couple of comments across the thread have made similar points, but if I were implementing this, the "client metadata" like the incoming sha256 etc would be implemented a layer higher than the actual byte storage, so the byte storage could be compressed without any impact on that sort of thing.
It's not quite that simple. If you have customer data that's already encrypted, then compression won't do much because it looks random. But of course by the time you get to your infrastructure layers, that'll be the case (or you really messed up your security story!). Which means you'd have to compress right at the edge. They might be doing that (which would basically mean it's the customer compressing it before they encrypt it with their keys because AWS has no business seeing the clear text), but then you get to compress each item separately, which might not be very effective for small values.
tl;dr: There's a real efficiency/security/insider-risk trade-off here.
Edit: I should disclose that I work for a competitor. Don't intend any astroturfing.
I agree with you, there could be scenarios where customers supply their own keys and compress the data on their own. My original statement is still true though, the data at rest ends up being compressed and encrypted.
That said, of course, customers can upload encrypted blobs of uncompressed data. But I’d call it an exception that proves the rule. Here service simplicity should win and those blobs may end up recompressed.
How would that work for something like S3 range requests? Rather than reading an entire object sequentially (which would work fine with transparent compression) you can also ask to read an arbitrary byte range (give me bytes 1,000,000,000-1,000,001,000 from the original file). I guess you could maybe store the compressed file in chunks with metadata about the original byte range inside each chunk.
For MinIO (an S3 compatible server), we add an index for each part, which contains uncompressed -> compressed offset pairs.
Since we already used a Snappy-derived method, each 1MB block is stored without backreferences. With this we only have to decode at most 1MB-1 extra bytes to respond with a specific range offset.
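The seek path for a range request can be sketched like this (a simplified illustration of the independent-blocks idea, not MinIO's actual layout or code):

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE ((size_t)1 << 20)   /* 1MiB independent blocks */

/* compOff[i] is where block i starts in the compressed stream; each
 * block holds BLOCK_SIZE uncompressed bytes and has no backreferences
 * to earlier blocks. To serve a range request starting at uncompressed
 * offset `want`, start decoding at the returned compressed offset and
 * throw away *discard bytes -- at most BLOCK_SIZE-1 of them. */
static size_t seekBlock(const size_t compOff[], size_t want,
                        size_t *discard)
{
    size_t block = want / BLOCK_SIZE;
    *discard = want % BLOCK_SIZE;   /* bytes to decode and discard */
    return compOff[block];
}
```

Storing blocks without backreferences costs a little compression ratio but bounds the extra work per seek, which is the trade-off being described here.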
Generally with filesystem-level compression you don't compress an entire multi-GB file: you compress segments of maybe a few 100k. This gives you a very slightly worse compression ratio but allows random seeks to still be efficient.
I'm sure it's encrypted, but I doubt that they compress everything. Images and video tend not to compress well, since they've typically already been aggressively compressed with specialized algorithms. It would just be throwing CPU cycles away.