
uops.info lists a latency of 1 cycle for both on Alder Lake, but the reciprocal throughput (cycles per instruction, lower is better) differs:

* for add it is 0.20 (i.e. 5 per cycle)

* for adc it is 0.50 (i.e. 2 per cycle)

so it does seem correct.

This seems to be a consequence of `add` being available on ports 0, 1, 5, 6, & B, whereas `adc` is only available on ports 0 & 6.
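The quoted throughput figures line up with a simple port-count model. Here is a rough sketch of that arithmetic, assuming each instruction is a single uop and that port availability is the only bottleneck (both are simplifications; the `PORTS` table and the `reciprocal_throughput` helper are just illustrative names):

```python
from fractions import Fraction

# Port assignments as quoted in the comment above (uops.info, Alder Lake).
PORTS = {
    "add": ["0", "1", "5", "6", "B"],
    "adc": ["0", "6"],
}

def reciprocal_throughput(ports):
    """Best-case cycles per instruction, assuming one single-uop
    instruction can start on each listed port every cycle."""
    return Fraction(1, len(ports))

for name, ports in PORTS.items():
    rt = reciprocal_throughput(ports)
    print(f"{name}: {float(rt):.2f} cycles/instr, {1 / rt} per cycle")
```

Under those assumptions the model reproduces the 0.20 and 0.50 figures; real measurements can deviate for reasons this ignores (front-end width, multi-uop instructions, and so on).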

So yes, as an individual instruction it's no worse, but even a stream of non-dependent instructions will be worse under OoO execution (which is a more realistic way to look at it than as a single instruction in isolation).
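To make the "non-dependent instructions" point concrete, here is a back-of-the-envelope bound, assuming fully independent single-uop instructions, that the `adc` ports are a subset of the `add` ports as listed above, and that nothing else (decode width, retirement, other port users) gets in the way. The `min_cycles` helper is hypothetical, not anything from uops.info:

```python
from math import ceil

ADD_PORTS = 5  # ports 0, 1, 5, 6, B, as quoted above
ADC_PORTS = 2  # ports 0, 6 (a subset of the add ports)

def min_cycles(n_add: int, n_adc: int) -> int:
    """Lower bound on cycles to execute n_add independent adds and
    n_adc independent adcs when only port availability limits issue."""
    # adcs can only go to 2 ports; everything together shares 5 ports.
    return max(ceil(n_adc / ADC_PORTS), ceil((n_add + n_adc) / ADD_PORTS))

print(min_cycles(10, 0))  # ten independent adds
print(min_cycles(0, 10))  # ten independent adcs
print(min_cycles(6, 4))   # a mix of both
```

In this toy model ten independent `add`s need at least 2 cycles while ten independent `adc`s need at least 5, even though each instruction has the same 1-cycle latency.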



Intel is also supposed to introduce the new APX instructions, which include a bunch of variants that duplicate existing instructions but don't set any flags. The only plausible reason to add these is performance.


This isn't just due to the actual dependencies of flag instructions at the hardware level (although that is likely a factor); it also majorly affects code layout. On Arm64, for example, you can make a comparison, do other operations, and then consume the result of that comparison afterwards, which is excellent for the pipeline and OoO engine. However, because most instructions on x86_64 write flags, you can't do this, so you are forced to cram `jcc`/`setcc` instructions right after the comparison, which is less friendly to compilers and the OoO engine.


OoO should actually be the case where that doesn't matter, I'd think: the CPU can, well, execute the instructions not in the order they're in the binary; it's in-order implementations where that matters more.

And with compare & jump being adjacent, they can be fused together into one uop, which Intel, AMD, and Apple Silicon all do.


Note: I've since learnt that port B is just port 11 in all the Intel docs; uops.info just hexifies the port numbers to keep them single-char.
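In other words, the port labels can be read as single hexadecimal digits. A tiny sketch of that decoding, assuming the labels really are plain hex digits as described above (`port_number` is just an illustrative helper):

```python
def port_number(label: str) -> int:
    """Decode a single-character port label as a hexadecimal digit."""
    return int(label, 16)

# The add ports from earlier in the thread, decoded to decimal.
print([port_number(p) for p in ["0", "1", "5", "6", "B"]])
```

So the `add` port list 0, 1, 5, 6, B corresponds to ports 0, 1, 5, 6, and 11 in Intel's numbering.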



