uops.info has latency for both (Alder Lake) at 1 cycle but throughput (lower is ...

phkahler · 2025-05-30T13:54:55 1748613295

Intel is also supposed to introduce the new APX instructions, which include a bunch of instructions that duplicate existing ones but don't set any flags. The only plausible reason to add these is performance.

john-h-k · 2025-05-30T14:23:28 1748615008

This isn't just due to the actual dependencies of flag instructions at the hardware level (although that is likely a factor); it also majorly affects code layout. On Arm64, for example, you can make a comparison, do other operations, and then consume the result of that comparison afterwards, which is excellent for the pipeline and OoO engine. However, because most instructions on x86_64 write flags, you can't do this, so you are forced to cram `jcc`/`setcc` instructions right after the comparison, which is less friendly to compilers and the OoO engine.
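To make the layout point concrete, here is a toy Python model (not real ISA semantics). It only tracks which instruction last wrote the flags before a branch reads them. The function name and the two instruction sets are made up for illustration; the only assumption baked in is that a plain `add` writes flags on x86_64 but not on Arm64.

```python
def flags_at_branch(program, flag_writers):
    """Return the instruction whose flags a trailing branch would read.

    Toy model: walks the program in order and remembers the last
    instruction that belongs to flag_writers.
    """
    last_writer = None
    for op in program:
        if op in flag_writers:
            last_writer = op
    return last_writer


# Hypothetical three-instruction sequence: compare, unrelated work, branch.
program = ["cmp", "add", "branch"]

# Assumption: on x86_64 a plain add also writes flags.
X86_LIKE = {"cmp", "add"}
# Assumption: on Arm64 a plain add leaves flags alone (adds would set them).
ARM64_LIKE = {"cmp"}

print(flags_at_branch(program, X86_LIKE))    # add
print(flags_at_branch(program, ARM64_LIKE))  # cmp
```

In the x86-like case the branch no longer sees the comparison, which is why the compiler has to keep the `jcc`/`setcc` next to the `cmp`.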

dzaima · 2025-05-30T22:19:29 1748643569

OoO should actually be the case where that doesn't matter, I'd think - the CPU can, well, execute the instructions not in the order they're in the binary; it's in-order implementations where that matters more.

And with compare & jump being adjacent they can be fused together into one uop, which Intel, AMD, and Apple Silicon all do.

john-h-k · 2025-05-30T15:43:19 1748619799

Note: I've since learnt that the B port is just port 11 in all the Intel docs; uops.info just hexifies the port numbers to keep each port single-char.
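A minimal sketch of that hexification, assuming each port is written as one hex digit. The helper names and the sample string `"015B"` are hypothetical, not taken from uops.info.

```python
def port_label(port: int) -> str:
    """Single-character label for a port number, written as one hex digit."""
    if not 0 <= port <= 15:
        raise ValueError("only ports 0-15 fit in one hex digit")
    return format(port, "X")


def parse_ports(label: str) -> list[int]:
    """Decode a string of hex-digit port labels back into port numbers."""
    return [int(ch, 16) for ch in label]


print(port_label(11))       # B
print(parse_ports("015B"))  # [0, 1, 5, 11]
```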