> ARM and AMD CPUs have about the same L1 instruction cache size. But don't they have a very different instruction density, giving one a noticeable advantage over the other?
Instruction cache sizes are set not by what you want, but by what you can get away with. The fetch path has to cover a lot of physical distance and switch a lot of transistors, so it is typically a very critical path for clock speed. Adding more pipeline stages to it helps with clocks, but it also directly hurts performance near branches, since every redirect then takes longer to refill the front end.
So in the end, you have to limit the size based on what clocks you target and how many stages you are willing to give the fetch path.
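
To put rough numbers on that trade-off, here is a toy model, not a description of any real core. It only counts branch mispredictions, and every figure in it (branch fraction, mispredict rate, stage counts, clocks) is a made-up assumption chosen for illustration:

```python
def time_per_instruction_ns(clock_ghz, fetch_stages, base_cpi=1.0,
                            branch_fraction=0.2, mispredict_rate=0.05,
                            other_frontend_stages=10):
    """Average time per instruction in a toy pipeline model.

    All default values are illustrative assumptions, not measurements.
    """
    # Assumption: a mispredicted branch costs one cycle for every stage
    # that has to be refilled, so each extra fetch stage adds one cycle.
    mispredict_penalty = fetch_stages + other_frontend_stages
    cpi = base_cpi + branch_fraction * mispredict_rate * mispredict_penalty
    # Cycles per instruction divided by cycles per nanosecond (GHz).
    return cpi / clock_ghz


# Hypothetical design points: a 2-stage fetch path, and a 3-stage fetch
# path (think: room for a bigger cache) at the same or a higher clock.
small = time_per_instruction_ns(clock_ghz=4.0, fetch_stages=2)
big_same_clock = time_per_instruction_ns(clock_ghz=4.0, fetch_stages=3)
big_faster_clock = time_per_instruction_ns(clock_ghz=4.2, fetch_stages=3)
```

In this model the extra fetch stage is a pure loss at the same clock and only pays off if it buys enough frequency (or, in a real design, enough extra hit rate from the larger cache) to cover the longer branch penalty.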