Very badly; this chip is the polar opposite of a GPU's design.
It does share the latency-hiding-through-parallelism approach, but a GPU does that scheduling at a fairly coarse granularity (viz. the warp), whereas the barrel processors on this thing round-robin between hardware threads on every single instruction.
GPUs are designed for dense compute: lots of predictable data accesses and control flow, and high arithmetic intensity (many FLOPs per byte moved).
In contrast, this is designed for lots of data-dependent, unpredictable accesses at 4-8 byte granularity with few to no FLOPs.