I get far more than 3 t/s for a 70B model on normal non-unified RAM, so that's completely unfeasible performance for a unified memory architecture like Halo.
And while it has unified memory the memory is quite slow. 250GB/s compared to 500+ for M4 Max or 1800 GB/s for a 5090. So it's fast for a CPU, but pretty slow for a GPU.
(That said, there are not a lot of cheap options for running large models locally. They all have significant compromises.)