Hi, and congratulations on the IEEE article. The article makes the point that...

highfrequency · on July 18, 2020

The key point is that large corporations and labs achieve scale by spreading work across many distributed machines. This paper instead explores optimizations that are specific to a single multi-core machine. These optimizations exploit low-latency shared memory between threads on one machine, and thus cannot be replicated on a distributed cluster, where nodes have to communicate over a network.