> His group took advantage of working on a single machine by simply loading all the data into shared memory, where every process can access it directly.
If you can get all your data into RAM on a single computer, you can have a huge speedup, even over a cluster that has in aggregate more resources.
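A minimal sketch of that shared-memory pattern, assuming Python's stdlib `multiprocessing.shared_memory` plus NumPy (the names and array sizes here are illustrative, not from the original comment):

```python
# Share one large array across worker processes without copying it per
# process: each worker attaches to the same OS shared-memory block.
import numpy as np
from multiprocessing import Process, shared_memory


def worker(shm_name, shape, dtype):
    # Attach to the existing block by name; no data is copied.
    shm = shared_memory.SharedMemory(name=shm_name)
    data = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    print(data.sum())  # every process reads the same buffer
    shm.close()


if __name__ == "__main__":
    src = np.arange(1_000_000, dtype=np.float64)
    shm = shared_memory.SharedMemory(create=True, size=src.nbytes)
    dst = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
    dst[:] = src  # one copy into shared memory, then zero-copy reads

    procs = [Process(target=worker, args=(shm.name, src.shape, src.dtype))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    shm.close()
    shm.unlink()  # free the block once all readers are done
```

The point of the pattern is that the dataset exists once in RAM; workers get a view, not a copy, so there is no serialization or network hop on each access.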
Frank McSherry has some more about this, though not directly about ML training.
http://www.frankmcsherry.org/graph/scalability/cost/2015/01/...