Lock-free code is all the rage these days but it’s not just a fad. Having recently quantified the performance impact of a single lock on shared memory it’s easy to understand why eliminating locks (and indeed any other kind of kernel interaction) is the key to high performance.
A logical consequence of this is that threads must share no state (memory, disk or anything else) with any other thread unless it can be done in a safe manner without requiring synchronization. While there are some patterns that can be used for this, in general the solution is the shared nothing (or sharded) architecture where each thread works completely independently.
Coupled with core-locked threads, shared nothing architectures are capable of extracting the last drop of performance out of the underlying hardware. Suddenly that multi-core CPU looks like a very loosely coupled bunch of bare-metal processors.