I know this has been discussed a lot, and Mark Callaghan recently gave a session at the MySQL User Conference 2008 about the real bottlenecks.
The other day I was testing my thread pool work with MySQL 5.1.24 + InnoDB plugin 1.0.1, along with other miscellaneous benchmarks. I made the tests CPU bound by keeping the working set completely in memory, so that I could gauge the overhead of the threads, and on the 8-core box InnoDB seemed to do better than on 4 cores. So I immediately ran a few tests with mysqlslap, keeping the complete data set in the buffer pool, to get proper timings for the locking overhead.
Here is a comparison of performance on the 8-core box with innodb_thread_concurrency set to 32 and to 0, for a varying number of threads, on 64-bit Red Hat Linux 4. The same box was used as a 4-core machine by limiting the cores.
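A sweep like this can be scripted. The following is only a minimal sketch of how the mysqlslap runs could be generated, not the exact commands behind the numbers here: the thread counts, the query, the schema name and the iteration count are all hypothetical placeholders, and the script only prints the command lines rather than running them.

```python
import shlex

# Hypothetical client thread counts; the post does not list the exact values used.
THREAD_COUNTS = [1, 2, 4, 8, 16, 32, 64]


def mysqlslap_command(threads, query="SELECT 1", schema="test", iterations=3):
    """Build one mysqlslap invocation for a given number of client threads.

    query, schema and iterations are placeholder assumptions; substitute a
    query against a table that fits entirely in the buffer pool.
    """
    return [
        "mysqlslap",
        f"--concurrency={threads}",   # number of simulated clients
        f"--iterations={iterations}", # how many times to repeat the test
        f"--create-schema={schema}",  # schema in which the test runs
        f"--query={query}",           # statement each client executes
    ]


def sweep_commands(thread_counts=THREAD_COUNTS):
    """Return one shell-quoted command line per thread count."""
    return [shlex.join(mysqlslap_command(n)) for n in thread_counts]


if __name__ == "__main__":
    for line in sweep_commands():
        print(line)
```

Running the same list of commands once on the full 8 cores and once with the cores limited to 4 gives the two curves to compare for each innodb_thread_concurrency setting.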
The thread concurrency check is completely disabled in the engine when innodb_thread_concurrency is set to 0. As you can see from the above graph, things are not good when thread concurrency is set (anything greater than 0), and at times the 8-core box takes 2-3 times as long as the 4-core one for simple query execution. On the other hand, when I disabled the concurrency check (=0), the 8-core box really did work well compared to 4 cores. I also played a bit with innodb_thread_sleep_delay, setting it to 0 with innodb_thread_concurrency=32, and that did not make much difference (it does not make sense to tweak this when innodb_thread_concurrency=0, since the check is disabled).
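The three server configurations compared above can be written down as SQL statements. This is a rough sketch that assumes both variables can be changed at runtime with SET GLOBAL on your server version; if they cannot, put the same values in my.cnf and restart. The helper name innodb_settings_sql is made up for illustration.

```python
def innodb_settings_sql(thread_concurrency, thread_sleep_delay=None):
    """Return the SET GLOBAL statements for one test configuration.

    Hypothetical helper: it only builds SQL text, it does not talk to a server.
    """
    stmts = [f"SET GLOBAL innodb_thread_concurrency = {int(thread_concurrency)};"]
    # The sleep delay only matters while the concurrency check is enabled,
    # so it is skipped when thread_concurrency is 0.
    if thread_sleep_delay is not None and thread_concurrency > 0:
        stmts.append(
            f"SET GLOBAL innodb_thread_sleep_delay = {int(thread_sleep_delay)};"
        )
    return stmts


# The configurations discussed in the text:
# (innodb_thread_concurrency, innodb_thread_sleep_delay or None for the default)
CONFIGS = [(32, None), (32, 0), (0, None)]

if __name__ == "__main__":
    for concurrency, delay in CONFIGS:
        for stmt in innodb_settings_sql(concurrency, delay):
            print(stmt)
        print()
```

Each configuration is applied before a benchmark run, so that a difference between runs comes from the setting rather than from the workload.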
This again indicates that locking is a major overhead, and that you can scale (to some extent) beyond 4 cores by benchmarking the application and tuning the server accordingly, without touching the source code.