I've always been a little leery of benchmarks that load up one plugin over and over until dropouts occur. It seems like a clunky and imprecise method that might not even produce repeatable results, much less translate into real-world benefits.
Even when testing things with far fewer variables, such as disk performance, I can run the same benchmark with the same software on the same machine and get different results on different days.
One would reasonably expect Reaper to perform well, simply because it hasn't been around as long as SONAR and isn't burdened by legacy code (can Reaper load a 10-year-old project? Nobody knows...there aren't any 10-year-old Reaper projects). But I'd be very surprised if the performance gap were a) large and b) across the board for all activities.