"Clips and pops" sounds more like a buffer underrun than clipping. Maybe it's a matter of CPU load balancing? I was trying to think why there'd be a difference between multiple instances and one multi-timbral instance (which should be more efficient), and all I could come up with is that multiple instances are each going to run in their own thread. Maybe the single instance is maxing out one core.
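To make the underrun idea concrete: the audio driver asks for one buffer at a time, and each buffer has to be rendered in less time than it takes to play, which is buffer size divided by sample rate. The sketch below is only an illustration. The per-part render times are made-up numbers, and it assumes an idealized host that spreads separate instances across cores while a single multi-timbral instance renders all its parts on one thread. Real hosts and Omnisphere's internal threading may not behave this way.

```python
def buffer_deadline_ms(buffer_size: int, sample_rate: int) -> float:
    """Time (ms) one buffer lasts, i.e. the budget for rendering it."""
    return 1000.0 * buffer_size / sample_rate


def underruns(render_ms: float, buffer_size: int, sample_rate: int) -> bool:
    """True if rendering a buffer takes longer than the buffer plays (-> pops)."""
    return render_ms > buffer_deadline_ms(buffer_size, sample_rate)


# Hypothetical per-part render cost in ms for one buffer (illustrative only).
part_ms = [1.5, 1.2, 1.8, 0.9]

# One multi-timbral instance on one thread: the parts render one after another.
single_instance_ms = sum(part_ms)
# Idealized multiple instances, one per core: the slowest part sets the pace.
multi_instance_ms = max(part_ms)

size, rate = 256, 48000
print(f"budget: {buffer_deadline_ms(size, rate):.2f} ms per buffer")
print("single instance underruns:", underruns(single_instance_ms, size, rate))
print("multiple instances underrun:", underruns(multi_instance_ms, size, rate))
```

With these numbers the single instance needs about 5.4 ms against a budget of about 5.33 ms, so it misses the deadline, while the spread-out case only needs 1.8 ms. Raising the buffer size in the host increases the budget, which is why it's worth trying if this is what's happening.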
Some of Omnisphere's patches gobble up lots of RAM. Go to the System page, where a memory-usage meter shows how much RAM Omnisphere is using versus how much is still available.
Since you have lots of physical memory, try running it in non-streamed mode. This makes Omnisphere load each sample set in its entirety before playback, so there's no disk I/O during playback. Alternatively, leave it in streaming mode but increase the pre-load memory allocation so that more of each file is loaded before streaming starts. Also set the memory limit to "no limit".