No, getting the necessary higher voltages from 5V USB bus power isn't really a problem.
But it does have to be designed for, and the necessary components add cost. The supply current available from a USB port is also finite (100 to 500 mA), and it must power not only the mic/inst preamp(s) and headphone driver amp(s) but also the line input/output buffers, AD and DA converters (codecs), onboard DSP/mixer logic, optical I/O, metering and indicators, and the USB interface itself. Each of these can consume substantially more supply current than is needed to supply the phantom power bias voltage (which may also power a microphone's onboard drive electronics).
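To put rough numbers on that, here is a back-of-envelope sketch of the bus current a 48V phantom supply might draw. The 3 mA microphone current and the 80% boost-converter efficiency are illustrative assumptions, not measurements of any particular interface, and the function name is made up for this example.

```python
# Rough USB power budget sketch. All figures below are assumptions
# for illustration, not data from any specific interface.
USB_VOLTAGE_V = 5.0       # nominal USB bus voltage
USB_MAX_CURRENT_A = 0.5   # upper end of the 100-500 mA range

def bus_current_for_phantom(mic_current_a, phantom_v=48.0,
                            efficiency=0.8, bus_v=USB_VOLTAGE_V):
    """Current drawn from the 5 V bus to deliver mic_current_a at phantom_v.

    efficiency is the assumed efficiency of the step-up converter.
    """
    output_power_w = phantom_v * mic_current_a   # power delivered at 48 V
    input_power_w = output_power_w / efficiency  # power taken from the bus
    return input_power_w / bus_v

budget_w = USB_VOLTAGE_V * USB_MAX_CURRENT_A     # total power available
phantom_bus_a = bus_current_for_phantom(0.003)   # assumed 3 mA mic draw
remaining_a = USB_MAX_CURRENT_A - phantom_bus_a  # left for everything else

print(f"Total budget:          {budget_w:.2f} W")
print(f"Phantom, from 5 V bus: {phantom_bus_a * 1000:.0f} mA")
print(f"Left for the rest:     {remaining_a * 1000:.0f} mA")
```

Under those assumptions one phantom-powered mic costs a few tens of mA at the bus, so the voltage itself is the easy part; it's everything else sharing the remaining current that forces the trade-offs.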
Consequently, quite apart from cost, an interface confined to USB bus power often has to compromise on features and capabilities: fewer mic preamps and headphone outs, fewer total I/O channels, less available preamp gain and headphone driver output power, and possibly no onboard DSP/mixing at all. A mains-powered interface isn't so constrained in available supply power. Even bus-powered FireWire interfaces typically have significantly more supply current to work with than USB, although they are still severely constrained compared to mains powering.
Regarding a bus-powered Scarlett, "how it sounds" might be affected by how well it can drive the particular headphones (of whatever impedance) plugged into its headphone jack, or how well it can drive its main (line) outputs, rather than by anything to do with its mic preamps.
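As a rough illustration of why headphone impedance matters, the sketch below computes the power delivered into a few common impedances for a fixed output voltage, using P = V²/R. The 1.5 V RMS figure is a hypothetical maximum swing chosen for the example, not a Scarlett specification, and the calculation ignores the amp's own output impedance and current limits.

```python
# Power into headphones for a fixed output voltage: P = V^2 / R.
# ASSUMED_MAX_V_RMS is a hypothetical figure, not a spec of any interface.
ASSUMED_MAX_V_RMS = 1.5

def headphone_power_mw(v_rms, impedance_ohms):
    """Power in milliwatts delivered into a resistive load."""
    return (v_rms ** 2) / impedance_ohms * 1000.0

for impedance in (32, 80, 250, 600):
    power = headphone_power_mw(ASSUMED_MAX_V_RMS, impedance)
    print(f"{impedance:>4} ohm: {power:6.2f} mW")
```

With the same voltage swing, a 250 ohm pair gets a small fraction of the power a 32 ohm pair does, which is one way a bus-powered headphone amp can come across as weak with high-impedance headphones.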
Point being: there is a very good reason why the Octa-Capture (or Scarlett 18i20) requires mains power and can't be USB bus-powered like the Scarlett 2i2 or 2i4 can.