Optimization usually comes after the initial release and after thorough debugging and the trial by fire that is thousands of users banging away at it in unexpected ways. You don't typically optimize the first version heavily, as working right is more important than working efficiently, and optimizations are notorious for destabilizing code.
You really can't start optimizing until your algorithms are in place and rock solid. It's mostly a time-consuming job consisting of a great many tiny improvements, each of which might shave off only a couple of microseconds or a few clock cycles, but which in aggregate can add up to milliseconds. Once in a while you find a way to make some routine way, way faster just by rethinking its logic. If it's a low-level function that gets called often, the improvement can be dramatic.
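For instance, consider a bit-counting helper in a hot path (a hypothetical C sketch, not taken from any particular codebase): rethinking the routine so it visits only the set bits, instead of scanning every bit position, is exactly that kind of logic change.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Naive version: examines all 32 bit positions on every call. */
static unsigned popcount_naive(uint32_t v)
{
    unsigned count = 0;
    for (int i = 0; i < 32; i++)
        count += (v >> i) & 1u;
    return count;
}

/* Rethought version (Kernighan's trick): each pass clears the lowest
 * set bit, so the loop runs only once per 1 bit in the word. */
static unsigned popcount_sparse(uint32_t v)
{
    unsigned count = 0;
    while (v) {
        v &= v - 1;    /* clear the lowest set bit */
        count++;
    }
    return count;
}

int main(void)
{
    /* Spot-check that the rewritten routine gives the same answers. */
    for (uint32_t v = 0; v < 200000u; v++)
        assert(popcount_naive(v) == popcount_sparse(v));
    printf("results match\n");
    return 0;
}
```

The two versions return identical results for every input; the faster one only earns its place once that has been verified.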
Optimization usually doesn't involve changing algorithms, but rather making existing algorithms more efficient. Little things like using a shift-right instruction instead of dividing by two, replacing functions with inline macros, using integer math where appropriate, or hoisting a validity check out of a loop once you're sure the variable will always be valid. You can only do these things after you've achieved a high degree of confidence in the algorithms and have a good test suite to verify that the optimized version produces exactly the same results as before.
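As a rough illustration (a hypothetical C sketch; the function names and data are invented), here is one small routine before and after a few of those tweaks: the divide by two becomes a shift, the per-element helper function becomes a macro, and the validity check moves out of the loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Before: a function call per element, a divide, and a pointer check
 * repeated on every pass through the loop. */
static uint32_t halve(uint32_t x)
{
    return x / 2;                         /* division */
}

uint64_t sum_halved_before(const uint32_t *samples, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (samples == NULL)              /* validity check inside the loop */
            return 0;
        total += halve(samples[i]);       /* function call per element */
    }
    return total;
}

/* After: the check is hoisted out of the loop, the helper becomes a
 * macro, and the divide by two becomes a shift.  (For unsigned values,
 * x >> 1 is exactly x / 2, so the results cannot change.) */
#define HALVE(x) ((x) >> 1)

uint64_t sum_halved_after(const uint32_t *samples, size_t n)
{
    if (samples == NULL)                  /* checked once, outside the loop */
        return 0;
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += HALVE(samples[i]);
    return total;
}

int main(void)
{
    uint32_t data[] = {2, 7, 40, 123456};
    size_t n = sizeof data / sizeof data[0];
    /* Both versions must agree before the faster one goes in. */
    printf("%llu %llu\n",
           (unsigned long long)sum_halved_before(data, n),
           (unsigned long long)sum_halved_after(data, n));
    return 0;
}
```

Running both against the same inputs and comparing the output is exactly the kind of check the test suite is there to make.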
Some optimizations are as easy as updating your compiler. Compilers keep getting smarter about what they can optimize automatically, so at least some of the speed-up may come simply from rebuilding with a newer one.
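A small illustration of that (assuming a Unix-style compiler such as gcc or clang; the file and function names are invented): rebuilding unchanged source at a normal optimization level already buys rewrites like the divide-to-shift change above, with no hand-tuning.

```c
#include <stdint.h>

/* The same source can simply be rebuilt with a newer compiler or a
 * higher optimization level, e.g.:
 *
 *   cc -O0 -c scale.c    (unoptimized, for comparison)
 *   cc -O2 -c scale.c    (typical release build)
 *
 * At -O2, mainstream compilers apply many of the micro-optimizations
 * described above on their own. */
uint32_t half(uint32_t x)
{
    return x / 2;    /* an optimizing compiler emits a shift here */
}
```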