Optimizing Performance Across All Levels
It is a well-accepted fact that early optimization is the root of all evil. Early optimization requires extra, often unjustifiable, work. Early optimization tends to decrease the maintainability of code.
At the same time, there is a violently vocal crowd decrying “bolt-on” approaches to performance. Performance must be designed in from the start, not added as an afterthought if the project’s running ahead of schedule and someone thinks of it before shipping!
Honestly, no one wants their pet peeve to be an often-neglected afterthought, and all of us benefit from better performance. Still, even if performance were always remembered and always addressed at the end of the development life cycle, that would still be far too little, far too late, for this vocal minority.
Performance, however, is not a simple one-dimensional concept. It is not something that is “there” or “not there”. Like darkness, it is often defined as the absence of something: the absence of bottlenecks, the absence of leaks, the absence of unnecessary work. However, it is also defined by the presence of something: the presence of a performant algorithm, the presence of a solid design, the presence of efficient code.
It is often best to think of performance as having multiple levels, related to where the enhancement may be made, and ranging from the lowest (language issues) to the highest (architectural and hardware issues). This may be a bit of an over-simplification, but it adds a level of complexity to the discussion which so often is neglected.
Some languages are “fast”; others are “slow”. Machine code is fast. Really, really fast. Given appropriate time and knowledge, a proper bit twiddler could render a web page using machine code as fast as that web page could possibly be rendered.
The language “continuum” for speed roughly counters the language continuum for ease of coding, maintainability, and high-level abstractions. In fact, some would say that this is true by definition. Machine code sits firmly at the “fast” end of the spectrum, followed by assembly and its like, then by C, then by OO compiled languages (C++ for example), then by bytecode languages (Java, absent JIT compiling, for example), then on the far end by the legion of plain code scripting languages.
Language-level performance, in theory, can be countered by an ideal compiler and expressive-enough language grammar. However, compilers are never perfect, and languages can never be expressive enough.
In reality, each language has a curve of effort applied versus performance achieved for any problem. Higher-level languages tend to level out at a lower level of performance, but get to that level faster than lower-level languages. Given a time constraint, it is possible for a higher-level language to perform better than a lower-level language; this advantage tends to disappear over time as additional resources are applied to the lower-level language.
If language-level performance were the end of the story, we’d all be coding in machine code.
Language-level changes can often result in a <2x (fractional multiplier) performance improvement when moving to a lower-level language. This can be significant, but the upstream costs in the algorithm and architecture levels often outweigh this particular performance gain.
The next level of performance factors happen at the “code” level. It is here that you find repeated operations (recalculating things we already calculated but chose not to “remember”). It’s items which should be pulled out of loops, and unrolling of loops.
Many compilers can mitigate low-level code-level performance problems. Unrolling loops to avoid loop jumping overhead and automatic inlining to avoid function-call stack rolling/unrolling overhead fall into this category of compiler performance fixes which operate at the code level.
I’d also lump in library and implementation choices here in the code-level performance area. Using arrays versus vectors, using sets versus linked lists, etc, all have fairly low-level effects on the overall performance of the application. These changes aren’t necessarily always as straightforward as loop-unrolling and inlining, in that different situations will cause different choices to perform better. However, their effect on performance is more in line with other code-level changes than with algorithm-level changes.
Code-level performance enhancements can often yield 2x or more improvements, but often at a cost of code clarity.
Above code-level performance sits algorithm-level performance. In this layer we see everything from radically different approaches to problems to minor conceptual adjustments to the basic process. Using a QuickSort rather than a Bubble Sort is an algorithm-level performance enhancement. Using a depth-first tree recursion instead of a breadth-first tree recursion might be an algorithm-level performance enhancement (or going the opposite direction might be the enhancement).
Algorithm-level performance enhancements have been known to have a nuclear effect on performance numbers. The right algorithm completely blows away all previous attempts. A small algorithm-level improvement might have a more complex effect on performance, improving it under some situations while decreasing it under others.
Here’s the “gotcha”: algorithm-level performance improvements are unpredictable, and often very expensive. Great engineers will often take years on a problem to come up with a significant architectural-level improvement. You can buy a lot of hardware (architecture-level performance) for a single engineer’s yearly salary!
The corollary to the above: when an algorithmic improvement is suspected, it should be chased to the ends of the earth to prove it out. Any lower-level impediment to chasing down algorithm-level enhancements (ie, choice of a language which requires a huge amount of overhead for a significant change) should be seen as lost algorithm enhancements. If an enhancement doesn’t prove out, it should be put on the back burner; there is often a minor issue in the algorithm which, once corrected, will cause it to shine with all its attendant glory. If there was a flaw in the original idea, that flaw should be fully identified; the engineer responsible will likely spend much time stewing over that logical flaw, and either work around it or incorporate its correction into a wholly new design.
When performance improves by 2,000% (time-basis measurement), it is almost certainly due to an algorithm-level improvement.
From here on up things start to get messier. Architecture-level performance gains could well be subdivided between things like multi-threading and things like database schema and things like physical machine architecture. For the purpose of this discussion, I’m lumping them all together as “architecture-level”.
Of the various types being lumped in, “Database Optimizations” are the most “gray” in their performance effect. A good database optimization can have the same nuclear effect on performance that a good algorithm optimization will yield: the entire game transforms with a single change.
Most other architecture-level changes are more mundane, yielding 1.5nX improvements for 2nX cost increase. For instance, spend four times as much on hardware and see your performance increase by as much as three times. Multithreading is often “free” performance in that the hardware to support multiple threads has already been paid for. To take advantage of additional hardware processing streams (within a single CPU or spread across the network), major algorithm-level changes are often needed.
Interactions of Levels
In general, changes on each level can have much more significant effects on overall performance than changes on the level below. For instance, you can tweak the code-level performance of your bubble sort for a year and still not end up with the performance improvement you’d get from the algorithm-level change to a quicksort. Likewise, a similar improvement to a bubble sort problem might be to make the overall process massively parallel and run it over a grid of computers.
Changes on each layer often need to be supported by changes on lower layers. Throwing two processors at an algorithm which can not multithread will not yield positive results. Attempting a complex algorithm in machine language can and will lead to unmaintainable code more quickly than a more simple algorithm.
Importantly, though, obfuscating performance enhancements at one level can make it really hard to maintain the code, and thus to implement performance enhancements at a higher level. Coding-level performance optimizations in code can make the code harder to comprehend, keeping fresh eyes away from the algorithm and keeping the old eyes distracted by all the pretty lights. Coding-level performance optimizations can also make the code significantly more complex, making it harder to identify all side effects (intended and de facto required).
What it All Means
When approaching a system for which performance is a problem, or when looking at a new system which has performance needs which appear difficult to maintain, all levels of performance need to be taken into consideration. Higher-level (algorithm, architecture) performance optimizations need to be identified and factored into the design as early as possible, as they have both the most dramatic effect on the underlying language and code level decisions, and as they are the most likely sources to bring poorly-performing code into compliance with requirements.
Of all the levels, code-level improvements are the cheapest to implement. Tools in this level are well-known to any serious engineer (sampling, memory allocation, deadlock analysis, etc). These improvements, however, are also amongst the most likely to obfuscate code, and so should be seen as a “last resort” rather than a matter of course.
Language-level changes are often not possible as they essentially require coding “from scratch”. But when allowed, a forward-looking Engineer should always move to a higher-level language than they feel the problem justifies today. Lower-level languages can and will yield an immediate performance increase, but higher-level languages will pay performance dividends in the form of code-level improvements (trading clarity for the performance of a lower-level language) and algorithm-level improvements for the lifetime of the codebase.
Changes at the algorithm level are hard to come by yet often game-changing, and as such should be regarded as precious.