Stability in Insertion Sort
Join the DZone community and get the full member experience.
Join For FreeThere are two types of sort algorithm: those that are stable and those that are not. Stable sorts maintain the order of items that are deemed equal, whereas unstable sorts make no such guarantees.
So if we have a small set of unordered cards, and we’re sorting on pip value, ignoring suits, the following unordered list:
3♠ 2♣ 3♦ 2♥ 3♣
would be stable sorted as:
2♣ 2♥ 3♠ 3♦ 3♣
since the 2 of clubs appeared before the 2 of hearts in the original list, and the 3s are maintained in the original order too (spades, diamonds, clubs). An unstable sort (of which the most famous sort, quicksort, is an example) wouldn’t guarantee anything about the order of the 2s or the 3s, just that the 2s appear before the 3s.
A good stable sort is insertion sort. This is how you sort cards for, say, bridge. You start at the left hand side, sort the first two cards, and then work your way through the cards one by one to the right, inserting the next card in the proper sequence in the already sorted cards. Here’s how an insertion sort would work in sequence on our original shuffled cards:
3♠ | 2♣ 3♦ 2♥ 3♣ 2♣ 3♠ | 3♦ 2♥ 3♣ 2♣ 3♠ 3♦ | 2♥ 3♣ 2♣ 2♥ 3♠ 3♦ | 3♣ 2♣ 2♥ 3♠ 3♦ 3♣
I’ve indicated by a vertical bar the separation between the sorted part and the unsorted part.
For grins, here’s insertion sort on an array implemented in JavaScript:
var insertionSort = function (a) { var i, j, temp; for (i = 1; i < a.length; i++) { temp = a[i]; j = i; while ((j > 0) && (temp < a[j - 1])) { a[j] = a[j - 1]; j--; } a[j] = temp; } };
Notice that we have a double test for the inner loop. The first condition is to ensure that we don’t run off the beginning of the array, and the second one is to stop the loop once we reach the correct spot to insert the item. We count from the right in this inner loop, so that we can enforce sort stability (we don’t want to find the first of a set of equal items, we want to find the last).
The interesting thing about insertion sort is that, despite the fact that it is officially a O(n2) algorithm – there is a loop within a loop – it has a best (average) behavior of O(n) if the items are almost sorted. That property means that insertion sort is often used to speed up other algorithms like quicksort: you quicksort until the partitions are around 8 items in size and then insertion sort the whole array. This tends to be faster than just allowing quicksort to complete down to one-item partitions.
Looking at the code for insertion sort, wouldn’t it be nice if we didn’t have the double conditional test? It would help no end in the simple case of using insertion sort to finish off an “almost-sort” algorithm: for most of the time the run-off-the-start-of-the-array check is completely unnecessary. So, in my book, I added the optimization of finding the smallest item in the array and swapping it with the first item, and then performing the standard insertion sort. Since the smallest item acts as a sentinel, I could get rid of the double condition in the inner loop. My inner loop would never run off the beginning of the array: the sentinel is forced to be the smallest item.
var insertionSort = function (a) { var i, j, temp; j = 0; for (i = 1; i < a.length; i++) { if (a[i] < a[j]) { j = i; } } temp = a[0]; a[0] = a[j]; a[j] = temp; for (i = 1; i < a.length; i++) { temp = a[i]; j = i; while (temp < a[j - 1]) { a[j] = a[j - 1]; j--; } a[j] = temp; } };
And there it stood for ten whole years until a couple of days ago when I received an email saying my insertion sort was broken. Nonsense, said I, look: it sorts just fine and dandy. Not so, replied my correspondent, you’ve broken the stability of the insertion sort. To which I was brought up short. He was right: my efficient implementation of insertion sort was no longer stable.
Here’s an example of the problem with cards. Suppose we start off with this:
5♠ 5♣ 2♦ 2♥ 3♣
The first pass through my insertion sort would swap the 5 of spades with the first occurrence of the smallest item, the 2 of diamonds:
2♦ 5♣ 5♠ 2♥ 3♣
And then the insertion sort would proceed with the sentinel to give this:
2♦ 2♥ 3♣ 5♣ 5♠
But note this: the 5 of spades and the 5 of clubs are no longer in their original order. Stability has been broken. The little efficiency of finding and setting the sentinel broke the algorithm.
There are two solutions here I suppose: first is not to use the speed improvement and stick to the standard insertion sort, and the second is to remove the smallest item form the array and insert it into the first position (or, equivalently, shuffle all the items right by one between the first position and where the smallest item was found). Either would work fine, and note the second is still a O(n) process, which is smaller than the overall O(n2) sort.
Of course, the other thing to do is to ignore the problem completely. I used the optimized insertion sort mostly as the final step to an implementation of an efficient quicksort. Since quicksort is unstable by definition, it doesn’t matter in the slightest that my optimized insertion sort is unstable too.
The other point to make is that in all my years of coding I’ve never, ever, relied on a sort as being stable. It’s just never come up. If the original order mattered then there should be a way to alter the comparison to take into account the original order (for cards, for example, you might also sort on the suit as well). Using this enhance-the-comparison method, you can even make quicksort stable (by essentially distinguishing between all duplicate items). So although technically my optimized insertion sort is not bug-free, it’s good enough for my utilization of it.
Published at DZone with permission of Julian Bucknall, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments