Wednesday, 9 March 2011

Radix sort–Punch card sorting

“The oldest algorithm ever invented. Simpler and very useful. Used in mechanical devices even. So this is even before computers are invented. Involves pure mathematics and works for any numeral and text systems. The oldest method used to sort dictionaries”

See Also

http://analgorithmaday.blogspot.com/2011/03/counting-sortlinear-time.html

Introduction

   This is one of the oldest method and some what easy to understand until you get the point clear. The main idea behind this sort is: if there is array of digits, if you sort array from the least significant to most significant number, you finally get a sorted array.

  Similarly, in a dictionary in which you sort the words lexicographically, Most significant character to least significant character sorting is done.. All these are Radix sorts.. It gives a good performance benefit as well, i.e, around O(n).

  This sort is not so important. But gives as an idea about why stable sorting is important in certain cases. Radix sort will not work out if the order of the elements is not maintained while putting the elements at the sorted spot.

Concept

    Radix sort supports all bases starting from 2.. But the algorithm we wrote and tested is only for base 10 for simplicity. We take each digit and sort the whole array around it.

For example,

{329,457,657,839,436,720,355}

On first sorting around the least significant digit, should yield,

{720,355,436,457,657,329,839}

{720,329,436,839,355,457,657}

{329,355,436,457,657,720,839}

Sort just the digits picked from 1’s place, 10’s place and 100th place. Even though intermediate steps looks like unsorted, You finally get a sorted array out of it. This is achieved just because the order of elements in maintained through out the iterations.

The above example illustrates only an array of integers with equal number of digits. This will also work for integers sizes < 3. The best way to start radix sort is to find out the maximum value and count the digits in it or to decide on the number of max digits.

  Usually, in a mechanical punching card where this algorithm is used, they have static 12 digit number imprinted over it and sort starting from LSD to MSD.

Two kinds of radix sort is clearly visible now: LSD radix sort, MSD radix sort. MSD is mostly used for dictionary sort. We will discuss LSD sort in this article.

The simple algorithm is like this,

1. For 0 to number of digits

2.       Sort the digits in a stable way in the main array

Code

#define BASE 10
 
void counting_sort_radix(int*& arr, int k, int size)
{
    int *B, *C;
    C = (int*) malloc(sizeof(int)*BASE);
    B = (int*) malloc(sizeof(int)*(size+1));
 
    for(int i=0;i<BASE; i++) C[i] = 0;
 
    // only the last digit need to be used 
    // for sorting
    for(int i=0;i<size;i++){
        int val = arr[i]/k;
        int idx = val%BASE;
        C[idx]++;
    }
 
    for(int i=1;i<BASE;i++) C[i] += C[i-1];
 
    for(int j=size-1;j>=0;j--) {
        int val = arr[j]/k;
        int idx = val%BASE;
        B[C[idx]] = arr[j];
        C[idx]--;
    }
 
    free(C);
    for(int i=0; i<size; i++) arr[i] = B[i+1];
    free(B);
}
 
void radix_sort(int* arr, int d, int size)
{
    //d = number of digits
    // must calculate d from the max val in array
 
    for(int i=0;i<d; i++) {
        counting_sort_radix(arr, pow((float) BASE, (float) i), size);
    }
}
 
void main()
{
    int A[] = {329,457,657,89,46,720,355};
 
    radix_sort(A, 3, 7);
 
 
}

Important points

  • We have reused counting sort as the base sorting algorithm for our radix sort algorithm
  • We go from 1st place to 100th place in above code.. it will run until 3 digits. So, input can only have max 3 digits
  • BASE is only tested for 10. It can be base 2, base 8(octet), base 16(hex).
  • We need to read different digit in different iterations. That’s why we have to modify the counting sort and get k=10^d as argument which is used to evaluate arr[j]/k and val%BASE. These two gives you the exact digit at necessary place we are looking for
  • The 10^d loop decides the place around which counting sort needs to be applied
  • B[] as usual requires size+1 since we never get a zero digit count. So, we copy from B[1] –> A[0] to balance it.
  • THE MOST IMPORTANT BUG FIX IN THIS CODE IS STABILITY!!.. The last loop which creates the B[] must run from length to 0 rather than running from 0 to length. If we run from 0 to length, we just get all elements in reverse order instead of forward order.
  • Why when we run from 0 to length, we get the order of elements in reverse ? Because cumulative sum is taken from front to back, denoting all the other elements go below that index from back to front. So, we must go in reverse to get the correct order
  • Without this stability condition, we will never get a sorted list from radix sort. Stability of the sorting used to sort digits is very important in radix sort method
  • Any stable sorting method can be used to sort the digits, not just counting sort.