Tuesday, 8 March 2011

Counting sort–Linear time

“If you know that all the elements in an array is less than some integer, you can do a simple count to calculate the position of elements in the sorted array. This is the concept of counting sort. It doesn’t use any comparisons”

Introduction

   This requires a little bit of mathematics. You have 8 element array with all the elements under a value k. This value should be very small, if not, this method is of no use. :)

  If you know all the elements are less than or equal to k, then you just need to get the count of existence of elements in the array. After you get the count of number of elements coming under the each bin of k, just add up all to get the total at pushed to the end!!

  This is a simple statistics..!! In normal calculations, you take sub totals. Finally you add one with other to get a running sum total. Also known as cumulative sum. This is what is done here. What cumulative sum gives us is the important concept of counting sort

Concept

  The cumulative sum calculated from the counts of numbers, gives us the exact position where the input array can fit inside the sorted output array.

The exact concept is very clear in Cormen-Rivest. We create a map of counts of each integer under k and get a cumulative sum and map it inside the resultant array.

So, ascending order can be got by doing cumulative sum from front –> back in the sums array. Descending order can be got by doing cumulative sum from back –> front in the sums array.

Code

The below code is for the descending order sorting using counting sort.

int* counting_sort(int* arr, int k, int size)
{
    int *B, *C;
    C = (int*) malloc(sizeof(int)*k);
    B = (int*) malloc(sizeof(int)*size);
 
    for(int i=0;i<k; i++) C[i] = 0;
    // get the count of elements in arr. 
    // ASSUMPTION: All elements in arr is < k
    for(int i=0;i<size;i++) C[arr[i]]++;
 
    for(int i=k-1;i>0;i--) C[i-1] += C[i];
 
    for(int j=0;j<size;j++) {
        B[C[arr[j]]] = arr[j];
        C[arr[j]]--;
    }
    return B;
}
void main()
{
    int A[] = {2,3,4,1,6,8,7,5};
    int *b;
    b = counting_sort(A, 9, 8);
    // b[0] is garbage, unused entry
    std::cout<<b[0]<<std::endl;
 
    // b[1]..b[size] is filled with sorted
    std::cout<<b[1];
}

Important points:

  • When it involves only C, then we run for only k times, when its arr, then we run for size times. Total performance comes to O(k+n) approx. comes to O(n). First sorting algorithm we studied with linear time.
  • We never fill index number 0 in the sorted array. This is because, the index for sorted array is the cumulative sum calculated in C array. If that sum is zero, it can only happen when there is no zero in input array. For our example, increasing order case, C[0] =0.. But C[arr[j]] never leads to C[0] because, there is no zero in arr. If at all a zero is present, its cumulative sum gets added up.
  • Because of cumulative sum method, index of resultant sorted array always starts from index 1 and there is no possibility of getting a zero index.
  • free is not done here. Possible heap corruption since B needs size+1 and C is not freed.
  • The above code is not a stable sort!!.. But still you get sorted array as output. The usefulness of stableness is not visible here!!.. Will discuss this in next sort: Radix sort.