Friday, 28 January 2011

Heap sort

“The very interesting way of sorting numbers comes only when you get atleast one number which is in sorted order out of the unsorted list. For example, Selection sort has that. But, selection sort requires scanning the whole array. Heap sort is a kind of optimization over it as we know always in the heap, the top elements will be the max.”

Introduction

    Do you believe that all sorted array is by default a min heap? In a min heap, you will have the least element as the first element in the heap and then you will have all elements greater than the first element. This property is satisfied by ascending order sorted array. For descending order array, it matches with the max heap property.

   But the important property we have to make a note is, the reverse is not true!!.. Min heap or max heap will never get sorted property. But they do have a partial or near approximate sorting done already in linear time.

    Assume, you get an array which is neither max nor min heap. You need to first convert it into partial sorted max heap. We will see about ascending order first for simplicity.

Concept

    We take an array and convert it into a max heap. Now, we got the maximum element at the top of the heap which is the first element of the array. In a max heap, we are sure that all the elements after the top of the heap is lesser than it. This gives a big hint that, If A is the array, A[0] is already known to be sorted as it is the max and it should be at the end of the list. Similarly, we continue until we find the elements at the reverse order from the end of the list. This also the approach to convert a max heap to min heap. :)

    This becomes a important interview question as well!!.. Convert a max heap to min heap is simply the heap sort !!

Code

   1:  #define LEFT(x) 2*x+1 // since we start with index 0
   2:  #define RIGHT(x) 2*x+2
   3:  #define heap_size(x) (sizeof(x)/sizeof(x[0])) //similar to ARRAYSIZE macro
   4:   
   5:   
   6:  void max_heapify(int *arr, int idx, int size)
   7:  {
   8:      int l = LEFT(idx);
   9:      int r = RIGHT(idx);
  10:      int largest;
  11:      if( l <= size && arr[l] > arr[idx])
  12:          largest = l;
  13:      else
  14:          largest = idx;
  15:   
  16:      if(r <= size && arr[r] > arr[largest])
  17:          largest = r;
  18:   
  19:      if(largest != idx) {
  20:          int temp = arr[idx];
  21:          arr[idx] = arr[largest];
  22:          arr[largest] = temp;
  23:          max_heapify(arr, largest, size);
  24:      }
  25:   
  26:  }
  27:   
  28:  void build_max_heap(int *arr, int size)
  29:  {
  30:      for(int i=(int) floor((float)size/(float)2);i>=0; i--)
  31:      {
  32:          // if size = 10, then,
  33:          // initial max_heapify call will be like, max_heapify(arr, 5)
  34:          // But note that, we run until 0 and zero is root for this code
  35:          max_heapify(arr, i, size); 
  36:      }
  37:  }
  38:   
  39:  void heap_sort(int *arr, int size)
  40:  {
  41:      int length = size;
  42:      build_max_heap(arr, length);
  43:      for(int i=length; i>=1; i--)
  44:      {
  45:          int temp = arr[i];
  46:          arr[i] = arr[0];
  47:          arr[0] = temp;
  48:          size = size-1;
  49:          max_heapify(arr, 0, size);
  50:      }
  51:  }
  52:   
  53:  void main()
  54:  {
  55:      int A[]={5,13,2,25,7,17,20,8,4};
  56:      heap_sort(A, 8);
  57:  }
  • In the above code, we have avoided templates unlike previous code. As i said, templates are risky to handle and increases code size.
  • The algorithm is simple enough. We consider A[i] as the max element and place it at the end. Move the last element to the top of the array and try max heapify over it.
  • Every iteration reduces the size of the heap. Length of the array remains the same

Important notes

  • Heap sort competes with quick sort. Heap sort is heavily used in real time as it gives O(n logn) upper bound for any kind of data set.
  • Heap sort is the best in-place algorithm. Uses constant space and almost linear/logarithmic time
  • Sometimes merge sort performs better since, the number of swaps are less compared to heap sort. Heap sort works great in a sequential RAM rather than a dynamic file system memory.
  • Merge sort is proven to work for linked list. Heap sort works well in sequential high speed caches.
  • Merge and heap sort works very well for large data sets rather than a quick sort with lesser space usage.