Sunday, 23 January 2011

Build a max heap with input array

“As we studied tree in brief, and understood an important concept that, every node is a root for some tree except the leaves. With this as the basics, we should build a heap. The important property of a heap is that mostly, the last elements of the array will become the leaves, that too it always in range [n/2]+1, [n/2]+2,…n”

See also

http://analgorithmaday.blogspot.com/2011/01/heap-data-structure.html

http://analgorithmaday.blogspot.com/2011/01/make-partial-max-heap-perfect-max-heap.html

Metaphor

   How would a forest tree grow? Bottom up right? :) This is not a chicken and egg problem btw. :P 

   But top to bottom model comes into picture only for a family hierarchy or a majestic empire cabinet. But it shows as clearly that, there are two ways a tree kind of structure can grow, top to bottom (impossible for a forest tree) and bottom to up. Lets take the example of bottom to up example as it matches with what we are seeing now.

   For growing a tree, you start with the seeds. After the seeds are planted, roots grow first and make the tree grow upwards with leaves. But just understand the basic model of bottom up here. roots grow first and then the whole tree.

Concept

   The bottom up tree growth from a seed is interesting!! But note that we start with the root. But a computer tree, has leaves at the bottom and roots at the top!! This never matches with a normal forest tree!!.. Bad design :D

   But the methodology in which a computer tree is constructed follows the same historical ways: bottom up, top down. For a heap, we need to construct a tree bottom up like how a natural forest tree grows. But you might shout out like leaves should be at the bottom!! where would you fit the leaves?

   We treat all the leaves as root in any bottom up traversal approach in computer algorithms. Since we grow from the bottom, we should get the leaves of a computer tree as input!!.. Since we only know the leaves, we should treat them as initial root obviously. The next input that we get might be assumed as the root of some of these leaves and the algorithm goes until you are left with only one root to process.

   This is the approach the build max heap algorithm follows. For understanding this fully, you should understand how a normal tree is converted to a max heap tree. [http://analgorithmaday.blogspot.com/2011/01/make-partial-max-heap-perfect-max-heap.html]

    The basic input to the max heapify algorithm is: the root of a tree. The tree can be already following or not following the max heap principle (parent > children). But max heapify traverses the full tree until it reaches a small tree in which parent > children. This basic principle can be used to build the heap.

    The interesting point in a heap is, its leaves will be always [n/2]+1, [n/2]+2,..n. For example, if size of the heap is n=10, then 6,7,8,9,10 are the leaves. If these indices are leaves, the next level indices which will all be roots which means [n/2]…1 are the remaining nodes which roots.

Code

   1:  #define LEFT(x) 2*x+1 // since we start with index 0
   2:  #define RIGHT(x) 2*x+2
   3:  #define heap_size(x) (sizeof(x)/sizeof(x[0])) //similar to ARRAYSIZE macro
   4:   
   5:  template <typename T, size_t size>
   6:  void build_max_heap(T (&arr)[size])
   7:  {
   8:      for(int i=(int) floor((float)heap_size(arr)/(float)2);i>=0; i--)
   9:      {
  10:          // if size = 10, then,
  11:          // initial max_heapify call will be like, max_heapify(arr, 5)
  12:          // But note that, we run until 0 and zero is root for this code
  13:          max_heapify(arr, i); 
  14:      }
  15:  }
  16:   
  17:  template <typename T, size_t size>
  18:  void max_heapify(T (&arr)[size], int idx)
  19:  {
  20:      int l = LEFT(idx);
  21:      int r = RIGHT(idx);
  22:      int largest;
  23:      if( l <= heap_size(arr) && arr[l] > arr[idx])
  24:          largest = l;
  25:      else
  26:          largest = idx;
  27:   
  28:      if(r <= heap_size(arr) && arr[r] > arr[largest])
  29:          largest = r;
  30:   
  31:      if(largest != idx) {
  32:          int temp = arr[idx];
  33:          arr[idx] = arr[largest];
  34:          arr[largest] = temp;
  35:          max_heapify(arr, largest);
  36:      }
  37:   
  38:  }
  39:   
  40:  void main()
  41:  {
  42:      int A[]={5,3,17,10,84,19,6,22,9};
  43:      build_max_heap(A);
  44:  }

Important observations while coding

  • As we start from zero, in the actual algorithm and explanation in Corman-Lewis, assume –1 for all the representation.
  • After the execution of the code, you always get the max element at the index 0.
  • The time taken to build a max heap is explained in book. As you can see a recursion and a tree involved, you would normally assume O(log n). Along with it, we have a loop running for n/2. So we would usually assume O(n log n).
  • But the above time is not so tight and its too much upper bound to assume. This heap building algorithm comes down to O(n)  at perfect worst case.
  • Note that we never get, array out-of-bounds, as we verify LEFT(i) and RIGHT(i) with the array total size. This is an important step to never miss!!!
  • This is a recursion call from inside a loop. That’s why we assume O(n log n). But if we calculate correctly, this is the linear time algorithm and best algorithm for organizing large data sets.
  • This algorithm is the basis for heap sort algorithm. Don’t dive directly into heap sort without understanding max heapify and build max heap algorithms.