Showing posts with label heap. Show all posts
Showing posts with label heap. Show all posts

Friday, 18 February 2011

Tree insert & delete–Heap example!!

“As we move more and more, we see more and more of trees represented in arrays.. Wherever, double recursion is involved, you actually create a tree in an array. But double recursion just emulates array as a tree!!.. What about data structures that has tree kind of structure? Heap is one of this kind!”

See Also

http://analgorithmaday.blogspot.com/2011/01/tree-data-structureexplained.html

http://analgorithmaday.blogspot.com/2011/02/mathematics-behind-binary-tree.html

http://analgorithmaday.blogspot.com/2011/02/priority-queue-using-binary-heap.html

Introduction

   We recently studied about the maths behind binary trees specifically and different types of trees. But where would they really use all these trees? One use case is divide-conquer kind of problems where you deploy double recursion to create a tree on the stack. But we have not used tree as a data structure till now. Merge sort, Heap sort and whatever O(nlog n) sort that you can imagine just uses trees to divide and conquer. Then when we plainly represent and use trees: its plainly used on Priority Queues

   In Priority Queues, we studied lots of new ADTs and priority queue became nothing but a type of heap data structure. Even heap sort makes use of the heap’s properties and operations. So, its time to study heaps operations. Binary tree as such doesn’t need insertion, deletion plainly until there is a usable structure for it. So we wanted to take Heap as an example here,

  • INSERT = add new element to the heap’s bottom
  • DELETE = remove max element from the heap
  • INCREASE_KEY = change the value to a higher value than existing
  • DECREASE_KEY = change the value to lower value than existing

The above functions becomes readable in a tree’s construct only when it is linked with Heap.

Concept

      We studied max_heapify, heap_insert, increase_key already for heap sort as well as priority queue applications. Here, we gonna reuse the same concepts.

  • Max heapify = bubble down, down heap is something which moves an element from top to bottom to satisfy max heap property. We move the element from top to bottom by comparing it with both LEFT and RIGHT
  • heap_insert, increase key = bubble up, up heap is something in which we move an element from the bottom to top. We take the element from bottom to top by just comparing with the PARENT

If you note that, both the above routine inserts elements inside the heap and both perform almost equally (with some extra comparisons in max heapify). But both is used for different purposes sometimes.

Source: Wikipedia

Adding a new element to the heap

x=15 is the new element we need to insert

Compare with the parent and swap. Continue until the parent is lesser than the child.

Finally, the root gets replaced with the new element added. This is up-bubbling which is carried out by out heap-insert procedure.

Deleting an element

This operation is similar to Heap-extract max routine we saw.

In the above diagram, we remove a max element say 15 from the heap. Since after we remove “15”, we assume the last element “4” as the root, since it would be somewhat least. So, we make “4”, the last node, as the heap’s root node

Now, it violates the heap property!!.. Do max-heapify to down-bubble the element.

Code

const int HEAP_SIZE=50;
#define PARENT(i)  (i-1)>>1
#define LEFT(x) 2*x+1 // since we start with index 0
#define RIGHT(x) 2*x+2
 
class MaxHeap
{
    int arr[HEAP_SIZE];
    int count;
 
 
public:
    MaxHeap() 
    {
        count=0;
    }
 
    bool insert(int element);
 
    bool remove(int& element);
 
    bool increase_element(int element, int idx);
 
    bool decrease_element(int element, int idx);
 
    bool change(int element, int idx);
};
 
bool MaxHeap::insert(int element)
{
    if(count > HEAP_SIZE)
        return false;
 
    int i = count;
 
    arr[i] = element;
    // the below loop never runs if the input
    // array is already in maxheap format!!
    while(i>0 && arr[PARENT(i)] < arr[i]) {
        int tmp=arr[PARENT(i)];
        arr[PARENT(i)] = arr[i];
        arr[i] = tmp;
        i = PARENT(i);
    }
    count++;
    return true;
}
 
bool MaxHeap::remove(int& element)
{
    if(count < 0)
        return false;
 
    element = arr[0];
    arr[0] = arr[count-1];
    arr[count-1] = INT_MIN;
    count = count - 1;
    max_heapify(arr, 0, count);
    return true;
}
 
bool MaxHeap::increase_element(int element, int idx)
{
    if(idx > count || idx < 0)
        return false;
 
    if(arr[idx] > element)
        return false;
 
    arr[idx] = element;
 
    // the below code is same as insert
    // repeating it, to be clear
    while(idx > 0 && arr[PARENT(idx)] < arr[idx]) {
        int tmp=arr[PARENT(idx)];
        arr[PARENT(idx)] = arr[idx];
        arr[idx] = tmp;
        idx = PARENT(idx);
    }
    return true;
}
 
bool MaxHeap::decrease_element(int element, int idx)
{
    if(idx > count || idx < 0)
        return false;
 
    if(element > arr[idx])
        return false;
 
    arr[idx]= element;
    max_heapify(arr, idx, count);
    return true;
}
 
bool MaxHeap::change(int element, int idx)
{
    if(idx > count || idx < 0)
        return false;
 
    if(element > arr[idx] )
        return increase_element(element, idx);
    else
        return decrease_element(element, idx);
}
 
 
void main()
{
    MaxHeap obj;
    int A[] = {7,2,1,12,15,8,4,0,6,13,9,5};
 
    for(int i=0; i<12;i++) {
        obj.insert(A[i]);
    }
    int val;
    obj.remove(val);
    obj.increase_element(20, 5);
    obj.decrease_element(3, 0);
 
    obj.change(80, 8);
    obj.change(0, 0);
 
}

Important points

  • change does both increase, decrease
  • max_heapify is reused

Wednesday, 16 February 2011

External sorting using K-way merge

“The most important problem of all is the resources.. the RAM & CPU!!.. As we know, Algorithms are only for people who don’t know how to buy RAM.. :) This is a problem in which a person just has 100 MB of RAM and wanted to sort a 1 GB file with it. To sort the 1 GB file with just 100 MB, you need to design the algorithm in such a way that, it can be divided & conquered.. This is a classic problem showcased for optimizing memory usage of a system!!”

Metaphor

   This is in no context with practical world.. But a simple optimization problem to handle large number of resources with limited data. So, we will directly jump into the problem and the algorithm

Concept

Reference: Wikipedia

An example of external sorting is the external merge sort algorithm, which sorts chunks that each fit in RAM, then merges the sorted chunks together. For example, for sorting 900 megabytes of data using only 100 megabytes of RAM:

  1. Read 100 MB of the data in main memory and sort by some conventional method, like quicksort.
  2. Write the sorted data to disk.
  3. Repeat steps 1 and 2 until all of the data is in sorted 100 MB chunks, which now need to be merged into one single output file.
  4. Read the first 10 MB of each sorted chunk into input buffers in main memory and allocate the remaining 10 MB for an output buffer. (In practice, it might provide better performance to make the output buffer larger and the input buffers slightly smaller.)
  5. Perform a 9-way merge and store the result in the output buffer. If the output buffer is full, write it to the final sorted file. If any of the 9 input buffers gets empty, fill it with the next 10 MB of its associated 100 MB sorted chunk until no more data from the chunk is available.

The important point here is that, after you split the 900 MB data into 100 MB each and sort.. In the 9 sorted lists newly created, we apply a 9-way merge so that we take top 10 MB from each list.. which makes up to 90 MB. The free 10 MB in ram will be used as output write buffer. This 90 MB is merged in-place with merge algorithm.

Code

We will see this code in next article when we do insertion sort with merge operation

Friday, 28 January 2011

Heap sort

“The very interesting way of sorting numbers comes only when you get atleast one number which is in sorted order out of the unsorted list. For example, Selection sort has that. But, selection sort requires scanning the whole array. Heap sort is a kind of optimization over it as we know always in the heap, the top elements will be the max.”

Introduction

    Do you believe that all sorted array is by default a min heap? In a min heap, you will have the least element as the first element in the heap and then you will have all elements greater than the first element. This property is satisfied by ascending order sorted array. For descending order array, it matches with the max heap property.

   But the important property we have to make a note is, the reverse is not true!!.. Min heap or max heap will never get sorted property. But they do have a partial or near approximate sorting done already in linear time.

    Assume, you get an array which is neither max nor min heap. You need to first convert it into partial sorted max heap. We will see about ascending order first for simplicity.

Concept

    We take an array and convert it into a max heap. Now, we got the maximum element at the top of the heap which is the first element of the array. In a max heap, we are sure that all the elements after the top of the heap is lesser than it. This gives a big hint that, If A is the array, A[0] is already known to be sorted as it is the max and it should be at the end of the list. Similarly, we continue until we find the elements at the reverse order from the end of the list. This also the approach to convert a max heap to min heap. :)

    This becomes a important interview question as well!!.. Convert a max heap to min heap is simply the heap sort !!

Code

   1:  #define LEFT(x) 2*x+1 // since we start with index 0
   2:  #define RIGHT(x) 2*x+2
   3:  #define heap_size(x) (sizeof(x)/sizeof(x[0])) //similar to ARRAYSIZE macro
   4:   
   5:   
   6:  void max_heapify(int *arr, int idx, int size)
   7:  {
   8:      int l = LEFT(idx);
   9:      int r = RIGHT(idx);
  10:      int largest;
  11:      if( l <= size && arr[l] > arr[idx])
  12:          largest = l;
  13:      else
  14:          largest = idx;
  15:   
  16:      if(r <= size && arr[r] > arr[largest])
  17:          largest = r;
  18:   
  19:      if(largest != idx) {
  20:          int temp = arr[idx];
  21:          arr[idx] = arr[largest];
  22:          arr[largest] = temp;
  23:          max_heapify(arr, largest, size);
  24:      }
  25:   
  26:  }
  27:   
  28:  void build_max_heap(int *arr, int size)
  29:  {
  30:      for(int i=(int) floor((float)size/(float)2);i>=0; i--)
  31:      {
  32:          // if size = 10, then,
  33:          // initial max_heapify call will be like, max_heapify(arr, 5)
  34:          // But note that, we run until 0 and zero is root for this code
  35:          max_heapify(arr, i, size); 
  36:      }
  37:  }
  38:   
  39:  void heap_sort(int *arr, int size)
  40:  {
  41:      int length = size;
  42:      build_max_heap(arr, length);
  43:      for(int i=length; i>=1; i--)
  44:      {
  45:          int temp = arr[i];
  46:          arr[i] = arr[0];
  47:          arr[0] = temp;
  48:          size = size-1;
  49:          max_heapify(arr, 0, size);
  50:      }
  51:  }
  52:   
  53:  void main()
  54:  {
  55:      int A[]={5,13,2,25,7,17,20,8,4};
  56:      heap_sort(A, 8);
  57:  }
  • In the above code, we have avoided templates unlike previous code. As i said, templates are risky to handle and increases code size.
  • The algorithm is simple enough. We consider A[i] as the max element and place it at the end. Move the last element to the top of the array and try max heapify over it.
  • Every iteration reduces the size of the heap. Length of the array remains the same

Important notes

  • Heap sort competes with quick sort. Heap sort is heavily used in real time as it gives O(n logn) upper bound for any kind of data set.
  • Heap sort is the best in-place algorithm. Uses constant space and almost linear/logarithmic time
  • Sometimes merge sort performs better since, the number of swaps are less compared to heap sort. Heap sort works great in a sequential RAM rather than a dynamic file system memory.
  • Merge sort is proven to work for linked list. Heap sort works well in sequential high speed caches.
  • Merge and heap sort works very well for large data sets rather than a quick sort with lesser space usage.

Sunday, 23 January 2011

Build a max heap with input array

“As we studied tree in brief, and understood an important concept that, every node is a root for some tree except the leaves. With this as the basics, we should build a heap. The important property of a heap is that mostly, the last elements of the array will become the leaves, that too it always in range [n/2]+1, [n/2]+2,…n”

See also

http://analgorithmaday.blogspot.com/2011/01/heap-data-structure.html

http://analgorithmaday.blogspot.com/2011/01/make-partial-max-heap-perfect-max-heap.html

Metaphor

   How would a forest tree grow? Bottom up right? :) This is not a chicken and egg problem btw. :P 

   But top to bottom model comes into picture only for a family hierarchy or a majestic empire cabinet. But it shows as clearly that, there are two ways a tree kind of structure can grow, top to bottom (impossible for a forest tree) and bottom to up. Lets take the example of bottom to up example as it matches with what we are seeing now.

   For growing a tree, you start with the seeds. After the seeds are planted, roots grow first and make the tree grow upwards with leaves. But just understand the basic model of bottom up here. roots grow first and then the whole tree.

Concept

   The bottom up tree growth from a seed is interesting!! But note that we start with the root. But a computer tree, has leaves at the bottom and roots at the top!! This never matches with a normal forest tree!!.. Bad design :D

   But the methodology in which a computer tree is constructed follows the same historical ways: bottom up, top down. For a heap, we need to construct a tree bottom up like how a natural forest tree grows. But you might shout out like leaves should be at the bottom!! where would you fit the leaves?

   We treat all the leaves as root in any bottom up traversal approach in computer algorithms. Since we grow from the bottom, we should get the leaves of a computer tree as input!!.. Since we only know the leaves, we should treat them as initial root obviously. The next input that we get might be assumed as the root of some of these leaves and the algorithm goes until you are left with only one root to process.

   This is the approach the build max heap algorithm follows. For understanding this fully, you should understand how a normal tree is converted to a max heap tree. [http://analgorithmaday.blogspot.com/2011/01/make-partial-max-heap-perfect-max-heap.html]

    The basic input to the max heapify algorithm is: the root of a tree. The tree can be already following or not following the max heap principle (parent > children). But max heapify traverses the full tree until it reaches a small tree in which parent > children. This basic principle can be used to build the heap.

    The interesting point in a heap is, its leaves will be always [n/2]+1, [n/2]+2,..n. For example, if size of the heap is n=10, then 6,7,8,9,10 are the leaves. If these indices are leaves, the next level indices which will all be roots which means [n/2]…1 are the remaining nodes which roots.

Code

   1:  #define LEFT(x) 2*x+1 // since we start with index 0
   2:  #define RIGHT(x) 2*x+2
   3:  #define heap_size(x) (sizeof(x)/sizeof(x[0])) //similar to ARRAYSIZE macro
   4:   
   5:  template <typename T, size_t size>
   6:  void build_max_heap(T (&arr)[size])
   7:  {
   8:      for(int i=(int) floor((float)heap_size(arr)/(float)2);i>=0; i--)
   9:      {
  10:          // if size = 10, then,
  11:          // initial max_heapify call will be like, max_heapify(arr, 5)
  12:          // But note that, we run until 0 and zero is root for this code
  13:          max_heapify(arr, i); 
  14:      }
  15:  }
  16:   
  17:  template <typename T, size_t size>
  18:  void max_heapify(T (&arr)[size], int idx)
  19:  {
  20:      int l = LEFT(idx);
  21:      int r = RIGHT(idx);
  22:      int largest;
  23:      if( l <= heap_size(arr) && arr[l] > arr[idx])
  24:          largest = l;
  25:      else
  26:          largest = idx;
  27:   
  28:      if(r <= heap_size(arr) && arr[r] > arr[largest])
  29:          largest = r;
  30:   
  31:      if(largest != idx) {
  32:          int temp = arr[idx];
  33:          arr[idx] = arr[largest];
  34:          arr[largest] = temp;
  35:          max_heapify(arr, largest);
  36:      }
  37:   
  38:  }
  39:   
  40:  void main()
  41:  {
  42:      int A[]={5,3,17,10,84,19,6,22,9};
  43:      build_max_heap(A);
  44:  }

Important observations while coding

  • As we start from zero, in the actual algorithm and explanation in Corman-Lewis, assume –1 for all the representation.
  • After the execution of the code, you always get the max element at the index 0.
  • The time taken to build a max heap is explained in book. As you can see a recursion and a tree involved, you would normally assume O(log n). Along with it, we have a loop running for n/2. So we would usually assume O(n log n).
  • But the above time is not so tight and its too much upper bound to assume. This heap building algorithm comes down to O(n)  at perfect worst case.
  • Note that we never get, array out-of-bounds, as we verify LEFT(i) and RIGHT(i) with the array total size. This is an important step to never miss!!!
  • This is a recursion call from inside a loop. That’s why we assume O(n log n). But if we calculate correctly, this is the linear time algorithm and best algorithm for organizing large data sets.
  • This algorithm is the basis for heap sort algorithm. Don’t dive directly into heap sort without understanding max heapify and build max heap algorithms.

Thursday, 20 January 2011

Make a partial max heap a perfect max heap.

 

Maintain the property that Parent > Children on every tree on the heap.

See also:

http://analgorithmaday.blogspot.com/2011/01/heap-data-structure.html

Introduction

   When you include some more elements in the array to the max heap, you need to still satisfy the property that the parent > children. To get this property true, we need to change the root of the tree or swap some elements(if its array) from parent to children. To simplify this operation, we use a simple induction method which is the reverse of the condition.

    Consider, parent > children is violated at index i. Also, assume that, the LEFT(i) and RIGHT(i) trees have not violated this condition yet. This is the initial condition using which we swap elements to make a perfect max heap.

More concepts

   Consider we have the following array which is partially a max heap, (Courtesy: Corman-Lewis)

Index: 1 2 3 4 5 6 7 8 9 10
Value: 16 4 10 14 7 9 3 2 8 1

You might note something interesting comes out from the above data set. If you see, 16->10->9->3 is sorted. i.e, the RIGHT(root) tree is sorted but not the LEFT(root) tree.

                             16

                      /               \

                4                         10

          /          \                 /        \

     14              7             9             3

/      \         /

2        8     1

 

This means, we need to convert the left sub-tree to match the max heap property. That, parent > child while already RIGHT(root) tree is following that.

Code

  • Calculate the left and right index for the current root
  • check whether the value at the left is greater than the root
    • If yes, this is the largest
    • no, then current root is the largest
  • Compare the largest element between LEFT(i) and i with RIGHT(i), if it becomes the largest, then largest = RIGHT index
  • If both are not true, then return the algorithm.. This is a breaking condition.
  • If left or right becomes the largest, continue the recursion in the direction of the largest, until left or right doesn't exceed the heap size. This is one more return condition
   1:  #define LEFT(x) 2*x+1 // since we start with index 0
   2:  #define RIGHT(x) 2*x+2
   3:  #define heap_size(x) (sizeof(x)/sizeof(x[0])) //similar to ARRAYSIZE macro
   4:   
   5:  template <typename T, size_t size>
   6:  void max_heapify(T (&arr)[size], int idx)
   7:  {
   8:      int l = LEFT(idx);
   9:      int r = RIGHT(idx);
  10:      int largest;
  11:      if( l <= heap_size(arr) && arr[l] > arr[idx])
  12:          largest = l;
  13:      else
  14:          largest = idx;
  15:   
  16:      if(r <= heap_size(arr) && arr[r] > arr[largest])
  17:          largest = r;
  18:   
  19:      if(largest != idx) {
  20:          int temp = arr[idx];
  21:          arr[idx] = arr[largest];
  22:          arr[largest] = temp;
  23:          max_heapify(arr, largest);
  24:      }
  25:   
  26:  }
  27:   
  28:  void main()
  29:  {
  30:      int A[]={16,4,10,14,7,9,3,2,8,1};
  31:      int val = heap_size(A);
  32:      max_heapify(A, 1);
  33:      
  34:  }

 

Problems found while coding:

  • As we always consider start index as 0, LEFT(i) will equal 2i+1 and RIGHT(i) will equal 2i+2
  • Interesting problem: in C/C++, array is passed by value by default which means you just get a pointer inside the function, not the full array. So, you cannot use sizeof(array) to get correct size of the array
  • To complicate this more, in C++, we have possibility of passing multiple references if we know the size of the array. So, we make use of templates to make it generic. (not advised for large scale use) Refer: http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=241
  • Read more about pointers and references here: http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=166
  • I will cover a extensive topic about pointers and references soon.

Wednesday, 19 January 2011

Heap data structure

“Heap as many would assume is not a garbage collected storage or some memory returns by malloc API. Heap is a data structure which looks like a tree with certain properties. Don’t ever confuse it with dynamic or garbage collected memory.”

Introduction

   Heap is represented as an array usually and is organized in the form of a tree inside the array.   Heap is something which grows from top to bottom in the form of a binary tree so that, you always get some required value from the root of the tree.There are 2 types of heaps: min heap and max heap. In a min heap, you get the smallest element in the array as the root of the tree. In a max heap, you get the largest element in the array as the root of the tree. These properties are imposed on a heap structure for usage in two kinds of applications: sorting and priority queues.

  As we already know, data structures simplify the way algorithm operations are performed. Like how a merge sort got simplified by splitting the array in a tree fashion, heap array is organized in the form of the tree for reducing complexity in processing it,

              PARENT(i/2)

LEFT(2i)              RIGHT(2i+1) 

Where, i is the index in the array. Usually, when representing heap with an array, we restrict the size of heap so that it is lesser than the total array size. Heap size can increase any time to include some more part of the array.

   Note that in merge sort, we split the array with the length of the array at runtime. Here, we use the heap size for calculating PARENT, LEFT, RIGHT values statically and structure it as a tree.

So, if heap size is 6, then the array index gets organized in the form of tree like this.

                        1                           ---------------- Level 1

                  /          \

               2               3                  --------------- Level 2

          /        \        /

       4            5    6                      --------------- Level 3

For understanding more about this structure, we need to learn more about tree data structure. Will see about it in next article.

But just see that, Level 1 is called the root of the tree. Other levels are just child nodes rooted to some node which is considered as parent. So, in the above picture, total 3 trees are there, 1-2-3, 2-4-5, 3-6, all are connected together.

Important concepts & understandings

  • We saw many things about recursion in previous articles. We also noted that when there is 2 function call recursion, at run time, it creates a perfect binary tree on the runtime stack. The expansion happens similar to a tree structure until you reach the extreme return condition which usually happens on the leaves of the tree (the bottom of the tree).
  • As we move forward with heap data structure which is tree based, we need to use recursion to traverse it easily as explained in above point, without the need of too much temporaries/loops.
  • Note that, height of the heap can be calculated like log2 <heap-size>. For the above example, it comes approx. 2.6. If we ceil it, it comes to 3 (check for what value of n, 2^n gives heap size=6. If you compute 2^2.6, it gives 6.06286)
  • Total number of nodes: again 2^2.6 = 6. As we studied in last article, we use logarithms to reduce the complexity in representation of large numbers. So, we simply tell log2n as height of the tree, where n is the size of the heap. Inverse logarithm gives a bigger size which is the size of heap.