Wednesday, 19 January 2011

Heap data structure

“Heap as many would assume is not a garbage collected storage or some memory returns by malloc API. Heap is a data structure which looks like a tree with certain properties. Don’t ever confuse it with dynamic or garbage collected memory.”

Introduction

   Heap is represented as an array usually and is organized in the form of a tree inside the array.   Heap is something which grows from top to bottom in the form of a binary tree so that, you always get some required value from the root of the tree.There are 2 types of heaps: min heap and max heap. In a min heap, you get the smallest element in the array as the root of the tree. In a max heap, you get the largest element in the array as the root of the tree. These properties are imposed on a heap structure for usage in two kinds of applications: sorting and priority queues.

  As we already know, data structures simplify the way algorithm operations are performed. Like how a merge sort got simplified by splitting the array in a tree fashion, heap array is organized in the form of the tree for reducing complexity in processing it,

              PARENT(i/2)

LEFT(2i)              RIGHT(2i+1) 

Where, i is the index in the array. Usually, when representing heap with an array, we restrict the size of heap so that it is lesser than the total array size. Heap size can increase any time to include some more part of the array.

   Note that in merge sort, we split the array with the length of the array at runtime. Here, we use the heap size for calculating PARENT, LEFT, RIGHT values statically and structure it as a tree.

So, if heap size is 6, then the array index gets organized in the form of tree like this.

                        1                           ---------------- Level 1

                  /          \

               2               3                  --------------- Level 2

          /        \        /

       4            5    6                      --------------- Level 3

For understanding more about this structure, we need to learn more about tree data structure. Will see about it in next article.

But just see that, Level 1 is called the root of the tree. Other levels are just child nodes rooted to some node which is considered as parent. So, in the above picture, total 3 trees are there, 1-2-3, 2-4-5, 3-6, all are connected together.

Important concepts & understandings

  • We saw many things about recursion in previous articles. We also noted that when there is 2 function call recursion, at run time, it creates a perfect binary tree on the runtime stack. The expansion happens similar to a tree structure until you reach the extreme return condition which usually happens on the leaves of the tree (the bottom of the tree).
  • As we move forward with heap data structure which is tree based, we need to use recursion to traverse it easily as explained in above point, without the need of too much temporaries/loops.
  • Note that, height of the heap can be calculated like log2 <heap-size>. For the above example, it comes approx. 2.6. If we ceil it, it comes to 3 (check for what value of n, 2^n gives heap size=6. If you compute 2^2.6, it gives 6.06286)
  • Total number of nodes: again 2^2.6 = 6. As we studied in last article, we use logarithms to reduce the complexity in representation of large numbers. So, we simply tell log2n as height of the tree, where n is the size of the heap. Inverse logarithm gives a bigger size which is the size of heap.