Tuesday, 3 May 2011

Relation between Binary Search Tree and Quicksort–MIT lecture series (hand picked)

 

Related links

http://analgorithmaday.blogspot.com/2011/02/partitioning-algorithmin-place.html

http://analgorithmaday.blogspot.com/2011/02/quick-sort.html

http://analgorithmaday.blogspot.com/2011/03/binary-search-treeintroduction.html

Introduction

   We have studied lots of logarithmic performing algorithm which internally use binary trees which is the reason for logarithmic time. But we have studied into it more, about how the performance is log n for a tree ? How does recurrence relation work? What math does Quick sort involve ?

Lets start studying some math behind algorithms with some interesting hand picked MIT lectures.

Video

Binary Search Tree and Quicksort

Explanation

   How would you create a BST using an array ?

A[n] = 3 1 8 2 6 7 5, where A is an array and n is the size of it

BST insert comes out like this,

                        3

                 1               8

                     2       6     

                         5        7

When you do an in-order traversal of the above BST, you get the sorted array..

In-order: 1 2 3 5 6 7 8

So, this means, we have to do n BST inserts and then do an in-order traversal to get a sorted array. This itself is a new sorting method named BST sort. :)

BST-Sort
   Tree T
   array A
   for i = 0 to n
      do Tree-Insert(T, A[n])
   
   Inorder-Traverse-Tree(T)

What is the performance of the above sort?

There are ‘n’ Tree inserts, so its O(nh) where h is the height of the tree.

and then an in-order traversal is done which is O(n) . So it will be O(2nh).

But we know that height of the tree can be computed from number of nodes in it: h = log n.

Example,

Here, the number of nodes = 7, so the height would be, =  log 7

                                                                                    =   log 2^3 (approx.)

                                                                             So, h = 3

But, take an example of an sorted(or reverse sorted) array.

A[n] = {1,2,3,4,5,6}

If you create a BST with these values, you get a tree like this,

1

    2

       3

          4

              5

                    6

What is the performance of the Tree-Insert in the above for loop? In such a case, Height of the tree is equal to the number of nodes!!

height of tree, h = log n, holds approximately only for trees which are partially balanced. So, if height equals number of nodes, you get O(n2) even for this BST sort.

Can’t you recollect that, This is matching with performance of the Quicksort algorithm ?

Yes, it is matching!! :) Quick sort gives O(n2) performance if the array is already sorted. This means, Quick sort as well creates a BST internally. Just way comparisons are made is different. This video explains that beautifully.

So to avoid that, randomized quick sort came into picture.

Randomized quick sort is the basis for all kinds of self-balancing trees!!!

If you understand how randomized quick sort work, we can understand AVL, red-black trees etc..

From the above example, its also clear that why self balancing tree is a must for getting a log n time complexity!!