
Monday, 9 May 2011

Concept behind Order Statistics

“How often have you needed the k-th smallest number in a set? It’s a must in many applications, and the most used cases are the minimum & maximum of a set. Although we have plenty of O(1) data structures for min/max, if the numbers arrive as a plain array we have to reorganize them, which still takes time. This is where order statistics comes into the picture.”

Introduction

We have seen lots of sorting methods and string analysis algorithms. The most common algorithmic tasks are sorting & searching, but there is one more: selection!!

Selecting the max or min of a set is really simple, and it is also one of the most used operations in any big program.

Let’s see an example of a selection problem from an interview perspective. One of the most asked questions is:

Given an array of stock prices, find the buying and selling prices that maximize profit, in O(n) time and with no extra space.

How would you solve this problem? There are many more related problems, like selecting football team members based on their score profiles. In cricket especially, the initial members considered by the board would be the players with a good economy rate, and a computer basically prepares that list. :)

You could even run an algorithm to pick the players for the panel if you know all of their economy rates!

A layman’s way would be to sort and then select the top 10 or 20. But it can’t just be sorting and elimination, since all comparison-based sorts take O(n log n) time.

What we need is O(n), i.e., linear time. For the stock question above, a single linear pass is already enough; a sketch follows.
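As an illustration (my own sketch, not from the original post; the name maxProfit is assumed), track the cheapest price seen so far and the best profit achievable by selling at the current price:

#include <iostream>
using namespace std;

// Illustrative sketch: maximum profit from one buy followed by one sell.
// Single pass, O(n) time, O(1) extra space.
int maxProfit(int *prices, int size)
{
    if(size < 2)
        return 0;
    int minPrice = prices[0];   // cheapest buy seen so far
    int bestProfit = 0;         // best (sell - buy) seen so far
    for(int i = 1; i < size; i++)
    {
        if(prices[i] - minPrice > bestProfit)
            bestProfit = prices[i] - minPrice;
        if(prices[i] < minPrice)
            minPrice = prices[i];
    }
    return bestProfit;
}

int main()
{
    int P[] = {7, 1, 5, 3, 6, 4};
    cout << maxProfit(P, 6);    // prints 5: buy at 1, sell at 6
    return 0;
}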

So the main problem here is: how would you get the k-th smallest or largest element in an array in O(n) linear time?
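As a quick aside (not covered in this post), the C++ standard library already ships a selection routine with linear average time: std::nth_element. A minimal usage sketch:

#include <algorithm>
#include <iostream>
using namespace std;

int main()
{
    int A[] = {2, 5, 1, 8, 9, 13, 4, 14, 6};
    int n = 9, k = 3;
    // After this call, A[k-1] holds the k-th smallest element
    // (elements before it are <= it, elements after it are >= it).
    nth_element(A, A + k - 1, A + n);
    cout << "3rd smallest: " << A[k - 1];   // prints 4
    return 0;
}

We will build up to how such linear-time selection works in the coming posts.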

Concept

What does “k-th smallest element” mean?

If k = 1, it means the smallest element.

If k = n, we get the largest element in the array.

This ‘k’, which determines each element’s position in sorted order, is the base concept of order statistics.

Quicksort, merge sort, and almost all comparison sorts effectively find this ‘k’ for each element in the array so that the elements can be placed in their right positions.

We also studied O(n) sorting methods (counting sort, radix sort) which find this ‘k’ without comparisons. :) Similarly, we need a selection method when our requirement is just the element at the k-th position.

First we will see basic code for finding the max & min, and then see how to shave down its constant factor.

Code

#include <iostream>
#include <climits>
using namespace std;

// Straightforward single scan: 2 comparisons per element.
void getRange(int *arr, int size)
{
    int min = INT_MAX;
    int max = INT_MIN;
    for(int i = 0; i < size; i++)
    {
        if(arr[i] < min)
            min = arr[i];

        if(arr[i] > max)
            max = arr[i];
    }
    cout << min << "," << max;
}

int main()
{
    int A[] = {2,5,1,8,9,13,4,14,6};
    getRange(A, 9);
    return 0;
}
  • The above code takes O(n) time, but it makes 2n comparisons (two per element) to determine min & max.
  • This can be made faster!! Shaving the constant factor really does optimize the code, so always look at the constants when you want to optimize. :)
  • The optimizations possible:
    • The initial values of max & min don’t need to be set to the extremes.
    • If the size is odd, set both min & max to A[0].
    • If the size is even, compare the first 2 elements: the minimum goes to min & the maximum goes to max.
    • This initialization saves a little, but as long as every element is compared against both min & max we are still at roughly 2n-2 comparisons.
    • The next optimization is comparing in pairs!! One comparison orders the pair, then only the smaller is checked against min and only the larger against max: 3 comparisons per 2 elements, i.e., about 3(n/2) in total. :) The code below applies all these optimizations.
#include <iostream>
using namespace std;

// Finds min & max using about 3(n/2) comparisons by processing
// the elements in pairs.
void getRange(int *arr, int size)
{
    int min;
    int max;
    int start = 0;
    if((size & 1) == 1) {
        // Odd size: seed min & max with the first element,
        // leaving an even number of elements for the pair loop.
        min = arr[0];
        max = arr[0];
        start = 1;
    }
    else {
        // Even size: seed min & max from the first pair.
        if(arr[0] < arr[1]) {
            min = arr[0];
            max = arr[1];
        }
        else {
            min = arr[1];
            max = arr[0];
        }
        start = 2;
    }
    for(int i = start; i < size; i += 2)
    {
        // 1st comparison: order the pair locally.
        int smaller;
        int larger;
        if(arr[i] < arr[i+1]) {
            smaller = arr[i];
            larger = arr[i+1];
        }
        else {
            smaller = arr[i+1];
            larger = arr[i];
        }

        // 2nd comparison: only the smaller can improve min.
        if(min > smaller)
            min = smaller;

        // 3rd comparison: only the larger can improve max.
        if(max < larger)
            max = larger;
    }
    cout << min << "," << max;
}

int main()
{
    int A[] = {2,5,8,9,13,4,14,6};
    getRange(A, 8);
    return 0;
}
  • Although the above code looks bigger, it makes fewer comparisons, and the loop now processes the input in n/2 pairs. :)
  • We have effectively halved the work with these optimizations. But an even cleaner optimization is possible if we know the median!!
  • We will learn about the median in the next article!!

MIT Video

MIT order statistics lecture

The above video explains all the selection algorithms. We will see all these methods in coming blog entries.

Tuesday, 3 May 2011

Relation between Binary Search Tree and Quicksort – MIT lecture series (hand-picked)

 

Related links

http://analgorithmaday.blogspot.com/2011/02/partitioning-algorithmin-place.html

http://analgorithmaday.blogspot.com/2011/02/quick-sort.html

http://analgorithmaday.blogspot.com/2011/03/binary-search-treeintroduction.html

Introduction

   We have studied lots of logarithmic-time algorithms which internally use binary trees; the trees are the reason for the logarithmic time. But have we looked deeper into why the performance of a tree operation is log n? How do the recurrence relations work? What math does Quicksort involve?

Let’s start studying some of the math behind algorithms with an interesting hand-picked MIT lecture.

Video

Binary Search Tree and Quicksort

Explanation

   How would you create a BST from an array?

A[n] = 3 1 8 2 6 7 5, where A is the array and n is its size

BST insert builds the tree like this:

          3
        /   \
       1     8
        \   /
         2 6
          / \
         5   7

When you do an in-order traversal of the above BST, you get the sorted array:

In-order: 1 2 3 5 6 7 8

So this means: do n BST inserts and then an in-order traversal, and you get a sorted array. This is itself a new sorting method, named BST sort. :)

BST-Sort
   Tree T          // empty tree
   array A         // n input elements
   for i = 0 to n-1
      do Tree-Insert(T, A[i])

   Inorder-Traverse-Tree(T)
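Here is a minimal runnable C++ sketch of the same idea (my own illustration; the Node struct and function names are assumptions, not from the lecture):

#include <iostream>
using namespace std;

struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Plain unbalanced BST insert: O(h) per call, h = tree height.
Node* treeInsert(Node *root, int key)
{
    if(root == nullptr)
        return new Node(key);
    if(key < root->key)
        root->left = treeInsert(root->left, key);
    else
        root->right = treeInsert(root->right, key);
    return root;
}

// In-order traversal visits the keys in sorted order: O(n) total.
void inorder(Node *root)
{
    if(root == nullptr)
        return;
    inorder(root->left);
    cout << root->key << " ";
    inorder(root->right);
}

int main()
{
    int A[] = {3, 1, 8, 2, 6, 7, 5};
    Node *root = nullptr;
    for(int i = 0; i < 7; i++)
        root = treeInsert(root, A[i]);
    inorder(root);   // prints 1 2 3 5 6 7 8
    return 0;
}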

What is the performance of the above sort?

There are n tree inserts, each costing O(h) where h is the height of the tree, so the inserts take O(nh) in total.

The in-order traversal then adds O(n), so the whole sort is O(nh + n) = O(nh).

For a reasonably balanced tree, the height can be estimated from the number of nodes: h ≈ log n, which makes BST sort O(n log n).

For example, the tree above has 7 nodes, so the height would be h = log 7 ≈ log 2^3 = 3 (approx.).

But now take the example of a sorted (or reverse-sorted) array.

A[n] = {1,2,3,4,5,6}

If you create a BST with these values, you get a tree like this,

1
 \
  2
   \
    3
     \
      4
       \
        5
         \
          6

What is the performance of Tree-Insert in the above for loop now? In this case, the height of the tree equals the number of nodes!!

So h ≈ log n holds only approximately, and only for trees that are reasonably balanced. When the height equals the number of nodes, each insert costs O(n) and even this BST sort degrades to O(n²).

Doesn’t this remind you of the performance of the Quicksort algorithm?

Yes, it matches!! :) Quicksort also gives O(n²) performance when the array is already sorted. In effect, Quicksort builds the same BST internally; only the order in which the comparisons are made is different. This video explains that beautifully.

So, to avoid that, randomized Quicksort came into the picture.
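For completeness, here is a sketch of randomized Quicksort (my own illustration; the Lomuto partition scheme is chosen for brevity). Swapping a random element into the pivot slot makes the O(n²) case vanishingly unlikely even for sorted input:

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <utility>
using namespace std;

// Lomuto partition around arr[high] as the pivot; returns its final index.
int lomutoPartition(int *arr, int low, int high)
{
    int pivot = arr[high];
    int i = low - 1;
    for(int j = low; j < high; j++)
        if(arr[j] < pivot)
            swap(arr[++i], arr[j]);
    swap(arr[i + 1], arr[high]);
    return i + 1;
}

void randomizedQuicksort(int *arr, int low, int high)
{
    if(low >= high)
        return;
    // Pick a random pivot: this is the only change from plain Quicksort.
    int r = low + rand() % (high - low + 1);
    swap(arr[r], arr[high]);
    int p = lomutoPartition(arr, low, high);
    randomizedQuicksort(arr, low, p - 1);
    randomizedQuicksort(arr, p + 1, high);
}

int main()
{
    srand(time(nullptr));
    int A[] = {1, 2, 3, 4, 5, 6};   // sorted input: the worst case for a fixed pivot
    randomizedQuicksort(A, 0, 5);
    for(int x : A)
        cout << x << " ";
    return 0;
}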

Randomized Quicksort is also the basis for understanding all kinds of self-balancing trees!!!

If you understand how randomized Quicksort works, you can understand AVL trees, red-black trees, etc.

From the above example, it is also clear why a self-balancing tree is a must for getting log n time complexity!!