Monday, 9 May 2011

Concept behind Order Statistics

“How many times you would have required to get a kth smallest number in a set of numbers? It’s a must in many applications. Moreover, the most used is the minimum & maximum in a set of numbers. Although we have lots of O(1) data structures to get min/max, if we get numbers as an array!! we need to reorganize which still takes more time. This is where Order statistics comes into picture”

Introduction

We have seen lots of sorting methods, String analysis algorithms. As we see that the most common algorithmic methods are: sorting & searching. But there is one more, which is selection methods!!

Selecting what is max or min in a set is really simple and also the most used operation in any big programs.

Let’s see an example of a selecting problem in an interview perspective. One of the most asked question is:

Find the buying and selling price of the stock so that we maximize profit. We are given the array of stock prices. (with O(n) time and no extra space)

How would you solve this problem? There are many more related problems like selecting football team members based on the score profiles they have. In Cricket especially, the initial members considered by the board would be the players who have good economy and the computer basically prepares that list. :)

Even you can run an algorithm to find the players in the panel if you know all of their economies!!.

For this a layman way could be to use sorting and then select the top 10 or 20. But it’s not just sorting and elimination since all comparison based sorts take O(n log n) time.

What we need is O(n) time which is linear time.

So the main problem here is, How would you get k-th smallest or largest element in an array with O(n) linear time ?

Concept

What does it mean by k-th smallest element ?

If k=1, then it means the smallest element..

if k=n, then what we get would be the largest element in the array

This ‘k’ which can even decide the sorted order of the elements in the array is the base concept of Order Statistics.

Quick sort, Merge sort and almost all comparison sorts tries to find this ‘k’ of each element in the array so that we can place the elements in the right positions.

We also studied about O(n) sorting methods(counting sort, radix sort) which tries to find this ‘k’ without comparisons. :) Similar to those, we need a selecting method, if our requirement is to just find the k-th position code.

First we will see basic code of finding max & min with lesser constant time.

Code

void getRange(int *arr, int size)
{
    int min=INT_MAX;
    int max=INT_MIN;
    for(int i=0; i < size; i++)
    {
        if(arr[i] < min)
            min = arr[i];
 
        if(arr[i] > max)
            max = arr[i];
    }
    cout<<min<<","<<max;
}
 
 
void main()
{
    int A[] = {2,5,1,8,9,13,4,14,6};
    getRange(A, 9);
}
  • The above code takes O(n) time and n comparisons as well to determine min & max
  • This can be made faster!! Yes!! Avoiding constant time in a code really optimizes the code!!.. Always try remove constant time when you want optimize code!! :)
  • The optimizations possible
    • Setting up initial values of max & min don’t need to at extremes
    • If the size is odd, set min & max to A[0]
    • If the size is even, compare first 2 elements, minimum goes to min & maximum goes to max
    • The above kind of initialization reduces comparisons to n-1.
    • But still we need both max & min here, so we still have 2n-2 comparisons (from n comparisons).
    • next optimization is: comparing in pairs!! which would bring down our comparisons to 3(n/2) :).. Below code does all these optimizations
void getRange(int *arr, int size)
{
    int min;
    int max;
    int start=0;
    if((size & 1) == 1) {
        min = arr[0]; 
        max = arr[0];
        start = 1;
    }
    else {
        if(arr[0] < arr[1]) {
            min = arr[0];
            max = arr[1];
        }
        else {
            min = arr[1];
            max = arr[0];
        }
        start = 2;
    }
    for(int i = start; i < size; i+=2)
    {
        int smaller;
        int larger;
        if(arr[i] < arr[i+1]) {
            smaller = arr[i];
            larger = arr[i+1];
        }
        else {
            smaller = arr[i+1];
            larger = arr[i];
        }
 
        if(min > smaller)
             min = smaller;
 
        if(max < larger)
            max = larger;
    }
    cout<<min<<","<<max;
}
 
 
void main()
{
    int A[] = {2,5,8,9,13,4,14,6};
    getRange(A, 8);
}
  • Although the above code looks bigger, the comparisons are lesser and our problem domain got divided. n/2 :)
  • We have some how divided the problem /2 using optimization. But more cleaner optimization possible if we know the Median!!
  • We will learn about Median in next article!!

MIT Video

MIT order statistics lecture

The above video explains all algorithms in selection method. We will see all these methods in coming blog entries.