
Tuesday, 31 May 2011

Randomized Selection algorithm

“Find the k-th smallest element in an array without sorting. This is the algorithm which does that.”

See Also

http://analgorithmaday.blogspot.com/2011/05/concept-behind-order-statistics_09.html

Introduction

  We have already seen lots of information about order statistics. Finding just the k-th smallest element in an array doesn’t require sorting; you only need to find out where the k-th smallest element would fit.

  The usual way to find the maximum is to track it in a single variable. But what would you do if you want the second maximum? You would maintain 2 variables. Similarly, it’s possible to maintain a stack of current minimums over the array in O(n) time with O(n) space. :) We will see that problem in the next article; it’s also an interview question in some companies. :D

  But the right way to find the k-th smallest element is the randomized selection algorithm. We have already learned how randomized partition is used; we can reuse it in randomized selection itself.

Concept

  Do a randomized partition and get the location of the pivot, which is also random. After you get the location, subtract the start of the array from it and add 1. This is nothing but the pivot’s position in the sorted array.

     Example,

              arr = 9,5,7,1,10,2,3

        if the random pivot selected is 5, and the partition has happened, then the resulting array will be,

       arr =    1 2 3 5 9 7 10

The location returned will be 3 (array indices starting from zero).

  We assumed above that the array index starts from zero. :) The actual formula to calculate the rank k is:

   position – start + 1, where the +1 converts the zero-based index into a 1-based rank.

If the above value is equal to the given k, we have got the best case here. Just O(n) to find our required smallest element.

For example, say the user wants the 4th smallest element. We already got it (position 3 – start 0 + 1 = 4). :) We don’t need to continue any more.

   But the important thing here is how our random generator works and what we expect from it. If the pivot selected always helps us out, we get the best case.

  Otherwise, we continue this randomized pivot selection recursively on the side of the array that must contain the k-th element.

   1   2   3   5   9   7   10
   ^           ^            ^
 start        mid          end

now, k = mid – start + 1 = 3 – 0 + 1 = 4

Say we need the 6th smallest (9). We have only found the 4th smallest. Since 4 is less than 6, our 6th smallest will be on the right side. i.e,

within, 9 7 10

That is, mid+1 to end.

But within this subarray, the element we need to find is just the 2nd smallest!! :D

So we pass 6-4, that is, k – (mid – start + 1), to the recursion.

Similarly, we need to cover the lesser case as well, where we recurse on the left side with k unchanged.

Code

#include <iostream>
#include <cstdlib>
#include <ctime>
#include <algorithm>
using namespace std;

int random_partition(int* arr, int start, int end)
{
    int pivotIdx = start + rand() % (end-start+1);
    int pivot = arr[pivotIdx];
    swap(arr[pivotIdx], arr[end]); // move pivot element to the end
    pivotIdx = end;
    int i = start - 1;
 
    for(int j = start; j <= end-1; j++)
    {
        if(arr[j] <= pivot)
        {
            i = i+1;
            swap(arr[i], arr[j]);
        }
    }
 
    swap(arr[i+1], arr[pivotIdx]); // place pivot in its sorted position
    return i+1;
}
 
int random_selection(int* arr, int start, int end, int k)
{
    if(k <= 0 || k > end - start + 1) return -1; // invalid rank
 
    if(start == end)
        return arr[start];
 
    int mid = random_partition(arr, start, end);
    int i = mid - start + 1;       // 1-based rank of the pivot in this window
    if(i == k)
        return arr[mid];
    else if(k < i)
        return random_selection(arr, start, mid-1, k);
    else
        return random_selection(arr, mid+1, end, k-i);
}

int main()
{
    srand(time(NULL)); // seed once, not on every partition call
    int A[] = {9,5,7,1,10,2,3};
    cout << random_selection(A, 0, 6, 5) << endl; // 5th smallest: prints 7
    return 0;
}
  • Note that the k argument we pass is 1-based, not 0-based.
  • Even though random_partition never returns an index greater than end, we calculate the distance from start and add 1 to it. This gives us the right rank.
  • While moving into the lesser or greater segment, we must take care to pass the correct k value: unchanged on the left, reduced by the pivot’s rank on the right.

Monday, 30 May 2011

Alternative partition method

 

See also

http://analgorithmaday.blogspot.com/2011/02/partitioning-algorithmin-place.html

Introduction

  The quick sort partition algorithm is sometimes taught differently. :) So I thought of covering that too. There is no difference in performance or space usage; the only advantage is that this method is a little more straightforward.

Concept

  Instead of taking the pivot and incrementing 2 counters, i and j, both starting from the left, we maintain left & right, with left starting from the start and right starting from the end.

  As we know, left needs to hold the lesser items and right the greater items. While the items are already on the correct side, we can simply skip over them. This is a little clearer, since the skipping is not so obvious in the old partition approach.

  After we skip the perfectly “ok” left and right items, what we stop at is a swappable pair. We continue this until left and right cross.

  After all the swaps are done and the smaller & greater lists are ready, we simply swap the pivot into the location where right stopped.

Code

#include <iostream>
#include <algorithm>
using namespace std;

int qpartition( int *a, int low, int high )
{
    int left, right;
    int pivot_item;
    pivot_item = a[low];
    left = low;
    right = high;
    while ( left < right ) {
        /* Move left while item <= pivot (guard keeps left inside the array
           when the pivot is the maximum) */
        while( left <= high && a[left] <= pivot_item ) left++;
        /* Move right while item > pivot */
        while( a[right] > pivot_item ) right--;
        if ( left < right ) swap(a[left],a[right]);
    }
    /* right is final position for the pivot */
    a[low] = a[right];
    a[right] = pivot_item;
    return right;
}
 
int main()
{
    int A[] = {9,5,7,1,10,2,3};
    int loc = qpartition(A, 0, 6);
    cout << loc << endl; // pivot 9 lands at index 5
    return 0;
}
  • I felt this approach a little bit confusing.. :) If you are not used to lots of counters, stay with the old approach.
  • The above code is adapted from: http://www.cs.auckland.ac.nz/~jmor159/PLDS210/qsort1a.html
  • Even though a[low] is the pivot, we include it in the comparison as well. This is another difference in this algorithm.

Thursday, 26 May 2011

Random partition & its use-cases

“We studied lots of theory about random numbers and how they can be made purely random. There are mathematical proofs that randomized algorithms give good average performance. We are going to see randomized partitioning, which is the basis for a series of algorithms in sorting & selection”

See also

http://analgorithmaday.blogspot.com/2011/05/concept-behind-order-statistics_09.html

http://analgorithmaday.blogspot.com/2011/02/partitioning-algorithmin-place.html

http://analgorithmaday.blogspot.com/2011/05/relation-between-binary-search-tree-and.html

Introduction

   If you get a question like “Find the k-th smallest element in an array”, what are the various methods you would follow?

  • sort the array with a comparison sort in at most O(n log n) and read the indexed element from the array in O(1)
  • sort the array using linear sorting techniques to get O(n) with extra space, then read the indexed element from the array (special cases only)

Either way, you resort to sorting to find the k-th smallest element, and the linear-time sorting algorithms need extra space.

What method can you use other than all this? Partitioning. When we partition the array around a pivot, the important outcome is that the pivot element lands in exactly the position it would occupy in the sorted array. This happens for every pivot selection!! :)

  This behaviour is really wonderful and can be used to get the k-th smallest element as well. But note that the final position of the pivot is not guaranteed to equal a given value k. We need to do some iterations, selecting random pivots, until a pivot lands in our required position k. Doing few iterations to find k is what really matters here; otherwise, we are effectively sorting again. :)

Concept

   Selecting random pivots? What does that mean? We have learned partition algorithms with a fixed pivot (the end element). How do we randomize it? And how can we guarantee that we still get the k-th smallest element in O(n)? Moreover, if we get the final pivot position equal to k at the first attempt, we are very lucky!! :) definitely O(n).

   There are many tricks with random partitioning of an array; it is even used in randomly balanced binary trees. Remember the relation between quick sort & binary search tree? :)

    So, let’s first see how random partitioning works. It’s as simple as generating a random pivot index and moving that element to the end. Then apply the normal partition algorithm we were already doing!!

    But what you get out of such a partition is wonderful!!.. If you are lucky:

  • you get the k-th smallest or largest element
  • you get a perfectly balanced tree
  • you can avoid the worst case O(n^2) in quick sort even if the array is already sorted..

Mostly, it depends on the luck factor!! :) That’s the reason why we use expectations to analyse these algorithms; partitioning is the basic example used to explain expectation-based asymptotic analysis. With it we avoid the extra log n factor of quick sort and select the k-th element without sorting.

Code

#include <iostream>
#include <cstdlib>
#include <ctime>
#include <algorithm>
using namespace std;

int random_partition(int* arr, int size)
{
    int pivotIdx = rand() % size;
    int pivot = arr[pivotIdx];
    swap(arr[pivotIdx], arr[size-1]); // move pivot element to the end
    pivotIdx = size-1;
    int i = -1;
 
    for(int j = 0; j < size-1; j++)   // stop before the pivot slot
    {
        if(arr[j] <= pivot)
        {
            i = i+1;
            swap(arr[i], arr[j]);
        }
    }
 
    swap(arr[i+1], arr[pivotIdx]);    // place pivot in its sorted position
    return i+1;
}
 
int main()
{
    srand(time(NULL)); // seed once per program
    int A[] = {2,5,7,1,10,8,9};
    int loc = random_partition(A, 7);
    cout << loc << endl;
    return 0;
}
  • the loc you get is nothing but k-1 (the pivot’s zero-based sorted position). It gives a random position every time you execute the algorithm.
  • Usually, applications need a specific k.. :) that’s where the randomized selection algorithm comes into the picture.

Monday, 9 May 2011

Concept behind Order Statistics

“How many times have you needed the k-th smallest number in a set of numbers? It’s a must in many applications. The most used of all are the minimum & maximum of a set of numbers. Although we have lots of O(1) data structures to get min/max, if we get the numbers as a plain array we need to reorganize them, which still takes more time. This is where order statistics comes into the picture”

Introduction

We have seen lots of sorting methods and string analysis algorithms. The most common algorithmic methods are sorting & searching. But there is one more: selection methods!!

Selecting the max or min in a set is really simple and also one of the most used operations in any big program.

Let’s see an example of a selection problem from an interview perspective. One of the most asked questions is:

Find the buying and selling prices of a stock so that we maximize profit, given the array of stock prices (in O(n) time and no extra space).

How would you solve this problem? There are many more related problems, like selecting football team members based on their score profiles. In cricket especially, the initial members considered by the board would be the players with a good economy rate, and a computer basically prepares that list. :)

You could even run an algorithm to find the players in the panel if you know all of their economy rates!!

A layman’s way would be to sort and then select the top 10 or 20. But it can’t be just sorting and elimination, since all comparison-based sorts take O(n log n) time.

What we need is O(n), linear time.

So the main problem here is: how would you get the k-th smallest or largest element in an array in O(n) linear time?

Concept

What does it mean by k-th smallest element ?

If k=1, then it means the smallest element..

If k=n, then what we get would be the largest element in the array.

This ‘k’, which can even decide the sorted order of the elements in the array, is the base concept of order statistics.

Quick sort, merge sort and almost all comparison sorts try to find this ‘k’ for each element in the array so that we can place the elements in the right positions.

We also studied O(n) sorting methods (counting sort, radix sort) which try to find this ‘k’ without comparisons. :) Similar to those, we need a selection method if our requirement is just to find the element at the k-th position.

First we will see basic code for finding max & min with a smaller constant factor.

Code

#include <iostream>
#include <climits>
using namespace std;

void getRange(int *arr, int size)
{
    int min = INT_MAX;
    int max = INT_MIN;
    for(int i = 0; i < size; i++)
    {
        if(arr[i] < min)
            min = arr[i];
 
        if(arr[i] > max)
            max = arr[i];
    }
    cout<<min<<","<<max;
}
 
 
int main()
{
    int A[] = {2,5,1,8,9,13,4,14,6};
    getRange(A, 9);
    return 0;
}
  • The above code takes O(n) time but about 2n comparisons (two per element) to determine min & max
  • This can be made faster!! Yes!! Shaving the constant factor really optimizes the code!!.. Always try to remove constant factors when you want to optimize!! :)
  • The optimizations possible
    • The initial values of max & min don’t need to be at the extremes
    • If the size is odd, set both min & max to A[0]
    • If the size is even, compare the first 2 elements; the minimum goes to min & the maximum goes to max
    • This kind of initialization saves the comparisons on the first element(s).
    • But since we still need both max & min, we are still doing about 2n-2 comparisons.
    • The next optimization is comparing in pairs!! which brings the comparisons down to 3(n/2) :).. The code below does all these optimizations
#include <iostream>
using namespace std;

void getRange(int *arr, int size)
{
    int min;
    int max;
    int start = 0;
    if((size & 1) == 1) {   // odd size: the first element seeds both
        min = arr[0]; 
        max = arr[0];
        start = 1;
    }
    else {                  // even size: compare the first pair
        if(arr[0] < arr[1]) {
            min = arr[0];
            max = arr[1];
        }
        else {
            min = arr[1];
            max = arr[0];
        }
        start = 2;
    }
    for(int i = start; i < size; i += 2)
    {
        // compare the pair first, then only the smaller against min
        // and the larger against max: 3 comparisons per 2 elements
        int smaller;
        int larger;
        if(arr[i] < arr[i+1]) {
            smaller = arr[i];
            larger = arr[i+1];
        }
        else {
            smaller = arr[i+1];
            larger = arr[i];
        }
 
        if(min > smaller)
             min = smaller;
 
        if(max < larger)
            max = larger;
    }
    cout<<min<<","<<max;
}
 
 
int main()
{
    int A[] = {2,5,8,9,13,4,14,6};
    getRange(A, 8);
    return 0;
}
  • Although the above code looks bigger, there are fewer comparisons, and our problem domain got halved to n/2 pairs :)
  • We have somehow halved the problem using these optimizations. But a cleaner optimization is possible if we know the median!!
  • We will learn about the median in the next article!!

MIT Video

MIT order statistics lecture

The above video explains all the algorithms in the selection family. We will see all these methods in coming blog entries.