Tuesday, 31 May 2011

Randomized Selection algorithm

“Find the k-th smallest element in an array without sorting.. This is the algorithm which does that.”

See Also

http://analgorithmaday.blogspot.com/2011/05/concept-behind-order-statistics_09.html

Introduction

  We have already seen lots of information about order statistics. Finding just the k-th smallest element in an array doesn’t require sorting but you just need to find out where the k-th smallest element will fit in.

  Usual way would be, to find the max value using a variable. But what would you do, if you want to find the second max? You would maintain 2 variables. Similarly, its possible to maintain a stack of current minimums for each of the 2 elements in the array in O(n) complexity with space O(n). :) We will see about this problem in next article and also its an interview question in some companies. :D

  But the right way to find the k-th minimum element is randomized selection algorithm. We have already learned about the randomized partition usage. We can use the same in randomized selection itself.

Concept

  Do randomized partition. Get the location of the pivot which is also random. After you get the location, just subtract the location from the start of the array. This is nothing but the k-th position in the sorted array.

     Example,

              arr = 9,5,7,1,10,2,3

        if the randompivot selected is 5, and the partition has happened then, the resulting array will be,

       arr =    1 2 3 5 9 7 10

The current location returned will be 3 (array starting from zero).

  We assumed above array is starting from zero. :) But the actual formula to calculate the k position will be,

   position – start +1, we need this +1 to get the exact location.

If the above value is equal to the given k, we have got the best case here. Just O(n) to find our required smallest element.

For example, say the user wants the 3rd smallest element. We already got that. :) we don’t need to continue anymore.

   But important thing here is how our random generator works and what we expect from it. If the pivot selected always helps us out then we get the best case.

  We continue this randomized pivot selection based on the first value and continue recursively on either side.

  1  2 3   5     9 7 10

         start      mid        end

now, k = mid –start +1 = 4

Say, we need 6th smallest(7). we have got only the 4th smallest. since 4 is lesser than 6, we will have our 6th smallest in the left side. i.e,

within, 9 7 10

This is, mid+1 to end

But the k-th element we need to find inside this array is just the 2nd smallest!! :D

So, we need to do 6-4, that is, k – (given value) to be passed to this recursion

similarly, we need to cover for the greater case as well.

Code

int random_partition(int* arr, int start, int end)
{
    srand(time(NULL));
    int pivotIdx = start + rand() % (end-start+1);
    int pivot = arr[pivotIdx];
    swap(arr[pivotIdx], arr[end]); // move pivot element to the end
    pivotIdx = end;
    int i = start -1;
 
    for(int j=start; j<=end-1; j++)
    {
        if(arr[j] <= pivot)
        {
            i = i+1;
            swap(arr[i], arr[j]);
        }
    }
 
    swap(arr[i+1], arr[pivotIdx]);
    return i+1;
}
 
int random_selection(int* arr, int start, int end, int k)
{
    if(start == end)
        return arr[start];
 
    if(k ==0) return -1;
 
    if(start < end)
    {
 
    int mid = random_partition(arr, start, end);
    i = mid - start + 1;
    if(i == k)
        return arr[mid];
    else if(k < i)
        return random_selection(arr, start, mid-1, k);
    else 
        return random_selection(arr, mid+1, end, k-i);
    }
 
}
void main()
{
    int A[] = {9,5,7,1,10,2,3};
    int loc = random_selection(A, 0,6,5);
}
  • Note that the k argument we pass is not starting from 0.. it starts from 1
  • Even though our random partition never returns greater than 6. We calculate the distance from the start and add +1 to it. This gives as the right value.
  • while moving in lesser than and greater than segments  we must take care in passing the correct k value