Thursday, 26 May 2011

Random partition & its use-cases

“We studied lots of theories about random numbers and also how it gets purely random. But there are mathematical proofs that, randomized algorithms give good average performance. We are going to see Randomized partitioning which is the most important for a series of algorithms in sorting & selection”

See also

http://analgorithmaday.blogspot.com/2011/05/concept-behind-order-statistics_09.html

http://analgorithmaday.blogspot.com/2011/02/partitioning-algorithmin-place.html

http://analgorithmaday.blogspot.com/2011/05/relation-between-binary-search-tree-and.html

Introduction

   If you get a question like, “Find the k-th smallest element in an array”. What are the various methods you would follow,

  • sort the array with comparison algorithms using max O(n log n) and read the indexed element from the array in O(1)
  • sort the array using linear sorting techniques and get a O(n) with extra space. Read the indexed element from the array (special case)

Always you resort to sorting to find the k-th smallest element. And the second linear time algorithm to get the array sorted needs extra space.

What is the method you can use other than all this? Partitioning. When we do partitioning the array based on a pivot, the important output we get is that, the pivot element fits into its perfect position where it will be present in a sorted array. This happens for every pivot selection!! :)

  This behaviour is really wonderful and can be used to get the k-th smallest element as well. But the observation is that, its not possible to get the final position of the pivot equal to a given value k. We need to do some iterations and select random pivots so that we get the pivot in our required position k. But by doing fewer iterations to find k is what really important here. Otherwise, we are again doing sorting. :)

Concept

   Selecting random pivots? What does that mean? We have learned partition algorithms with a fixed pivot (the end element). How to randomize it? and how we can we guarantee that we can still get the k-th smallest element in O(n). Moreover, if we get the final pivot position equal to k at first attempt, we are very lucky!! :) definitely O(n).

   There are many tricks with random partitioning of an array. Even this is used in randomly balanced binary trees. Remember the relation between quick sort & binary search tree ? :)

    So, lets see first, how the random partitioning works. It’s just simple as generating a random pivot index and move it as the end element. Then, apply the normal partition algorithm we were doing already!!

    But, what you get out of such a partition is wonderful!!.. If you are lucky

  • you get the k-th smallest or largest element
  • you get a perfectly balanced tree
  • you can avoid worst case O(n^2) in quick sort even if the array is sorted..

Mostly, it depends on the luck factor!! :) That’s the reason why, we use expectations to analyse these algorithms. The partitioning is the basic example to explain expectation based asymptotic notations. We avoid the log n extra performance delay in quick sort to select k-th element without sorting.

Code

int random_partition(int* arr, int size)
{
    srand(time(NULL));
    int pivotIdx = rand() % size;
    int pivot = arr[pivotIdx];
    int i=-1;
    int j=0;
    swap(arr[pivotIdx], arr[size-1]); // move pivot element to the end
    pivotIdx = size-1;
 
    while(j < size)
    {
        if(arr[j] <= pivot)
        {
            i = i+1;
            swap(arr[i], arr[j]);
        }
        j++;
    }
 
    swap(arr[i+1], arr[pivotIdx]);
    return i+1;
}
 
void main()
{
    int A[] = {2,5,7,1,10,8,9};
    int loc = random_partition(A, 7);
}
  • the loc you get is nothing but k. It gives a random k every time you execute the algorithm.
  • Usually, applications need a perfect k.. :) that’s where randomized selection algorithm comes into picture.