An Algorithm a Day: expectations

Showing posts with label expectations. Show all posts

Tuesday, 31 May 2011

Randomized Selection algorithm

“Find the k-th smallest element in an array without sorting.. This is the algorithm which does that.”

Introduction

We have already seen lots of information about order statistics. Finding just the k-th smallest element in an array doesn’t require sorting but you just need to find out where the k-th smallest element will fit in.

Usual way would be, to find the max value using a variable. But what would you do, if you want to find the second max? You would maintain 2 variables. Similarly, its possible to maintain a stack of current minimums for each of the 2 elements in the array in O(n) complexity with space O(n). :) We will see about this problem in next article and also its an interview question in some companies. :D

But the right way to find the k-th minimum element is randomized selection algorithm. We have already learned about the randomized partition usage. We can use the same in randomized selection itself.

Concept

Do randomized partition. Get the location of the pivot which is also random. After you get the location, just subtract the location from the start of the array. This is nothing but the k-th position in the sorted array.

Example,

arr = 9,5,7,1,10,2,3

if the randompivot selected is 5, and the partition has happened then, the resulting array will be,

arr = 1 2 3 5 9 7 10

The current location returned will be 3 (array starting from zero).

We assumed above array is starting from zero. :) But the actual formula to calculate the k position will be,

position – start +1, we need this +1 to get the exact location.

If the above value is equal to the given k, we have got the best case here. Just O(n) to find our required smallest element.

For example, say the user wants the 3rd smallest element. We already got that. :) we don’t need to continue anymore.

But important thing here is how our random generator works and what we expect from it. If the pivot selected always helps us out then we get the best case.

We continue this randomized pivot selection based on the first value and continue recursively on either side.

1 2 3 5 9 7 10

start mid end

now, k = mid –start +1 = 4

Say, we need 6th smallest(7). we have got only the 4th smallest. since 4 is lesser than 6, we will have our 6th smallest in the left side. i.e,

within, 9 7 10

This is, mid+1 to end

But the k-th element we need to find inside this array is just the 2nd smallest!! :D

So, we need to do 6-4, that is, k – (given value) to be passed to this recursion

similarly, we need to cover for the greater case as well.

Code

int random_partition(int* arr, int start, int end){    srand(time(NULL));    int pivotIdx = start + rand() % (end-start+1);    int pivot = arr[pivotIdx];    swap(arr[pivotIdx], arr[end]); // move pivot element to the end    pivotIdx = end;    int i = start -1;     for(int j=start; j<=end-1; j++)    {        if(arr[j] <= pivot)        {            i = i+1;            swap(arr[i], arr[j]);        }    }     swap(arr[i+1], arr[pivotIdx]);    return i+1;} int random_selection(int* arr, int start, int end, int k){    if(start == end)        return arr[start];     if(k ==0) return -1;     if(start < end)    {     int mid = random_partition(arr, start, end);    i = mid - start + 1;    if(i == k)        return arr[mid];    else if(k < i)        return random_selection(arr, start, mid-1, k);    else         return random_selection(arr, mid+1, end, k-i);    } }void main(){    int A[] = {9,5,7,1,10,2,3};    int loc = random_selection(A, 0,6,5);}

Note that the k argument we pass is not starting from 0.. it starts from 1
Even though our random partition never returns greater than 6. We calculate the distance from the start and add +1 to it. This gives as the right value.
while moving in lesser than and greater than segments we must take care in passing the correct k value

Monday, 23 May 2011

Randomize the entries of sorted array

“We have seen about randomization and probability. But the application of these math concepts are mainly with randomized algorithms!!.. We have also seen asymptotic notations for normal algorithms.. but for randomized, estimating what will happen is probabilistic!! “

http://analgorithmaday.blogspot.com/2011/05/probability-for-dummies.html

Introduction

Algorithms which are non-deterministic but probabilistic are called randomized algorithms. If its a polynomial algorithms, then it starts from O(1) and goes up to O(n^m). If the algorithm uses divide and conquer and divides the problem into 2^step times, we get a logarithmic timed algorithms. If the algorithm uses random number generators and solves a problem in pure random, these are randomized algorithms for which we need to use expectations to solve.

Expectations mean: expecting that some event happens always. We have seen an example like how much possible that always we get heads in every toss!! To find this, we use expectations and expectations are easily represented using indicator random variable, I

I = {0 or 1} 1 = means event occurs, 0= means event does not occur.

This indicator variable is the basis to calculate the expectation of an event. For example,

X_H: is the event that head always occurs.

then, X_H = I{ always getting head }

So, the expectation of X_H = E( I{always getting head})

= 1 * Probability of getting heads + 0 * Probability of getting tails

= 1. (1/2) + 0.(1/2) = 1/2

Which means, by tossing the coin once, the expectation of getting head is 50%. This means even if you do 50 flips, the probability totals to 50% chance.

Concept

The above concept about expectations are the basis to all randomized algorithms. We need to expect some value for which the algorithm will perform at its maximum to get the best case performance. We need to expect for the failure case to get the worst case performance.

In this article, we will consider about randomizing the entries in an sorted array. As we already learnt about the concept behind randomness. There are two types: pseudo random number, perfectly random number. For a perfect random number we need a uniform random distribution which depends on the seed value and the constant factors(which should be prime always).

But pseudo random number generators are faster and almost gives a pseudo random which is suited for most of the applications. But we must ignore duplicates if we want only unique random numbers!! :). As we told its random, even repetition means random.

To get true randomness, we have to select the seed value perfectly. As we have already studied, mostly seed value used would be time.

Really good tutorial along with code is present here: http://www.learncpp.com/cpp-tutorial/59-random-number-generation/

Let me reiterate about, the need of seed. Seed is nothing but the multiplier for generating random number. Usually, random number runs from 0 to RAND_MAX in C++. In all languages, you have a random maximum (usually the size of unsigned integer). The seed and the constant factor mod with the range you want gives the pseudo random value required.

Code

To get random number between two ranges,

int getRandom(int start, int end){    srand(time(NULL));    return (rand() % (end-start+1))+start;}

To get unique random numbers,

#include <bitset>int getRandom(int start, int end){    srand(time(NULL));    return (rand() % (end-start+1))+start;} void main(){    int numbers[6];    int checker=0;    bitset<6> myset;    int i=0;     do {        int num = getRandom(0,5);        if((checker & (1<<num)) == 0)         {             numbers[i] = num;            i++;        }        checker |= 1 << num;        myset = checker;    }while(myset.count() != 6);}

I have played a bit with bitwise. :) But all these are concepts from http://analgorithmaday.blogspot.com/2011/04/unique-items-in-array.html
Note that, the size of the random number domain, needs to be perfect. Only then checker will get all 1’s in its array. Be careful with bits!! :)
For a normal case of finding duplicates or for random numbers getting unique, both has the most important problem. Finding when the loop ends with all the unique items extracted!! you can use either i’s count or bitset’s count function.
The algorithm is written for reducing the use of extra space!!.. But bitset uses extra space anyway. :D more simpler would be just track the value of i. It doesn’t need any extra space ;)
For our case, i have made use of bitset. Its available in Java. In C, GCC provides an intrinsic __builtin_popcount
Refer here for more details on how bitset works: http://en.wikipedia.org/wiki/Hamming_weight

Final code to randomize sorted array

int getRandom(int start, int end){    srand(time(NULL));    return (rand() % (end-start+1))+start;}  int* getUniqueNums(int limit){    int *numbers = (int*) malloc(sizeof(int)*limit);    int checker=0;    bool *myset = (bool*) calloc(limit, sizeof(bool));    int i=0;     do {        int num = getRandom(0,limit-1);        if(!myset[num])         {             numbers[i] = num;            myset[num] = true;            i++;        }    }while(i<limit);     return numbers;} void main(){  int a[] = {1,2,3,4,5,6};  int *b = getUniqueNums(6);   // use b array as priority array into a  for(int i=0;i<6;i++) {      b[i] = a[b[i]];  }}

In the above example, we have avoided usage of bitset. Studying C++ bitset is useful btw. :) UPDATE: we don’t need that extra space, we can just count on the i. But in above code, we are not using checkers, so we need myset bool array!!
Analysing the performance of this code will use Expectations/Probability as you can note that, the time taken to run getUniqueNums is completely random :)
As we have based the seed over system time, the expectation is also on system time. But we have to expect completion of the algorithm getUniqueNums in O(limit) in best case.
We have completely ignored other popular randomized algorithms like Randomized quick sort and randomized selection algorithm. Which we will see in coming days.

Tuesday, 31 May 2011

Randomized Selection algorithm

Monday, 23 May 2011

Randomize the entries of sorted array

About the Author