An Algorithm a Day: probability

Showing posts with label probability. Show all posts

Tuesday, 31 May 2011

Randomized Selection algorithm

“Find the k-th smallest element in an array without sorting.. This is the algorithm which does that.”

Friday, 27 May 2011

Randomized Quick sort

“Randomized quick sort uses random partition method we discussed. We just select a random pivot in an array. The advantage we get is that we can reduce the worst case performance of quick sort!! But still, this optimization is expected!!”

Introduction

As we have seen a lot about this already, we can directly jump into Randomized quick sort

Concept

Just select a random pivot element in the partition algorithm.

Code

 int random_partition(int* arr, int start, int end){    srand(time(NULL));    int pivotIdx = start + rand() % (end-start+1);    int pivot = arr[pivotIdx];    swap(arr[pivotIdx], arr[end]); // move pivot element to the end    pivotIdx = end;    int i = start -1;     for(int j=start; j<=end-1; j++)    {        if(arr[j] <= pivot)        {            i = i+1;            swap(arr[i], arr[j]);        }    }     swap(arr[i+1], arr[pivotIdx]);    return i+1;} void random_quick_sort(int* arr, int start, int end){    if(start < end) {        int mid = random_partition(arr, start, end);        random_quick_sort(arr, start, mid-1);        random_quick_sort(arr, mid+1, end);    }}void main(){    int A[] = {2,5,7,1,10,8,9};    random_quick_sort(A, 0,6);}

pivot Index must be start + rand() %(end-start+1).. We do a recursive algorithm with different start and end.. :) so be careful about the formula for pivotIndex
We need to add start to the rand() % end because we need to get near the size of the array
start->mid-1 and mid+1->end.. is enough for quick sort.. NOTE: we always get the pivot element in right position at every iteration
There is an alternate way of partitioning given at many other places.. Not a CLRS way!! :) We will see that as well in the next article..!! :D

Monday, 23 May 2011

Randomize the entries of sorted array

“We have seen about randomization and probability. But the application of these math concepts are mainly with randomized algorithms!!.. We have also seen asymptotic notations for normal algorithms.. but for randomized, estimating what will happen is probabilistic!! “

http://analgorithmaday.blogspot.com/2011/05/probability-for-dummies.html

Introduction

Algorithms which are non-deterministic but probabilistic are called randomized algorithms. If its a polynomial algorithms, then it starts from O(1) and goes up to O(n^m). If the algorithm uses divide and conquer and divides the problem into 2^step times, we get a logarithmic timed algorithms. If the algorithm uses random number generators and solves a problem in pure random, these are randomized algorithms for which we need to use expectations to solve.

Expectations mean: expecting that some event happens always. We have seen an example like how much possible that always we get heads in every toss!! To find this, we use expectations and expectations are easily represented using indicator random variable, I

I = {0 or 1} 1 = means event occurs, 0= means event does not occur.

This indicator variable is the basis to calculate the expectation of an event. For example,

X_H: is the event that head always occurs.

then, X_H = I{ always getting head }

So, the expectation of X_H = E( I{always getting head})

= 1 * Probability of getting heads + 0 * Probability of getting tails

= 1. (1/2) + 0.(1/2) = 1/2

Which means, by tossing the coin once, the expectation of getting head is 50%. This means even if you do 50 flips, the probability totals to 50% chance.

Concept

The above concept about expectations are the basis to all randomized algorithms. We need to expect some value for which the algorithm will perform at its maximum to get the best case performance. We need to expect for the failure case to get the worst case performance.

In this article, we will consider about randomizing the entries in an sorted array. As we already learnt about the concept behind randomness. There are two types: pseudo random number, perfectly random number. For a perfect random number we need a uniform random distribution which depends on the seed value and the constant factors(which should be prime always).

But pseudo random number generators are faster and almost gives a pseudo random which is suited for most of the applications. But we must ignore duplicates if we want only unique random numbers!! :). As we told its random, even repetition means random.

To get true randomness, we have to select the seed value perfectly. As we have already studied, mostly seed value used would be time.

Really good tutorial along with code is present here: http://www.learncpp.com/cpp-tutorial/59-random-number-generation/

Let me reiterate about, the need of seed. Seed is nothing but the multiplier for generating random number. Usually, random number runs from 0 to RAND_MAX in C++. In all languages, you have a random maximum (usually the size of unsigned integer). The seed and the constant factor mod with the range you want gives the pseudo random value required.

Code

To get random number between two ranges,

int getRandom(int start, int end){    srand(time(NULL));    return (rand() % (end-start+1))+start;}

To get unique random numbers,

#include <bitset>int getRandom(int start, int end){    srand(time(NULL));    return (rand() % (end-start+1))+start;} void main(){    int numbers[6];    int checker=0;    bitset<6> myset;    int i=0;     do {        int num = getRandom(0,5);        if((checker & (1<<num)) == 0)         {             numbers[i] = num;            i++;        }        checker |= 1 << num;        myset = checker;    }while(myset.count() != 6);}

I have played a bit with bitwise. :) But all these are concepts from http://analgorithmaday.blogspot.com/2011/04/unique-items-in-array.html
Note that, the size of the random number domain, needs to be perfect. Only then checker will get all 1’s in its array. Be careful with bits!! :)
For a normal case of finding duplicates or for random numbers getting unique, both has the most important problem. Finding when the loop ends with all the unique items extracted!! you can use either i’s count or bitset’s count function.
The algorithm is written for reducing the use of extra space!!.. But bitset uses extra space anyway. :D more simpler would be just track the value of i. It doesn’t need any extra space ;)
For our case, i have made use of bitset. Its available in Java. In C, GCC provides an intrinsic __builtin_popcount
Refer here for more details on how bitset works: http://en.wikipedia.org/wiki/Hamming_weight

Final code to randomize sorted array

int getRandom(int start, int end){    srand(time(NULL));    return (rand() % (end-start+1))+start;}  int* getUniqueNums(int limit){    int *numbers = (int*) malloc(sizeof(int)*limit);    int checker=0;    bool *myset = (bool*) calloc(limit, sizeof(bool));    int i=0;     do {        int num = getRandom(0,limit-1);        if(!myset[num])         {             numbers[i] = num;            myset[num] = true;            i++;        }    }while(i<limit);     return numbers;} void main(){  int a[] = {1,2,3,4,5,6};  int *b = getUniqueNums(6);   // use b array as priority array into a  for(int i=0;i<6;i++) {      b[i] = a[b[i]];  }}

In the above example, we have avoided usage of bitset. Studying C++ bitset is useful btw. :) UPDATE: we don’t need that extra space, we can just count on the i. But in above code, we are not using checkers, so we need myset bool array!!
Analysing the performance of this code will use Expectations/Probability as you can note that, the time taken to run getUniqueNums is completely random :)
As we have based the seed over system time, the expectation is also on system time. But we have to expect completion of the algorithm getUniqueNums in O(limit) in best case.
We have completely ignored other popular randomized algorithms like Randomized quick sort and randomized selection algorithm. Which we will see in coming days.

Thursday, 19 May 2011

Probability for Dummies

"The various possible ways are decided using permutation or combination. But what is the possibility that a arrangement can occur is called probability.. Randomness in this world requires probability to predict..!! All game plays are made with some defined principles and randomness..But how to measure at least well defined randomness so that to predict something? plainly with probability"

See Also
http://analgorithmaday.blogspot.com/2011/05/set-theorya-dummies-guide.html
http://analgorithmaday.blogspot.com/2011/05/permutations-combinations-for-dummies.html

Introduction

In mathematics, you have deterministic or non-deterministic problems. All non-deterministic problems are called probabilistic problems. Simply, All math done with formula's are deterministic..!! Even for generating randomness, we use formula's. But note that these are non-deterministic formulas!!.. You can never say when they will fail!!.. a formula should never fail right ? :)

The beauty of probability is to check how much we can pass/fail rather than simply pass!!.. It is sometimes just a condition over permutations/combination. That is, reason we studied about permutation & set theory well before.

Pass/Fail means Probability.. Which means.. Pass = 1, Fail = 0. So the value of probability will lie between 0 and 1.

If probability of some event is 0, then it means it will never occur (utter fail or flop). If its 1, then it will definitely occur(success).

What is the event a probability measure? By now, you would have guessed. Random and known time interval occurring events. But we need to put some question for which we need to calculate probability (it can be either % failure or % success calculation).

The common example, flipping coin. The probability of getting tail 50%. The probability of getting head is 50%. Because the event is equally likely. You will either get a tail or head with equal probability.

But not all events are equally likely like this. Lets take an example question like this.
A couple has 2 kids and atleast one is a boy. What is the probability that both are boys?
Get the permutations of 2 kids: GG, BG, GB, BB.. 4 arrangements. Since we only care about probability of getting 2 boys, Ignore GG. Total 3 arrangements in which we find a boy out of which only 1 has both of them as boys.
So the probability is: 1/3

Concepts (Math)
Some more math about Probability Point-by-Point.

Probability values are always between 0 and 1. So, it also like 0% possible, 100% possible
It operates over a sample space called Experiment(E). It's nothing but the set of all the permutations. The set of permutation changes based on the kind of probability question we ask!!
We call some thing as event if it occurs once in the experiment. Event is nothing but subset in the Experiment set. We calculate the probability of an Event in the experiment!!
All set theory concepts applies to our experiment set(sample space). You can combine two experiments (Union). You can take common events across Experiments(Intersection). You can even take difference between two experiments.
Disjoint sets as we already studied means empty set when you take intersection between them. Same case applies here, disjoint events means no relation between the two events.
If you have two events and you know that always 1 event occurs, then we can use conditional probability. Conditional probability is nothing but taking intersection between the two events. (take randomness of only common events)

Some more concepts
Sampling is a statistical concepts. Probability too works with samples. There are many more mathematics part of statistics like differential equations and integrals. Distributions are all defined with differentials and integrals. Distributions are basically used to generalize the function. The same distributions can be used to generalize a probability function as a random variable. This is called Probability distributions.

Whatever probability you have, the probability distribution runs from 0 to 1 and also there will be a mean value between these two numbers.
Example: If there are large number of trails of tossing the coin, then heads will be the max with probability around 50%.

Where it is used ?
Probability and expected value function is a must to know to analyze the random algorithms. We studied all this just because we are going to study randomized algorithms :).

It is known that the randomized algorithms perform really well in average case for specific problems.

More about Probability
I would suggest referring a better math book rather than me writing about it. There are many more formulas and most used probability functions. Please refer a proper book.

Friday, 13 May 2011

Permutations & Combinations for Dummies

“I never understood concepts like permutation, combination which is quite confusing. But permutation is nothing but just a way to count all possibilities of a nested iteration!!”

Introduction

Just take, the following nested for loops

for(int i=0;i<m; i++) { for(int j=0; j<n;j++) {   cout<<i<<":"<<j<<endl; }}

Simply we can tell directly the total iterations are m*n. But the traversal done by the loop with an array will be like..

REPEAT –> read the row –> read from left to right –> move to next row –> UNTIL i*j == m*n

These are the various values comes out considering m=3, n=3,

{0,0}, {0,1}, {0,2}

{1,0}, {1,1}, {1,2}

{2,0}, {2,1}, {2,2}

Keeping i, we iterate with changing symbols in j.. Doesn’t it look like Cartesian Product with a Condition?

This kind of rearrangement of numbers are necessary to traverse the array. Similarly you need rearrangement of numbers in various applications!!..

For example, say, you need to find all possible rearrangements of a shuffled 56 deck card.. There are numerous ways if you need to find the rearrangements of all the 56 cards(56!).. But if you want to know rearrangements possible when the cards are distributed within 14 people..

You just need to count 14 possible rearrangements out of 56 cards to calculate. This is where you need permutation..

To understand permutation well, you need to understand set theory!! Especially Cartesian Product!.. Learn about set theory first: http://analgorithmaday.blogspot.com/2011/05/set-theorya-dummies-guide.html

Concept

Every multiplication can be represented with a set of additions. When you add a number that much times, you get the multiplied value. Similarly, if you have a set of 3 numbers.. how would need to find the number of ways it can be rearranged?

Strings

If you take Cartesian product of same set k times with each set having some n symbols in it, you get strings. S^k sets with each having n items.

i.e, If the set is S, S x S x S is a 3-set product. If you have only 2 elements in it, then total you get 2^3 distinct elements as a result of this product. This count of distinct elements can be got from n^k equation. Example, Take a binary string of length 3. You have max 2 symbols possible in a binary. Total possible sets are,

000,001,010,011,100,101,110,111

Which is 2^3.

Permutation

Permuting means, simply re-arranging without missing any symbols!!.. if you want the number of rearrangements what would you do ? That’s where permutation comes into picture.

For example, saw you have a set {a,b,c}. The total number of ways, you can order this variables are,

abc, acb, bca, bac, cab,cba

Total 6 ways, the above 3 char array can be permuted. This gives as a pattern, the first element is chosen n ways, the next ignoring the first element chosen in n-1 ways, the next in n-2 ways. This means, you get n(n-1)(n-2) which is nothing but n!

If in the same above array, you just need to know the permutation of only k-elements instead of all n elements. (n is the total elements in the array). ex: this is similar to finding the 14 cards possible rearrangements in 56 card deck..

then, the total ways are,

ab, ac, bc, ba, ca, cb

Again the same, 6 ways. But now, n, n-1, n-2 and upto n-k+1 is what required. So, The formula becomes like below,

Combination

We already saw about Strings which we get by Cartesian product of the same set and get the unique items out of the set with each symbol in an order.

Combination has the same property of getting unique items with each item at one place but ignores the order of items requirement. It considers symbols in both the set and takes out in-order uniquely!!.. i.e, You should remove transitive, symmetric symbols in a combination!!

Ex: Permutation for 2 element in a 4 element set, {a,b,c,d}

ab,ac,ad,ba,bc,bd, ca,cb,cd,da,db,dc

Total 12 sets are possible since we need to consider every re-arrangement in any order.

Combination for 2 element in a 4 element set, {a,b,c,d}

ab, ac, ad, bc,bd, cd

There are only 6 sets!!.

Formula:

Examples

Some real world example!!

http://betterexplained.com/articles/easy-permutations-and-combinations/

You have 8 players.. and 3 different cups to be won. Gold, Silver, Bronze. You give each cup to only one of the players and the cups will be given in the order, Gold->Silver->Bronze.

Gold: When you need to give Gold, you have all 8 players competing for it. So, total 8 options possible.

Silver: When you need to give Silver, you have already given Gold, so only 7 options possible

Bronze: When you need to give Bronze, you have already given Silver, so only 6 options possible.

Total ways in which the 8 players can win will be 8! ways..

But, there are only 3 cups, so Total ways in which the top 3 cups can be won, if given one-by-one in order to the persons will be: 8 * 7 * 6.

This is where Permutation comes into Picture!

If the price money is same, i.e, a Stainless steel plate. You don’t need to care about the order in which the 3 cups are given. So, its 3 cups given to 8 players. But 3 cups re-arrangement don’t need to be taken into account as all are equal. So 3! ways of arranging the cups(6 ways) are useless when it comes to just picking some 3 out-of-8.

This is combination and you get only: 8*7 ways according to the formula.

Tuesday, 31 May 2011

Randomized Selection algorithm

Friday, 27 May 2011

Randomized Quick sort

Monday, 23 May 2011

Randomize the entries of sorted array

Thursday, 19 May 2011

Probability for Dummies

Friday, 13 May 2011

Permutations & Combinations for Dummies

About the Author