Showing posts with label probability. Show all posts
Showing posts with label probability. Show all posts

Tuesday, 31 May 2011

Randomized Selection algorithm

“Find the k-th smallest element in an array without sorting.. This is the algorithm which does that.”

See Also

http://analgorithmaday.blogspot.com/2011/05/concept-behind-order-statistics_09.html

Introduction

  We have already seen lots of information about order statistics. Finding just the k-th smallest element in an array doesn’t require sorting but you just need to find out where the k-th smallest element will fit in.

  Usual way would be, to find the max value using a variable. But what would you do, if you want to find the second max? You would maintain 2 variables. Similarly, its possible to maintain a stack of current minimums for each of the 2 elements in the array in O(n) complexity with space O(n). :) We will see about this problem in next article and also its an interview question in some companies. :D

  But the right way to find the k-th minimum element is randomized selection algorithm. We have already learned about the randomized partition usage. We can use the same in randomized selection itself.

Concept

  Do randomized partition. Get the location of the pivot which is also random. After you get the location, just subtract the location from the start of the array. This is nothing but the k-th position in the sorted array.

     Example,

              arr = 9,5,7,1,10,2,3

        if the randompivot selected is 5, and the partition has happened then, the resulting array will be,

       arr =    1 2 3 5 9 7 10

The current location returned will be 3 (array starting from zero).

  We assumed above array is starting from zero. :) But the actual formula to calculate the k position will be,

   position – start +1, we need this +1 to get the exact location.

If the above value is equal to the given k, we have got the best case here. Just O(n) to find our required smallest element.

For example, say the user wants the 3rd smallest element. We already got that. :) we don’t need to continue anymore.

   But important thing here is how our random generator works and what we expect from it. If the pivot selected always helps us out then we get the best case.

  We continue this randomized pivot selection based on the first value and continue recursively on either side.

  1  2 3   5     9 7 10

         start      mid        end

now, k = mid –start +1 = 4

Say, we need 6th smallest(7). we have got only the 4th smallest. since 4 is lesser than 6, we will have our 6th smallest in the left side. i.e,

within, 9 7 10

This is, mid+1 to end

But the k-th element we need to find inside this array is just the 2nd smallest!! :D

So, we need to do 6-4, that is, k – (given value) to be passed to this recursion

similarly, we need to cover for the greater case as well.

Code

int random_partition(int* arr, int start, int end)
{
    srand(time(NULL));
    int pivotIdx = start + rand() % (end-start+1);
    int pivot = arr[pivotIdx];
    swap(arr[pivotIdx], arr[end]); // move pivot element to the end
    pivotIdx = end;
    int i = start -1;
 
    for(int j=start; j<=end-1; j++)
    {
        if(arr[j] <= pivot)
        {
            i = i+1;
            swap(arr[i], arr[j]);
        }
    }
 
    swap(arr[i+1], arr[pivotIdx]);
    return i+1;
}
 
int random_selection(int* arr, int start, int end, int k)
{
    if(start == end)
        return arr[start];
 
    if(k ==0) return -1;
 
    if(start < end)
    {
 
    int mid = random_partition(arr, start, end);
    i = mid - start + 1;
    if(i == k)
        return arr[mid];
    else if(k < i)
        return random_selection(arr, start, mid-1, k);
    else 
        return random_selection(arr, mid+1, end, k-i);
    }
 
}
void main()
{
    int A[] = {9,5,7,1,10,2,3};
    int loc = random_selection(A, 0,6,5);
}
  • Note that the k argument we pass is not starting from 0.. it starts from 1
  • Even though our random partition never returns greater than 6. We calculate the distance from the start and add +1 to it. This gives as the right value.
  • while moving in lesser than and greater than segments  we must take care in passing the correct k value

Friday, 27 May 2011

Randomized Quick sort

“Randomized quick sort uses random partition method we discussed. We just select a random pivot in an array. The advantage we get is that we can reduce the worst case performance of quick sort!! But still, this optimization is expected!!”

Introduction

   As we have seen a lot about this already, we can directly jump into Randomized quick sort

Concept

  Just select a random pivot element in the partition algorithm.

Code

 
int random_partition(int* arr, int start, int end)
{
    srand(time(NULL));
    int pivotIdx = start + rand() % (end-start+1);
    int pivot = arr[pivotIdx];
    swap(arr[pivotIdx], arr[end]); // move pivot element to the end
    pivotIdx = end;
    int i = start -1;
 
    for(int j=start; j<=end-1; j++)
    {
        if(arr[j] <= pivot)
        {
            i = i+1;
            swap(arr[i], arr[j]);
        }
    }
 
    swap(arr[i+1], arr[pivotIdx]);
    return i+1;
}
 
void random_quick_sort(int* arr, int start, int end)
{
    if(start < end) {
        int mid = random_partition(arr, start, end);
        random_quick_sort(arr, start, mid-1);
        random_quick_sort(arr, mid+1, end);
    }
}
void main()
{
    int A[] = {2,5,7,1,10,8,9};
    random_quick_sort(A, 0,6);
}
  • pivot Index must be start + rand() %(end-start+1).. We do a recursive algorithm with different start and end.. :) so be careful about the formula for pivotIndex
  • We need to add start to the rand() % end because we need to get near the size of the array
  • start->mid-1 and mid+1->end.. is enough for quick sort.. NOTE: we always get the pivot element in right position at every iteration
  • There is an alternate way of partitioning given at many other places.. Not a CLRS way!! :) We will see that as well in the next article..!! :D

Monday, 23 May 2011

Randomize the entries of sorted array

“We have seen about randomization and probability. But the application of these math concepts are mainly with randomized algorithms!!.. We have also seen asymptotic notations for normal algorithms.. but for randomized, estimating what will happen is probabilistic!! “

See Also

http://analgorithmaday.blogspot.com/2011/05/random-number-generation-logic-linear.html

http://analgorithmaday.blogspot.com/2011/05/probability-for-dummies.html

Introduction

   Algorithms which are non-deterministic but probabilistic are called randomized algorithms. If its a polynomial algorithms, then it starts from O(1) and goes up to O(n^m). If the algorithm uses divide and conquer and divides the problem into 2^step times, we get a logarithmic timed algorithms. If the algorithm uses random number generators and solves a problem in pure random, these are randomized algorithms for which we need to use expectations to solve.

  Expectations mean: expecting that some event happens always. We have seen an example like how much possible that always we get heads in every toss!! To find this, we use expectations and expectations are easily represented using indicator random variable, I

   I = {0 or 1} 1 = means event occurs, 0= means event does not occur.

This indicator variable is the basis to calculate the expectation of an event. For example,

   XH: is the event that head always occurs.

then, XH = I{ always getting head }

So, the expectation of XH = E( I{always getting head})

                                         = 1 * Probability of getting heads + 0 * Probability of getting tails

                                         = 1. (1/2) + 0.(1/2) = 1/2

Which means, by tossing the coin once, the expectation of getting head is 50%. This means even if you do 50 flips, the probability totals to 50% chance.

Concept

   The above concept about expectations are the basis to all randomized algorithms. We need to expect some value for which the algorithm will perform at its maximum to get the best case performance. We need to expect for the failure case to get the worst case performance.

  In this article, we will consider about randomizing the entries in an sorted array. As we already learnt about the concept behind randomness. There are two types: pseudo random number, perfectly random number. For a perfect random number we need a uniform random distribution which depends on the seed value and the constant factors(which should be prime always).

   But pseudo random number generators are faster and almost gives a pseudo random which is suited for most of the applications. But we must ignore duplicates if we want only unique random numbers!! :). As we told its random, even repetition means random.

  To get true randomness, we have to select the seed value perfectly. As we have already studied, mostly seed value used would be time.

Really good tutorial along with code is present here: http://www.learncpp.com/cpp-tutorial/59-random-number-generation/

  Let me reiterate about, the need of seed. Seed is nothing but the multiplier for generating random number. Usually, random number runs from 0 to RAND_MAX in C++. In all languages, you have a random maximum (usually the size of unsigned integer). The seed and the constant factor mod with the range you want gives the pseudo random value required.

Code

To get random number between two ranges,

int getRandom(int start, int end)
{
    srand(time(NULL));
    return (rand() % (end-start+1))+start;
}

To get unique random numbers,

#include <bitset>
int getRandom(int start, int end)
{
    srand(time(NULL));
    return (rand() % (end-start+1))+start;
}
 
void main()
{
    int numbers[6];
    int checker=0;
    bitset<6> myset;
    int i=0;
 
    do {
        int num = getRandom(0,5);
        if((checker & (1<<num)) == 0) 
        { 
            numbers[i] = num;
            i++;
        }
        checker |= 1 << num;
        myset = checker;
    }while(myset.count() != 6);
}
  • I have played a bit with bitwise. :) But all these are concepts from http://analgorithmaday.blogspot.com/2011/04/unique-items-in-array.html
  • Note that, the size of the random number domain, needs to be perfect. Only then checker will get all 1’s in its array. Be careful with bits!! :)
  • For a normal case of finding duplicates or for random numbers getting unique, both has the most important problem. Finding when the loop ends with all the unique items extracted!! you can use either i’s count or bitset’s count function.
  • The algorithm is written for reducing the use of extra space!!.. But bitset uses extra space anyway. :D more simpler would be just track the value of i. It doesn’t need any extra space ;)
  • For our case, i have made use of bitset. Its available in Java. In C, GCC provides an intrinsic __builtin_popcount
  • Refer here for more details on how bitset works: http://en.wikipedia.org/wiki/Hamming_weight

Final code to randomize sorted array

int getRandom(int start, int end)
{
    srand(time(NULL));
    return (rand() % (end-start+1))+start;
}
 
 
int* getUniqueNums(int limit)
{
    int *numbers = (int*) malloc(sizeof(int)*limit);
    int checker=0;
    bool *myset = (bool*) calloc(limit, sizeof(bool));
    int i=0;
 
    do {
        int num = getRandom(0,limit-1);
        if(!myset[num]) 
        { 
            numbers[i] = num;
            myset[num] = true;
            i++;
        }
    }while(i<limit);
 
    return numbers;
}
 
void main()
{
  int a[] = {1,2,3,4,5,6};
  int *b = getUniqueNums(6);
 
  // use b array as priority array into a
  for(int i=0;i<6;i++) {
      b[i] = a[b[i]];
  }
}
  • In the above example, we have avoided usage of bitset. Studying C++ bitset is useful btw. :) UPDATE: we don’t need that extra space, we can just count on the i. But in above code, we are not using checkers, so we need myset bool array!!
  • Analysing the performance of this code will use Expectations/Probability as you can note that, the time taken to run getUniqueNums is completely random :)
  • As we have based the seed over system time, the expectation is also on system time. But we have to expect completion of the algorithm getUniqueNums in O(limit) in best case.
  • We have completely ignored other popular randomized algorithms like Randomized quick sort and randomized selection algorithm. Which we will see in coming days.

Thursday, 19 May 2011

Probability for Dummies

"The various possible ways are decided using permutation or combination. But what is the possibility that a arrangement can occur is called probability.. Randomness in this world requires probability to predict..!! All game plays are made with some defined principles and randomness..But how to measure at least well defined randomness so that to predict something? plainly with probability"
See Also
http://analgorithmaday.blogspot.com/2011/05/set-theorya-dummies-guide.html
http://analgorithmaday.blogspot.com/2011/05/permutations-combinations-for-dummies.html

Introduction

  In mathematics, you have deterministic or non-deterministic problems. All non-deterministic problems are called probabilistic problems. Simply, All math done with formula's are deterministic..!! Even for generating randomness, we use formula's. But note that these are non-deterministic formulas!!.. You can never say when they will fail!!.. a formula should never fail right ? :)

   The beauty of probability is to check how much we can pass/fail rather than simply pass!!.. It is sometimes just a condition over permutations/combination. That is, reason we studied about permutation & set theory well before.

 Pass/Fail means Probability.. Which means.. Pass = 1, Fail = 0. So the value of probability will lie between 0 and 1.

If probability of some event is 0, then it means it will never occur (utter fail or flop). If its 1, then it will definitely occur(success).

What is the event a probability measure? By now, you would have guessed. Random and known time interval occurring events. But we need to put some question for which we need to calculate probability (it can be either % failure or % success calculation).

The common example, flipping coin. The probability of getting tail 50%. The probability of getting head is 50%. Because the event is equally likely. You will either get a tail or head with equal probability.

But not all events are equally likely like this. Lets take an example question like this.
A couple has 2 kids and atleast one is a boy. What is the probability that both are boys?
Get the permutations of 2 kids: GG, BG, GB, BB.. 4 arrangements. Since we only care about probability of getting 2 boys, Ignore GG. Total 3 arrangements in which we find a boy out of which only 1 has both of them as boys.
So the probability is: 1/3

Concepts (Math)
Some more math about Probability Point-by-Point.
  • Probability values are always between 0 and 1. So, it also like 0% possible, 100% possible
  • It operates over a sample space called Experiment(E). It's nothing but the set of all the permutations. The set of permutation changes based on the kind of probability question we ask!!
  • We call some thing as event if it occurs once in the experiment. Event is nothing but subset in the Experiment set. We calculate the probability of an Event in the experiment!!
  • All set theory concepts applies to our experiment set(sample space). You can combine two experiments (Union). You can take common events across Experiments(Intersection). You can even take difference between two experiments.
  • Disjoint sets as we already studied means empty set when you take intersection between them. Same case applies here, disjoint events means no relation between the two events.
  • If you have two events and you know that always 1 event occurs, then we can use conditional probability. Conditional probability is nothing but taking intersection between the two events. (take randomness of only common events)
Some more concepts
  Sampling is a statistical concepts. Probability too works with samples. There are many more mathematics part of statistics like  differential equations and integrals. Distributions are all defined with differentials and integrals. Distributions are basically used to generalize the function. The same distributions can be used to generalize a probability function as a random variable. This is called Probability distributions.

  Whatever probability you have, the probability distribution runs from 0 to 1 and also there will be a mean value between these two numbers.
Example: If there are large number of trails of tossing the coin, then heads will be the max with probability around 50%.

Where it is used ?
Probability and expected value function is a must to know to analyze the random algorithms. We studied all this just because we are going to study randomized algorithms :).

It is known that the randomized algorithms perform really well in average case for specific problems.

More about Probability
  I would suggest referring a better math book rather than me writing about it. There are many more formulas and most used probability functions. Please refer a proper book.

Friday, 13 May 2011

Permutations & Combinations for Dummies

“I never understood concepts like permutation, combination which is quite confusing. But permutation is nothing but just a way to count all possibilities of a nested iteration!!

Introduction

  Just take, the following nested for loops

for(int i=0;i<m; i++) {
 for(int j=0; j<n;j++) {
   cout<<i<<":"<<j<<endl;
 }
}

Simply we can tell directly the total iterations are m*n. But the traversal done by the loop with an array will be like..

REPEAT –> read the row –> read from left to right –> move to next row –> UNTIL i*j == m*n

These are the various values comes out considering m=3, n=3,

{0,0}, {0,1}, {0,2}

{1,0}, {1,1}, {1,2}

{2,0}, {2,1}, {2,2}

Keeping i, we iterate with changing symbols in j.. Doesn’t it look like Cartesian Product with a Condition?

This kind of rearrangement of numbers are necessary to traverse the array. Similarly you need rearrangement of numbers in various applications!!..

For example, say, you need to find all possible rearrangements of a shuffled 56 deck card.. There are numerous ways if you need to find the rearrangements of all the 56 cards(56!).. But if you want to know rearrangements possible when the cards are distributed within 14 people..

You just need to count 14 possible rearrangements out of 56 cards to calculate. This is where you need permutation..

To understand permutation well, you need to understand set theory!! Especially Cartesian Product!.. Learn about set theory first: http://analgorithmaday.blogspot.com/2011/05/set-theorya-dummies-guide.html

Concept

   Every multiplication can be represented with a set of additions. When you add a number that much times, you get the multiplied value. Similarly, if you have a set of 3 numbers.. how would need to find the number of ways it can be rearranged?

Strings

   If you take Cartesian product of same set k times with each set having some n symbols in it, you get strings. S^k sets with each having n items.

i.e, If the set is S, S x S x S is a 3-set product. If you have only 2 elements in it, then total you get 2^3 distinct elements as a result of this product. This count of distinct elements can be got from n^k equation. Example, Take a binary string of length 3. You have max 2 symbols possible in a binary. Total possible sets are,

000,001,010,011,100,101,110,111

Which is 2^3.

Permutation

Permuting means, simply re-arranging without missing any symbols!!.. if you want the number of rearrangements what would you do ? That’s where permutation comes into picture.

For example, saw you have a set {a,b,c}. The total number of ways, you can order this variables are,

abc, acb, bca, bac, cab,cba

Total 6 ways, the above 3 char array can be permuted. This gives as a pattern, the first element is chosen n ways, the next ignoring the first element chosen in n-1 ways, the next in n-2 ways. This means, you get n(n-1)(n-2) which is nothing but n!

If in the same above array, you just need to know the permutation of only k-elements instead of all n elements. (n is the total elements in the array).  ex: this is similar to finding the 14 cards possible rearrangements in 56 card deck..

then, the total ways are,

ab, ac, bc, ba, ca, cb

Again the same, 6 ways. But now, n, n-1, n-2 and upto n-k+1 is what required. So, The formula becomes like below,

Combination

We already saw about Strings which we get by Cartesian product of the same set and get the unique items out of the set with each symbol in an order.

Combination has the same property of getting unique items with each item at one place but ignores the order of items requirement. It considers symbols in both the set and takes out in-order uniquely!!.. i.e, You should remove transitive, symmetric symbols in a combination!!

Ex: Permutation for 2 element in a 4 element set, {a,b,c,d}

ab,ac,ad,ba,bc,bd, ca,cb,cd,da,db,dc

Total 12 sets are possible since we need to consider every re-arrangement in any order.

Combination for 2 element in a 4 element set, {a,b,c,d}

ab, ac, ad, bc,bd, cd

There are only 6 sets!!.

Formula:

Examples

Some real world example!!

http://betterexplained.com/articles/easy-permutations-and-combinations/

You have 8 players.. and 3 different cups to be won. Gold, Silver, Bronze. You give each cup to only one of the players and the cups will be given in the order, Gold->Silver->Bronze.

Gold: When you need to give Gold, you have all 8 players competing for it. So, total 8 options possible.

Silver: When you need to give Silver, you have already given Gold, so only 7 options possible

Bronze: When you need to give Bronze, you have already given Silver, so only 6 options possible.

Total ways in which the 8 players can win will be 8! ways..

But, there are only 3 cups, so Total ways in which the top 3 cups can be won, if given one-by-one in order to the persons will be: 8 * 7 * 6.

This is where Permutation comes into Picture!

If the price money is same, i.e, a Stainless steel plate. You don’t need to care about the order in which the 3 cups are given. So, its 3 cups given to 8 players. But 3 cups re-arrangement don’t need to be taken into account as all are equal. So 3! ways of arranging the cups(6 ways) are useless when it comes to just picking some 3 out-of-8.

This is combination and you get only: 8*7 ways according to the formula.