Monday, 23 May 2011

Randomize the entries of sorted array

“We have seen about randomization and probability. But the application of these math concepts are mainly with randomized algorithms!!.. We have also seen asymptotic notations for normal algorithms.. but for randomized, estimating what will happen is probabilistic!! “

See Also

http://analgorithmaday.blogspot.com/2011/05/random-number-generation-logic-linear.html

http://analgorithmaday.blogspot.com/2011/05/probability-for-dummies.html

Introduction

   Algorithms which are non-deterministic but probabilistic are called randomized algorithms. If its a polynomial algorithms, then it starts from O(1) and goes up to O(n^m). If the algorithm uses divide and conquer and divides the problem into 2^step times, we get a logarithmic timed algorithms. If the algorithm uses random number generators and solves a problem in pure random, these are randomized algorithms for which we need to use expectations to solve.

  Expectations mean: expecting that some event happens always. We have seen an example like how much possible that always we get heads in every toss!! To find this, we use expectations and expectations are easily represented using indicator random variable, I

   I = {0 or 1} 1 = means event occurs, 0= means event does not occur.

This indicator variable is the basis to calculate the expectation of an event. For example,

   XH: is the event that head always occurs.

then, XH = I{ always getting head }

So, the expectation of XH = E( I{always getting head})

                                         = 1 * Probability of getting heads + 0 * Probability of getting tails

                                         = 1. (1/2) + 0.(1/2) = 1/2

Which means, by tossing the coin once, the expectation of getting head is 50%. This means even if you do 50 flips, the probability totals to 50% chance.

Concept

   The above concept about expectations are the basis to all randomized algorithms. We need to expect some value for which the algorithm will perform at its maximum to get the best case performance. We need to expect for the failure case to get the worst case performance.

  In this article, we will consider about randomizing the entries in an sorted array. As we already learnt about the concept behind randomness. There are two types: pseudo random number, perfectly random number. For a perfect random number we need a uniform random distribution which depends on the seed value and the constant factors(which should be prime always).

   But pseudo random number generators are faster and almost gives a pseudo random which is suited for most of the applications. But we must ignore duplicates if we want only unique random numbers!! :). As we told its random, even repetition means random.

  To get true randomness, we have to select the seed value perfectly. As we have already studied, mostly seed value used would be time.

Really good tutorial along with code is present here: http://www.learncpp.com/cpp-tutorial/59-random-number-generation/

  Let me reiterate about, the need of seed. Seed is nothing but the multiplier for generating random number. Usually, random number runs from 0 to RAND_MAX in C++. In all languages, you have a random maximum (usually the size of unsigned integer). The seed and the constant factor mod with the range you want gives the pseudo random value required.

Code

To get random number between two ranges,

int getRandom(int start, int end)
{
    srand(time(NULL));
    return (rand() % (end-start+1))+start;
}

To get unique random numbers,

#include <bitset>
int getRandom(int start, int end)
{
    srand(time(NULL));
    return (rand() % (end-start+1))+start;
}
 
void main()
{
    int numbers[6];
    int checker=0;
    bitset<6> myset;
    int i=0;
 
    do {
        int num = getRandom(0,5);
        if((checker & (1<<num)) == 0) 
        { 
            numbers[i] = num;
            i++;
        }
        checker |= 1 << num;
        myset = checker;
    }while(myset.count() != 6);
}
  • I have played a bit with bitwise. :) But all these are concepts from http://analgorithmaday.blogspot.com/2011/04/unique-items-in-array.html
  • Note that, the size of the random number domain, needs to be perfect. Only then checker will get all 1’s in its array. Be careful with bits!! :)
  • For a normal case of finding duplicates or for random numbers getting unique, both has the most important problem. Finding when the loop ends with all the unique items extracted!! you can use either i’s count or bitset’s count function.
  • The algorithm is written for reducing the use of extra space!!.. But bitset uses extra space anyway. :D more simpler would be just track the value of i. It doesn’t need any extra space ;)
  • For our case, i have made use of bitset. Its available in Java. In C, GCC provides an intrinsic __builtin_popcount
  • Refer here for more details on how bitset works: http://en.wikipedia.org/wiki/Hamming_weight

Final code to randomize sorted array

int getRandom(int start, int end)
{
    srand(time(NULL));
    return (rand() % (end-start+1))+start;
}
 
 
int* getUniqueNums(int limit)
{
    int *numbers = (int*) malloc(sizeof(int)*limit);
    int checker=0;
    bool *myset = (bool*) calloc(limit, sizeof(bool));
    int i=0;
 
    do {
        int num = getRandom(0,limit-1);
        if(!myset[num]) 
        { 
            numbers[i] = num;
            myset[num] = true;
            i++;
        }
    }while(i<limit);
 
    return numbers;
}
 
void main()
{
  int a[] = {1,2,3,4,5,6};
  int *b = getUniqueNums(6);
 
  // use b array as priority array into a
  for(int i=0;i<6;i++) {
      b[i] = a[b[i]];
  }
}
  • In the above example, we have avoided usage of bitset. Studying C++ bitset is useful btw. :) UPDATE: we don’t need that extra space, we can just count on the i. But in above code, we are not using checkers, so we need myset bool array!!
  • Analysing the performance of this code will use Expectations/Probability as you can note that, the time taken to run getUniqueNums is completely random :)
  • As we have based the seed over system time, the expectation is also on system time. But we have to expect completion of the algorithm getUniqueNums in O(limit) in best case.
  • We have completely ignored other popular randomized algorithms like Randomized quick sort and randomized selection algorithm. Which we will see in coming days.