
Thursday, 12 May 2011

Set Theory–A Dummies Guide

“If it’s possible to express the world as a set, a mathematician will do it!! :) The power of sets is so vast that they are used heavily in statistics and computing. To understand algorithms clearly, you need to understand set theory!!”

Background

    I am assuming that the reader already knows a little bit about sets. A set is a collection of unique objects. There is the natural number set (no negatives), the real number set (floating point), and the integer set (with negatives). You can also have sets of arbitrary symbols. But the major operations that can be done between two sets, A & B, are:

  • Intersection: common items between A & B, $A \cap B$
  • Union: all unique items from A & B (common items appear only once), $A \cup B$
  • Difference: $A - B$ removes from A every item that also appears in B; the remaining elements of A are what you get.
  • Disjoint: A & B have no items in common

I assume you understand the above general and basic concepts; a quick sketch of them in code follows, before we move on into more details.
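
Here is that sketch, using the C++ standard library (std::set keeps its elements sorted and unique, so the <algorithm> set operations apply directly; the values in A and B are just made up for illustration):

#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>

int main()
{
    std::set<int> A = {1, 2, 3, 4};
    std::set<int> B = {3, 4, 5, 6};
    std::set<int> result;

    // Intersection: common items between A & B
    std::set_intersection(A.begin(), A.end(), B.begin(), B.end(),
                          std::inserter(result, result.begin()));
    for (int x : result) std::cout << x << " ";   // 3 4
    std::cout << std::endl;

    result.clear();
    // Union: all unique items (common items appear only once)
    std::set_union(A.begin(), A.end(), B.begin(), B.end(),
                   std::inserter(result, result.begin()));
    for (int x : result) std::cout << x << " ";   // 1 2 3 4 5 6
    std::cout << std::endl;

    result.clear();
    // Difference A - B: items of A that are not in B
    std::set_difference(A.begin(), A.end(), B.begin(), B.end(),
                        std::inserter(result, result.begin()));
    for (int x : result) std::cout << x << " ";   // 1 2
    std::cout << std::endl;
    return 0;
}

A and B are disjoint exactly when the intersection above comes out empty.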

Point-By-Point

   Extra concepts which you might require in some problem solving & algorithms are listed below (a small worked example follows the list),

  • singleton set: a set with only 1 element in it.
  • partition of a set: if a set gets split into non-empty subsets that don’t overlap and together cover everything, that collection is called a partition. [Check here: http://en.wikipedia.org/wiki/Partition_of_a_set]
  • complement of a set: if you take a set A, the complement $\bar{A}$ holds the elements which are not present in A.
  • universal set: $A \cup \bar{A}$ is the universal set, which includes all the elements under consideration.
  • n-set: if you have “n” elements in a set, it’s called an n-set.
  • k-subset: a “k”-element subset taken from an “n”-element set is called a k-subset of the bigger set.
  • Cartesian product set: the product of two sets, pair by pair!! A x B = {a,b} x {a,b} = {(a,a),(a,b),(b,a),(b,b)}
  • binary relation: a subset of a Cartesian product of sets. For example, if A & B are subsets of the natural numbers and the relation keeps a pair only when a < b, for a in A, b in B, then the relation picks out exactly the ascending pairs between the two sets!!
  • equivalence relation: a special kind of binary relation!! It must be reflexive, symmetric and transitive; two elements are equivalent only if the relation ties them together, and the resulting groups are disjoint.
  • equivalence class: all elements equivalent to a given one form its class. E.g., if you need the even numbers out of the natural numbers, define a relation so that equivalence holds, like “a + b is even”; the class containing 0 is exactly the evens.
  • functions: a refinement of the Cartesian product. Instead of keeping arbitrary pairs, a function is a binary relation where each input element maps to exactly one output element.
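
To make the equivalence-class idea concrete, here is the promised worked example (my own illustration, not from any textbook): it buckets the numbers 0..9 by the relation “a ~ b whenever a + b is even”, i.e. by remainder mod 2:

#include <iostream>
#include <map>
#include <vector>

int main()
{
    // Relation: a ~ b iff a % 2 == b % 2 (equivalently, a + b is even).
    // It is reflexive, symmetric and transitive, so it is an equivalence
    // relation, and its classes partition the set {0,...,9}.
    std::map<int, std::vector<int>> classes;
    for (int n = 0; n < 10; n++)
        classes[n % 2].push_back(n);   // key = class representative

    for (auto& c : classes) {
        std::cout << "class of " << c.first << ": ";
        for (int x : c.second) std::cout << x << " ";
        std::cout << std::endl;
    }
    // prints: class of 0: 0 2 4 6 8   and   class of 1: 1 3 5 7 9
    return 0;
}

The two buckets are disjoint and together cover the whole set: an equivalence relation always hands you a partition for free.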

Practical Applications

  • The main application will be on Graphs!! :) We will study them soon. All the set concepts show up in a graph!!
  • Since it can be put as a graph, a Tree is just a special case of a Graph!! Tree concepts are all rooted in set theory!!
  • The binary trees we studied are made of three disjoint sets: the root node, the left subtree and the right subtree. Disjoint means these 3 sets share no nodes at all!! Any of them may also be empty. This makes the binary tree a special case of trees.
  • Partition of a set relates directly to substrings, subsets of an array, etc. (A sketch that enumerates partitions follows the example below.)

Example: The set { 1, 2, 3 } has these five partitions:

    • { {1}, {2}, {3} }, or 1/2/3.
    • { {1, 2}, {3} }, or 12/3.
    • { {1, 3}, {2} }, or 13/2.
    • { {1}, {2, 3} }, or 1/23.
    • { {1, 2, 3} }, or 123
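
Here is the promised sketch that enumerates partitions (my own recursive illustration): each element either joins an existing block or opens a new one, which prints exactly the five partitions above for n = 3:

#include <iostream>
#include <vector>

// Enumerate all partitions of {1..n}: each element either joins an
// existing block or opens a new one.
void enumerate(int element, int n, std::vector<std::vector<int>>& blocks)
{
    if (element > n) {                      // every element placed: print
        for (auto& b : blocks) {
            std::cout << "{ ";
            for (int x : b) std::cout << x << " ";
            std::cout << "} ";
        }
        std::cout << std::endl;
        return;
    }
    for (size_t i = 0; i < blocks.size(); i++) {   // join block i
        blocks[i].push_back(element);
        enumerate(element + 1, n, blocks);
        blocks[i].pop_back();
    }
    blocks.push_back({element});                   // open a new block
    enumerate(element + 1, n, blocks);
    blocks.pop_back();
}

int main()
{
    std::vector<std::vector<int>> blocks;
    enumerate(1, 3, blocks);   // prints the 5 partitions of {1,2,3}
    return 0;
}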

This maps neatly onto how Python lists behave!! :) Python slicing and Perl’s splice carve out exactly such pieces, and both languages offer union and intersection over arrays/lists (Python via its set type).

  • Red-black trees & graph colouring both use set theory principles to distinguish differently coloured nodes
  • Equivalence relations are used to extract subsets which exhibit equivalence, e.g. the famous pair-sum problem (http://www.codechef.com/problems/PAIRSUM/).
  • Functions over a set can be used to extract elements matching a binary condition!! In practice we do this with nested for loops & if conditions :)

Wednesday, 11 May 2011

Random number generation logic - Linear congruential generator

“Randomness is really important for probabilistic actions, like card shuffling, coin tossing etc. How can we represent that in a machine? There are lots of random number generators available!!”

Introduction

  Random number generators have a huge application space, mainly in cryptography. But we need random number generators for the articles we will soon start studying: HASH TABLES!! :) You need well-spread, unique keys for storing the values so that search is O(1), similar to an array.

  There are types of randomness, however!! Pseudo-randomness & “true” randomness, all pinned down with heavy mathematical theorems. “True” randomness is really important for security algorithms like RSA. But if the randomness fails and a pattern appears, the security system can be cracked as well!! :)

   Randomness is mystic, much like how our universe behaves :) So it’s impossible to get true randomness without tapping the world outside the machine.

  Watch this movie to understand how tough it is.. :) It’s about a mathematician trying to find patterns, who is driven to a violent breakdown: http://www.imdb.com/title/tt0138704/ (a psychological movie.. beware)

  Ok, down to business. Let’s get into some simple random generators!!

Concept

       LCG – the Linear Congruential Generator is a method to generate pseudo-random numbers up to a particular number of draws; after that, the numbers start repeating. The length of this repeating cycle is called the period, and the value the sequence starts from is the seed.

 

The generator is defined by the recurrence $X_{n+1} = (a X_n + c) \bmod m$, where

 $m,\ 0 < m$ — the "modulus"
 $a,\ 0 < a < m$ — the "multiplier"
 $c,\ 0 \le c < m$ — the "increment"
 $X_0,\ 0 \le X_0 < m$ — the "seed" or "start value"

Source: http://en.wikipedia.org/wiki/Linear_congruential_generator

The outputs range from 0 up to m-1. A bare linear congruence quickly falls into repeating numbers; the multiplier “a” is what stretches the cycle. We can keep “c” either zero or non-zero.

If c is zero and you have only “a”, it’s called the multiplicative congruential method.

Note that the values of m, a, c decide the amount of randomness you get from this function. Good values were found by experimentation and must satisfy well-studied properties. :D

Refer to the Wikipedia link for more details; a minimal sketch follows.
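
As a rough sketch (not any library’s exact implementation), here is an LCG in C++; the constants a = 1103515245, c = 12345, m = 2^31 are the well-known example values from the C standard’s sample rand(), and other libraries pick their own:

#include <iostream>

// Minimal LCG: X(n+1) = (a * X(n) + c) mod m
unsigned long lcg_state = 1;   // X0, the seed

unsigned long lcg_next()
{
    const unsigned long a = 1103515245;
    const unsigned long c = 12345;
    const unsigned long m = 2147483648UL;   // 2^31
    lcg_state = (a * lcg_state + c) % m;
    return lcg_state;
}

int main()
{
    for (int i = 0; i < 10; i++)
        std::cout << lcg_next() % 100 << " ";   // squeeze into [0,100)
    std::cout << std::endl;
    return 0;
}

Note that reducing the output modulo 100 at the end only squeezes the range; the period still comes from m, a and c.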

In C/C++ we mostly use the time() function as the seed, which gives us good precision, like below,

   srand(time(NULL));

After that, we squeeze the output into a range by taking the result modulo a value,

   rand() % 500;

The above generates a random number below 500, with X0 seeded from the system time and the library’s built-in multiplier and increment. But note: taking the result modulo 500 does not make the first 500 draws distinct!! You might hope to see 500 different numbers before any repeat, but that won’t happen: with only 500 possible outputs, repeats show up almost immediately. :D
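
You can verify that empirically; this little check (my own sketch) counts how many distinct values the first 500 draws produce and where the first repeat lands:

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <set>

int main()
{
    srand(time(NULL));
    std::set<int> seen;
    int first_repeat = -1;
    for (int i = 0; i < 500; i++) {
        int r = rand() % 500;
        if (seen.count(r) && first_repeat < 0)
            first_repeat = i;            // draw index of the first collision
        seen.insert(r);
    }
    std::cout << "distinct values in 500 draws: " << seen.size() << std::endl;
    std::cout << "first repeat at draw: " << first_repeat << std::endl;
    return 0;
}

On a typical run the first repeat appears within the first few dozen draws, which is the birthday effect at work.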

How does this recursive function manage to spread the numbers around?

[Plot omitted: successive LCG values scattered over the range. The spread looks uniform precisely because m and a are well chosen, e.g. prime..!!]

Check more plots here: http://demonstrations.wolfram.com/LinearCongruentialGenerators/

Code

Try the below code:

#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
    srand(time(NULL));
    for(int i=0;i<1000;i++)
        cout<<": "<<rand()%100<<endl;
    return 0;
}

I got the following as the last chunk of output, with “8” coming back after just a couple of extra iterations!!..

Output
 
: 8
: 18
: 37
: 8
: 33
: 14

Tuesday, 3 May 2011

Relation between Binary Search Tree and Quicksort–MIT lecture series (hand picked)

 

Related links

http://analgorithmaday.blogspot.com/2011/02/partitioning-algorithmin-place.html

http://analgorithmaday.blogspot.com/2011/02/quick-sort.html

http://analgorithmaday.blogspot.com/2011/03/binary-search-treeintroduction.html

Introduction

   We have studied lots of logarithmically performing algorithms which internally use binary trees; the tree is the reason for the logarithmic time. But have we studied it more deeply: why is the performance log n for a tree? How does the recurrence relation work? What math does quicksort involve?

Let’s start studying some math behind algorithms with some interesting hand-picked MIT lectures.

Video

Binary Search Tree and Quicksort

Explanation

   How would you create a BST from an array?

A[n] = {3, 1, 8, 2, 6, 7, 5}, where A is an array and n is its size

BST insert comes out like this,

          3
        /   \
      1       8
        \    /
         2  6
           / \
          5   7

When you do an in-order traversal of the above BST, you get the sorted array..

In-order: 1 2 3 5 6 7 8

So, this means, we have to do n BST inserts and then do an in-order traversal to get a sorted array. This itself is a new sorting method named BST sort. :)

BST-Sort(A)
   Tree T
   for i = 0 to n-1
      do Tree-Insert(T, A[i])
   
   Inorder-Traverse-Tree(T)
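
As a concrete version of the pseudocode (my own translation into C++, not the lecture’s code):

#include <iostream>

struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Standard BST insert: smaller keys go left, larger go right.
Node* tree_insert(Node* root, int key)
{
    if (root == nullptr) return new Node(key);
    if (key < root->key) root->left = tree_insert(root->left, key);
    else                 root->right = tree_insert(root->right, key);
    return root;
}

// In-order traversal prints the keys in sorted order.
void inorder(Node* root)
{
    if (root == nullptr) return;
    inorder(root->left);
    std::cout << root->key << " ";
    inorder(root->right);
}

int main()
{
    int A[] = {3, 1, 8, 2, 6, 7, 5};
    Node* T = nullptr;
    for (int i = 0; i < 7; i++)   // n tree inserts: O(n h)
        T = tree_insert(T, A[i]);
    inorder(T);                   // O(n): prints 1 2 3 5 6 7 8
    std::cout << std::endl;
    return 0;
}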

What is the performance of the above sort?

There are ‘n’ tree inserts, so that part is O(nh), where h is the height of the tree.

Then an in-order traversal is done, which is O(n). So the total is O(nh + n) = O(nh).

For a reasonably balanced tree, the height follows from the number of nodes in it: h ≈ log n.

Example: here the number of nodes = 7, so the height would be h = log 7 ≈ log 2^3, i.e., h = 3.

But take the example of a sorted (or reverse-sorted) array.

A[n] = {1,2,3,4,5,6}

If you create a BST with these values, you get a tree like this,

1
 \
  2
   \
    3
     \
      4
       \
        5
         \
          6

What is the performance of the Tree-Insert in the above for loop now? In such a case, the height of the tree is equal to the number of nodes!!

h = log n holds (approximately) only for trees which are at least partially balanced. So, if the height equals the number of nodes, you get O(n^2) even for this BST sort.

Can’t you recollect that this matches the performance of the quicksort algorithm?

Yes, it matches!! :) Quicksort gives O(n^2) performance if the array is already sorted. In effect, quicksort traces out the same BST internally; just the order in which the comparisons are made is different. The video explains that beautifully.

So, to avoid that, randomized quicksort came into the picture.

Randomized quicksort and randomly built BSTs rest on the same idea, and self-balancing trees attack the same unbalanced-split problem deterministically!!!

If you understand how randomized quicksort works, AVL trees, red-black trees etc. become much easier to understand.

From the above example it’s also clear why a self-balancing tree is a must for getting log n time complexity!!

Monday, 7 March 2011

Sorting in linear time

“Not so important, but some ask!! You have an integer array for which you know that all the elements are less than some value… How efficiently can you sort such an array? You would say merge sort, heap sort, quick sort.. but all of them yield only O(n log n) at best. That’s where different methodologies for sorting come up”

Introduction

   These are quite old ways of sorting, but useful when you want to do sorting in linear time without giving up stability. A new concept comes up here: stability. Stability is nothing but maintaining the relative order of repeated elements in the sorted array. That is, if {2,2} appears as part of the unsorted array, then when sorted you should get the first ‘2’ first and the second one following it (in the same order as in the unsorted array).

So, how stable are the sorts we have studied till now?

Say, you have {4,2,0,1,1,9}

In insertion sort, you get {1,1} in the sorted sequence in order. So, it’s stable.

In merge sort, you split into two lists, sort each and merge. If the merge happens between the sorted halves {0,2,4} and {1,1,9}, you still get {0,1,1,2,4,9} with {1,1} in the same order. This holds when the merge uses extra space; if there is no extra space, you may disobey the order, since you need to swap or relink.

In quicksort or heapsort, you do all swaps in place. Elements like {1,1} can get swapped past each other, which changes their order!! So they are not stable!!

Most comparison sorts that swap elements without considering their original order are non-stable sorts. (A concrete check with the standard library follows.)
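
As a small illustration (this leans on the standard C++ library, not on any sort we wrote here), sort (key, position) pairs by key alone; std::stable_sort guarantees equal keys keep their input order, while plain std::sort makes no such promise:

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    // (key, original position) pairs: two records share the key 1.
    std::vector<std::pair<int,int>> v = {{4,0},{2,1},{0,2},{1,3},{1,4},{9,5}};
    auto by_key = [](const std::pair<int,int>& a,
                     const std::pair<int,int>& b) { return a.first < b.first; };

    // std::stable_sort keeps equal keys in input order,
    // so (1,3) is guaranteed to come out before (1,4).
    std::stable_sort(v.begin(), v.end(), by_key);
    for (auto& p : v)
        std::cout << "(" << p.first << "," << p.second << ") ";
    std::cout << std::endl;
    // (0,2) (1,3) (1,4) (2,1) (4,0) (9,5)
    // With plain std::sort the order of (1,3) and (1,4) is unspecified.
    return 0;
}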

Everything we have seen till now is a comparison sort.

Concept

   There is a lesson in what we learnt from comparison sorts and their performance. Let’s revise the performance of all the sorts we have learnt till now.

Merge sort

Worst case – O(n log n)

Average case – Theta(n log n)

Best case – Omega(n log n) (in fact merge sort is Theta(n log n) in all three cases)

Insertion sort

Worst case – O(n^2). On average it stays O(n^2) (with a smaller constant), but a nearly sorted input reduces the shifts needed, and the best case (already sorted) is actually O(n).

Heap sort

Same as merge sort. Building the max heap takes O(n), and each of the n extractions pulls in an O(log n) heapify, giving O(n log n).

Quick sort

Worst case – O(n^2), e.g. if the array is already sorted or reverse-sorted.

Best and Average – O(n log n) with very less constant time.

All the above sorts work from one basic operation: comparisons. Comparison and swap are the costliest operations in all of the above, and no comparison-based sort can beat the Omega(n log n) lower bound.

In the case of heapsort, building the heap requires only O(n). But an extra O(log n) per element is required to bubble elements up or down, since we still rely on comparisons.

If we ask, can we ever sort in time less than O(n log n)? It’s a big yes!! But not with comparison as the base operation.

Conclusion

   In coming articles, we will start reading about a completely different family of sorts, used only for special cases rather than as general-purpose sorts. Sorting can be done in linear time, with extra space, once we stop relying on comparisons. Since most of these sorts are possible only on integers, the category is even called integer sorting. A small taste follows.
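
As that taste, here is a minimal counting sort sketch (one of the integer sorts we will meet properly later); it is stable and runs in O(n + k) when every element is known to be below k:

#include <iostream>
#include <vector>

// Counting sort: stable, O(n + k), valid when every element is in [0, k).
std::vector<int> counting_sort(const std::vector<int>& a, int k)
{
    std::vector<int> count(k, 0), out(a.size());
    for (int x : a) count[x]++;                   // histogram
    for (int i = 1; i < k; i++)                   // prefix sums: count[x] =
        count[i] += count[i - 1];                 // number of elements <= x
    for (int i = (int)a.size() - 1; i >= 0; i--)  // walk backwards so equal
        out[--count[a[i]]] = a[i];                // keys keep input order
    return out;
}

int main()
{
    std::vector<int> a = {4, 2, 0, 1, 1, 9};
    for (int x : counting_sort(a, 10)) std::cout << x << " ";
    std::cout << std::endl;   // 0 1 1 2 4 9
    return 0;
}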

Wednesday, 23 February 2011

Quick sort

“The beauty of quicksort is that it’s so quick!! Unlike other sorting algorithms based on divide and conquer, where divide, conquer and combine are all independent steps, in quicksort divide and combine are done in one shot: the divide step itself ends up sorting the whole array!!”

See Also

http://analgorithmaday.blogspot.com/2011/02/partitioning-algorithmin-place.html

Introduction

   Quicksort is the most discussed algorithm and is used in more code than you might think. That’s why it’s the simplest interview question to be asked!! :) If even this cannot be answered, you will be shown the door. The first important step is partitioning. Please study the above link fully!!

  Repeated partitioning is what quicksort does. While partitioning again and again, it keeps shrinking the input handed to each partition. So the partially sorted pieces on both sides of a pivot get partitioned further.

Concept

   The important concept to understand in quicksort is that we take each and every element of the array as a pivot at some point on the way to the final sorted array. It’s not possible to take one element as pivot and sort the whole array!! As a dummy, I too thought so, but that’s not the case.

Take for example,

   13   4  5  6  11  10  2

Partition=1: pivot=2,

   13 4 5 6 11 10 (2)

Partition=2: pivot = 10

    4 5 6 (10) 11 13

Partition=3: left, pivot=6, right pivot = 13

4 5 (6)       11 (13)

Partition=4: left pivot=5, right pivot=11

4 (5)   (11)

The parenthesised pivots are in their final positions; ignore them when deciding the next pivots. This is also the main concept of quicksort!! (The exact in-place movement of elements is in the partition code below.)

The tree representation,

    13 4 5 6 11 10 (2)
            |
     4 5 6 (10) 11 13
        /        \
   4 5 (6)    11 (13)

……….. Check some reference books to get a perfect quick sort tree..!! :)

So, the final sorted list is 2 4 5 6 10 11 13.

But note that we don’t partition single-element arrays.. :) That is the base condition for quicksort. You can also see that each pivot rearranges the array. So if you picture it as a tree: to partition, go top to bottom; once everything is sorted, the pieces assemble bottom to top!! The pivots occur in the order shown in the tree above, and note that every element becomes a pivot!!

Code

#include <iostream>
 
void swap(int& a, int& b)
{
    int tmp = a;
    a = b;
    b = tmp;
}
 
int partition(int* arr, int start, int end)
{
    /* 
      1. decide a pivot element, let it be arr[end]
      2. categorize the remaining arr, start -> end-1
      3. keep 2 indices i, j: initially i=start-1, j=start
      4. invariant: i+1 to j-1 always contains elements > pivot
      5. which means i marks the end of the smaller-elements region
      6. j marks the end of the larger-elements region
     */
 
    /*
    Example:
      start=0                                           end=12
  i=-1   j=0                                            
        3    6    2    1    9    4    12    19    13    10    8    5    11
        
        After you do this partition,
        start                            i            j       end
        3    6    2    1    9    4    10    8    5    [11]    12    19    13    
    The important point here is, after 4, we see 8 which is < 11.
    So, it needs to be inserted.
    Because of the above point we need 2 indices.
    */
 
    int i,j, pivot;
    pivot = arr[end];
    i=start-1;
    for(j=start; j<=end-1; j++)
    {
        if(arr[j] <= pivot) {
            i=i+1;
            swap(arr[i], arr[j]);
        }
    }
    swap(arr[i+1], arr[end]);
    return i+1;
}
 
void quick_sort(int* arr, int start, int end)
{
    if(start < end) {
        int mid = partition(arr, start, end);
        quick_sort(arr, start, mid-1);
        quick_sort(arr, mid+1, end);
    }
}
 
int main()
{
    int A[]={3,    6,    2,    1,    9,    4,    12,    19,    13,    10,    8,    5,    11};
    int B[]={6,10,2};
 
    //sorting 3 elements
    quick_sort(B, 0, 2);
    //sorting more elements
    quick_sort(A, 0, 12);
 
    //print the sorted arrays
    for(int i=0;i<13;i++) std::cout<<A[i]<<" ";
    std::cout<<std::endl;
    for(int i=0;i<3;i++) std::cout<<B[i]<<" ";
    std::cout<<std::endl;
    return 0;
}

Important notes

  • quicksort has the perfectly predictable average case of Theta(n log n).
  • quicksort also has a Theta(n^2) worst case. Merge or heap sort look safer since even their worst case is O(n log n); but merge sort pays extra space, and both carry larger constant factors!!
  • quicksort performs at its worst when the array is already sorted!! That’s because all the splits will be bad: the pivot always ends up at the end!!
  • If the pivot is at the end even after the split, it’s a bad split!! A good split means you get the pivot near the middle after a partition. (A randomized-pivot fix is sketched right after this list.)
  • To do quicksort in descending order, just change the <= to >= in the partition code!!
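
Following on from the worst-case notes above, the usual fix (the randomized quicksort mentioned in the MIT-lecture post) is to pick the pivot at random. A sketch, reusing the swap() and partition() from the code above:

#include <cstdlib>
 
// Assumes the swap() and partition() functions defined above.
int randomized_partition(int* arr, int start, int end)
{
    // Pick a random pivot and move it to the end, then reuse
    // the usual last-element partition. This makes consistently
    // bad splits on sorted input very unlikely.
    int r = start + rand() % (end - start + 1);
    swap(arr[r], arr[end]);
    return partition(arr, start, end);
}
 
void randomized_quick_sort(int* arr, int start, int end)
{
    if(start < end) {
        int mid = randomized_partition(arr, start, end);
        randomized_quick_sort(arr, start, mid-1);
        randomized_quick_sort(arr, mid+1, end);
    }
}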

Monday, 21 February 2011

Tree traversal for binary trees

“We have seen binary trees and specific use cases of binary trees!! But trees don’t have to be binary!! One root can have multiple children, and there can be cycles; once cycles appear, the structures are known as graphs!! The structures are all application-driven, but how you traverse them is not!! There are specific ways to visit the nodes in a tree.”

Introduction

     As we already studied heaps in a previous article, you should have noted two ways of moving inside the tree: up-heap and down-heap. In up-heap we move from bottom to top, in down-heap from top to bottom. This is a form of traversal which is application-specific, specific to the heap. Moreover, insertion happens left to right, filling a whole level before starting the next, which is a kind of level order, or breadth-first. Similarly, there are many ways a tree can be traversed.

Concept

  A tree must be understood as a type of graph from now on. In a graph, the main applications are built around traversal, which we will see soon. Similarly, in a tree, there are many applications linked with tree traversal. The types of traversal possible in a tree are,

  • Normal graph traversals – breadth-first (also known as level order in a binary tree), depth-first
  • Normal tree traversals – in-order, pre-order, post-order

  You can do any of the above traversals on a tree, depending upon the application.

Diagrams for each of the tree traversals, along with level order, are at the link below.

Courtesy: http://www.macs.hw.ac.uk/flex/BScCS/ds1/activity10.html

Code

#include <iostream>
#include <stack>
 
// parenthesised so the macros expand safely (the original i-1/2
// would evaluate as i - (1/2) = i, a genuine bug)
#define PARENT(i) (((i)-1)/2)
#define LEFT(i) (2*(i)+1)
#define RIGHT(i) (2*(i)+2)
#define ARRAYSIZE(a) (sizeof(a)/sizeof(a[0]))
 
int A[14] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14};
 
/*
The tree looks like this,
                    1
            2                3
       4        5        6        7
    8    9      10  11  12  13  14
preorder: root=>left=>right
1. print the root, then keep walking left, printing every node, until NULL
2. then back up and take the right child, repeating the same leftward walk
    1 2 4 8 9 5 10 11 3 6 12 13 7 14
postorder: left=>right=>root
8 9 4 10 11 5 2 12 13 6 14 7 3 1
*/
 
void preorder_stack(int idx)
{
    std::stack<int> obj;
    int j = idx;
    while(j < (int)ARRAYSIZE(A)) {
        int i = j;
        // walk down the left spine, printing each node
        // (every node we pass is the root of a subtree)
        while(i < (int)ARRAYSIZE(A)) {
            std::cout<<A[i]<<" ";
            obj.push(i);
            i=LEFT(i);
        }
        // back up one node and move to its right child
        j = RIGHT(obj.top());
        obj.pop();
        // keep backing up past nodes whose right child is out of range
        while(j >= (int)ARRAYSIZE(A) && !obj.empty()) {
            j = RIGHT(obj.top());
            obj.pop();
        }
    }
}
 
void preorder(int idx)
{
     if ( idx < ARRAYSIZE(A)){
         std::cout<<A[idx]<<" ";
         preorder(LEFT(idx));
         preorder(RIGHT(idx));
     }
}
 
void postorder(int idx)
{
    if(idx < ARRAYSIZE(A)) {
        postorder(LEFT(idx));
        postorder(RIGHT(idx));
        std::cout<<A[idx]<<" ";
    }
}
 
void inorder(int idx)
{
    if(idx < ARRAYSIZE(A)) {
        inorder(LEFT(idx));
        std::cout<<A[idx]<<" ";
        inorder(RIGHT(idx));
    }
}
int main()
{
  preorder_stack(0);
  std::cout<<std::endl;
  preorder(0);
  std::cout<<std::endl;
  postorder(0);
  std::cout<<std::endl;
  inorder(0);
  std::cout<<std::endl;
  return 0;
}

Important points

  • This code uses an array, which is unusual; mostly you will see linked-list based implementations
  • We will see more of linked lists in coming chapters before jumping into trees built over linked lists
  • Preorder – we need to visit all the roots down the left side and then the right, but whenever we see a root, we make sure the root is visited first!! then left and then right
  • Postorder – visit all left and right nodes before visiting the root
  • Inorder – visit the left node, then the root, then the right node.
  • Only for preorder have I written an iterative implementation using a stack. It can be reused for all the others and is very useful when you need to do traversal using iterations. (A level-order sketch follows below.)
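
Level order (breadth-first) was listed in the Concept section but not implemented above; here is a sketch using std::queue (add #include <queue> at the top) against the same array A and macros:

void levelorder(int idx)
{
    // visit nodes level by level using a queue of array indices
    std::queue<int> q;
    q.push(idx);
    while(!q.empty()) {
        int i = q.front(); q.pop();
        if(i >= (int)ARRAYSIZE(A)) continue;   // skip missing children
        std::cout<<A[i]<<" ";
        q.push(LEFT(i));
        q.push(RIGHT(i));
    }
    // for the array above this prints 1 2 3 4 5 6 7 8 9 10 11 12 13 14
}

For an array-backed complete tree this is simply the array order, which is exactly why heaps store their level-order layout in an array.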