Showing posts with label sorting. Show all posts

Monday, 6 June 2011

Reverse in-order traversal of BST

 

Introduction

  How would you get sorted output from a binary search tree? In-order traversal. Recollect what we have studied about BSTs so far:

http://analgorithmaday.blogspot.com/2011/03/binary-search-treeinsertion.html

For BST insertion, we compare at each node and traverse left or right based on the value, until we reach the spot where the new node belongs. If the value is less than the current node, we go left; otherwise we go right. So we construct the tree in such a way that the left-root-right ordering is built in from top to bottom.

  But in in-order traversal, we come back up from the bottom, visiting left to right. The order in which we do the recursion is very important.

  Now, we know that if we do an in-order traversal of a BST, we get an array sorted in increasing order. How would you get decreasing order? How does quick sort do a decreasing-order sort, for that matter?

Concept

   A decreasing-order sort in quick sort is done by partitioning in a different way. When you partition the array with the smaller elements on the left and the larger elements on the right, you get an increasing-order sort. Similarly, if you keep the larger elements on the left and the smaller elements on the right, you get a decreasing-order sort. In quick sort, it's just like changing the > symbol to <.
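To make the "change > to <" idea concrete, here is a minimal sketch of a descending-order Lomuto-style partition (the function name `partition_desc` is ours, not from the post): it is the usual partition with the comparison flipped, so larger elements collect on the left.

```cpp
#include <algorithm> // std::swap

// Descending-order partition: same as the usual Lomuto scheme,
// but with the comparison flipped (>= instead of <=), so elements
// larger than the pivot collect on the left side.
int partition_desc(int* arr, int start, int end)
{
    int pivot = arr[end];
    int i = start - 1;
    for (int j = start; j < end; j++) {
        if (arr[j] >= pivot) {          // flipped comparison
            i++;
            std::swap(arr[i], arr[j]);
        }
    }
    std::swap(arr[i + 1], arr[end]);
    return i + 1; // final position of the pivot
}
```

Running the full quick sort recursion on top of this partition yields the array in decreasing order.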

  But in a BST, the <- and >-based construction is already done in memory, whereas in quick sort it happens on the fly. So how does a BST-based sort produce a descending order then? Through the traversal order.

Don’t just mug up that in-order means left-root-right. :)

You can also do right-root-left. :D It’s just the direction of traversal, that’s it!! :)

Some people call this reverse in-order. So, in a BST, if you do a reverse in-order traversal, you get a descending-order sorted array.

Code

#include <stdlib.h>
#include <limits.h>
#include <math.h>
#include <iostream>
 
class BST {
public:
    int data;
    BST* parent;
    BST* left;
    BST* right;
};
 
 
BST* createBSTNode(int data)
{
    BST* newNode = new BST();
    newNode->data = data;
    newNode->left = NULL;
    newNode->right = NULL;
    newNode->parent = NULL;
    return newNode;
}
 
void bst_insert(BST*& tree, BST* node)
{
    BST* prev = NULL;
    BST* iter=tree;
    while(iter != NULL)
    {
        prev = iter;
        if(node->data < iter->data)
            iter = iter->left;
        else
            iter = iter->right;
    }
    // found node is parent to our node
    node->parent = prev;
 
    // if prev is NULL
    // then this is the first node
    // change the root NODE
    if(prev == NULL)
        tree = node;
    else {
        // now we need to decide where the node
        // should go: left or right
        if(node->data < prev->data)
            prev->left = node;
        else
            prev->right = node;
    }
 
}
 
BST* convert_arr_bst(int* arr, int size)
{
    BST* tree=NULL;
    for(int i=0;i<size;i++) {
        bst_insert(tree, createBSTNode(arr[i]));
    }
    return tree;
}
 
void ascending(BST* root)
{
    if(root == NULL) return;
    ascending(root->left);
    std::cout<<root->data<<" ";
    ascending(root->right);
}
 
void descending(BST* root)
{
    if(root == NULL) return;
    descending(root->right);
    std::cout<<root->data<<" ";
    descending(root->left);
}
 
 
int main()
{
    int A[] = {4,5,3,2,8,10,1,2};
    BST* root = convert_arr_bst(A, sizeof(A)/sizeof(int));
    ascending(root);
    std::cout<<std::endl;
    descending(root);
    std::cout<<std::endl;
    return 0;
}
  • There are also some BSTs which, instead of storing the lesser elements at the left and the greater at the right, do it in reverse. Just change the < to > in the bst_insert function; then a plain in-order traversal gives the numbers in descending order.
  • Now we understand that reverse in-order visits the tree as if it were mirrored. :) Mirroring a tree is also an important interview question.
  • Another important thing you get in a binary search tree is the root: the median… The root, most of the time, divides the array into two halves. If it doesn’t, the tree becomes completely unbalanced.
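Since the second bullet mentions mirroring, here is a minimal sketch of it (the `Node` struct and `mirror` name are ours; the post's BST class would work the same way): swap the children at every node, and a plain in-order walk of the result prints descending order.

```cpp
#include <cstddef> // NULL

struct Node {
    int data;
    Node* left;
    Node* right;
};

// Swap the left and right subtrees at every node. After this,
// an ordinary in-order traversal of the mirrored tree visits
// the values in descending order.
void mirror(Node* root)
{
    if (root == NULL) return;
    Node* tmp = root->left;
    root->left = root->right;
    root->right = tmp;
    mirror(root->left);
    mirror(root->right);
}
```

Mirroring the tree once and traversing in-order is equivalent to leaving the tree alone and traversing reverse in-order.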

Tuesday, 31 May 2011

Randomized Selection algorithm

“Find the k-th smallest element in an array without sorting. This is the algorithm which does that.”

See Also

http://analgorithmaday.blogspot.com/2011/05/concept-behind-order-statistics_09.html

Introduction

  We have already seen lots of information about order statistics. Finding just the k-th smallest element in an array doesn’t require sorting; you just need to find out where the k-th smallest element would fit.

  The usual way would be to find the max value using a variable. But what would you do if you want the second max? You would maintain 2 variables. Similarly, it’s possible to maintain a stack of current minimums alongside the elements, with O(1) work per operation and O(n) space. :) We will see that problem in the next article; it’s also an interview question in some companies. :D

  But the right way to find the k-th smallest element is the randomized selection algorithm. We have already learned about randomized partition; we can reuse it inside randomized selection.
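The "stack of current minimums" idea mentioned above could look like the following sketch (the full problem is deferred to a later article; the `MinStack` class name is ours). Each push also records the minimum seen so far, so the current minimum is available in O(1).

```cpp
#include <stack>

// Sketch of a stack that tracks the running minimum.
// A second stack shadows the first: mins.top() is always
// the minimum of everything currently in vals.
class MinStack {
    std::stack<int> vals;
    std::stack<int> mins;
public:
    void push(int x) {
        vals.push(x);
        // push the smaller of x and the current minimum
        mins.push(mins.empty() ? x : (x < mins.top() ? x : mins.top()));
    }
    void pop()            { vals.pop(); mins.pop(); }
    int  top()    const   { return vals.top(); }
    int  getMin() const   { return mins.top(); }
};
```

Both stacks grow together, which is where the O(n) extra space comes from.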

Concept

  Do a randomized partition and get the location of the pivot, which is also random. After you get the location, just subtract the start of the array from it (and add 1). This is nothing but the pivot’s k-th position in the sorted array.

     Example,

              arr = 9,5,7,1,10,2,3

        if the random pivot selected is 5, and the partition has happened, then the resulting array will be,

       arr =    1 2 3 5 9 7 10

The location returned will be 3 (array indices starting from zero).

  We assumed above that the array starts from zero. :) But the actual formula to calculate the k position is,

   position - start + 1; we need this +1 to get the 1-based rank.

If the above value is equal to the given k, we have got the best case here: just O(n) to find our required smallest element.

For example, say the user wants the 4th smallest element. We already got it (the pivot, 5). :) We don’t need to continue any more.

   But the important thing here is how our random generator works and what we expect from it. If the pivot selected always helps us out, we get the best case.

  Otherwise, we continue this randomized pivot selection recursively on one side or the other.

  arr =  1 2 3 5 9 7 10
         ^     ^      ^
       start  mid    end

now, k = mid - start + 1 = 4

Say we need the 6th smallest (9). We have only got the 4th smallest. Since 4 is less than 6, our 6th smallest will be on the right side. i.e,

within, 9 7 10

This is, mid+1 to end

But the element we need to find inside this subarray is just its 2nd smallest!! :D

So, we need to do 6-4, that is, k - (rank of the pivot) is passed to this recursion

Similarly, we need to cover the other case: when k is less than the pivot’s rank, we recurse on the left side with k unchanged.

Code

#include <cstdlib>   // rand, srand
#include <ctime>     // time
#include <algorithm> // std::swap
using std::swap;
 
int random_partition(int* arr, int start, int end)
{
    // note: seed the generator once (e.g. srand(time(NULL)) in main);
    // reseeding here on every call would keep picking the same pivot
    // within the same second
    int pivotIdx = start + rand() % (end-start+1);
    int pivot = arr[pivotIdx];
    swap(arr[pivotIdx], arr[end]); // move pivot element to the end
    pivotIdx = end;
    int i = start -1;
 
    for(int j=start; j<=end-1; j++)
    {
        if(arr[j] <= pivot)
        {
            i = i+1;
            swap(arr[i], arr[j]);
        }
    }
 
    swap(arr[i+1], arr[pivotIdx]);
    return i+1;
}
 
int random_selection(int* arr, int start, int end, int k)
{
    if(k <= 0) return -1;
 
    if(start == end)
        return arr[start];
 
    int mid = random_partition(arr, start, end);
    int i = mid - start + 1; // 1-based rank of the pivot in this subarray
    if(i == k)
        return arr[mid];
    else if(k < i)
        return random_selection(arr, start, mid-1, k);
    else
        return random_selection(arr, mid+1, end, k-i);
}
int main()
{
    srand(time(NULL)); // seed once, up front
    int A[] = {9,5,7,1,10,2,3};
    int val = random_selection(A, 0, 6, 5); // 5th smallest => 7
    return 0;
}
  • Note that the k argument we pass is 1-based, not 0-based.
  • Our random partition never returns an index greater than end (6 here). We calculate the distance from start and add 1 to it; this gives us the 1-based rank of the pivot.
  • While recursing into the lesser and greater segments, we must take care to pass the correct k value: k unchanged on the left, k-i on the right.
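As a sanity check on the 1-based k convention (not part of the original post), the standard library offers the same selection behaviour via std::nth_element; the helper name `kth_smallest_ref` is ours.

```cpp
#include <algorithm> // std::nth_element
#include <vector>

// Reference implementation of k-th smallest (k is 1-based, as in
// the post). std::nth_element places the element that belongs at
// 0-based position k-1 of the sorted order, in expected O(n) time.
int kth_smallest_ref(std::vector<int> v, int k)
{
    std::nth_element(v.begin(), v.begin() + (k - 1), v.end());
    return v[k - 1];
}
```

For {9,5,7,1,10,2,3}, asking for k=5 returns 7, matching the walkthrough above.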

Friday, 27 May 2011

Randomized Quick sort

“Randomized quick sort uses the random partition method we discussed: we just select a random pivot in the array. The advantage is that we can avoid quick sort’s worst-case behaviour on adversarial inputs!! But still, this improvement holds only in expectation!!”

Introduction

   As we have seen a lot about this already, we can directly jump into randomized quick sort.

Concept

  Just select a random pivot element in the partition algorithm.

Code

 
#include <cstdlib>   // rand, srand
#include <ctime>     // time
#include <algorithm> // std::swap
using std::swap;
 
int random_partition(int* arr, int start, int end)
{
    // note: seed the generator once (e.g. srand(time(NULL)) in main);
    // reseeding here on every call would keep picking the same pivot
    // within the same second
    int pivotIdx = start + rand() % (end-start+1);
    int pivot = arr[pivotIdx];
    swap(arr[pivotIdx], arr[end]); // move pivot element to the end
    pivotIdx = end;
    int i = start -1;
 
    for(int j=start; j<=end-1; j++)
    {
        if(arr[j] <= pivot)
        {
            i = i+1;
            swap(arr[i], arr[j]);
        }
    }
 
    swap(arr[i+1], arr[pivotIdx]);
    return i+1;
}
 
void random_quick_sort(int* arr, int start, int end)
{
    if(start < end) {
        int mid = random_partition(arr, start, end);
        random_quick_sort(arr, start, mid-1);
        random_quick_sort(arr, mid+1, end);
    }
}
int main()
{
    srand(time(NULL)); // seed once, up front
    int A[] = {2,5,7,1,10,8,9};
    random_quick_sort(A, 0, 6);
    return 0;
}
  • The pivot index must be start + rand() % (end-start+1). The recursion runs with different start and end values on each call, so be careful about the formula for pivotIdx.
  • We need the start + offset form because the random pivot must land inside the current subarray, not just anywhere in the whole array.
  • Recursing on start to mid-1 and mid+1 to end is enough for quick sort. NOTE: the pivot element lands in its correct final position at every iteration.
  • There is an alternate way of partitioning given in many other places.. not the CLRS way!! :) We will see that as well in the next article..!! :D
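The "alternate way of partitioning" in the last bullet is commonly the Hoare scheme; here is a hedged sketch of it (our reading, not the post's code). Two indices run toward each other and swap out-of-place pairs; note it returns a split point, not the pivot's final position, so the recursion shape differs from the Lomuto version above.

```cpp
#include <algorithm> // std::swap

// Hoare-style partition: scan inward from both ends, swapping
// elements that are on the wrong side of the pivot. Returns j,
// an index such that everything in [start..j] is <= everything
// in [j+1..end].
int hoare_partition(int* arr, int start, int end)
{
    int pivot = arr[start];
    int i = start - 1, j = end + 1;
    while (true) {
        do { i++; } while (arr[i] < pivot);
        do { j--; } while (arr[j] > pivot);
        if (i >= j) return j;
        std::swap(arr[i], arr[j]);
    }
}

void quick_sort_hoare(int* arr, int start, int end)
{
    if (start < end) {
        int mid = hoare_partition(arr, start, end);
        quick_sort_hoare(arr, start, mid); // note: mid, not mid-1
        quick_sort_hoare(arr, mid + 1, end);
    }
}
```

Hoare's scheme typically does fewer swaps than Lomuto's, which is why many references prefer it.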

Wednesday, 9 March 2011

Radix sort–Punch card sorting

“One of the oldest algorithms ever invented. Simple and very useful. It was used in mechanical devices, even before computers were invented. It involves pure mathematics and works for any numeral or text system. The oldest method used to sort dictionaries.”

See Also

http://analgorithmaday.blogspot.com/2011/03/counting-sortlinear-time.html

Introduction

   This is one of the oldest methods and somewhat easy to understand once you get the point. The main idea behind this sort is: given an array of numbers, if you sort the array digit by digit, from the least significant to the most significant digit, you finally get a sorted array.

  Similarly, in a dictionary, where words are sorted lexicographically, the sorting runs from the most significant character to the least significant character. All of these are radix sorts. It gives a good performance benefit as well, around O(n).

  This sort is not so important by itself, but it shows why stable sorting matters in certain cases. Radix sort will not work if the relative order of elements is not maintained while placing them into their sorted spots.

Concept

    Radix sort supports all bases starting from 2, but the algorithm we wrote and tested is only for base 10, for simplicity. We take each digit place and sort the whole array around it.

For example,

{329,457,657,839,436,720,355}

Sorting around the least significant digit (the 1’s place) yields,

{720,355,436,457,657,329,839}

then sorting on the 10’s place,

{720,329,436,839,355,457,657}

and finally on the 100’s place,

{329,355,436,457,657,720,839}

We sort only on the digits picked from the 1’s place, the 10’s place, and then the 100’s place. Even though the intermediate steps look unsorted, you finally get a sorted array out of it. This is achieved precisely because the relative order of elements is maintained throughout the iterations.

The above example uses integers with an equal number of digits, but it also works for integers with fewer than 3 digits (they behave as if zero-padded). The best way to start a radix sort is to find the maximum value and count its digits, or to fix the maximum number of digits up front.

  Usually, in the mechanical punched-card machines where this algorithm was used, the cards had a fixed 12-digit number imprinted on them, and sorting went from LSD to MSD.
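Deriving the digit count from the maximum value, as suggested above, could look like this small helper (the name `count_digits` is ours; the post's code instead hard-codes d=3 in main):

```cpp
// Find the maximum element and count its base-10 digits.
// The result is the d to pass to radix_sort.
int count_digits(int* arr, int size)
{
    int maxval = arr[0];
    for (int i = 1; i < size; i++)
        if (arr[i] > maxval) maxval = arr[i];

    int d = 1;
    while (maxval >= 10) { // one extra digit per factor of 10
        maxval /= 10;
        d++;
    }
    return d;
}
```

With this, main could call radix_sort(A, count_digits(A, 7), 7) instead of hard-coding 3.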

Two kinds of radix sort are now clearly visible: LSD radix sort and MSD radix sort. MSD is mostly used for dictionary sorting. We will discuss LSD sort in this article.

The simple algorithm is like this,

1. For each digit place, from least significant to most significant

2.       Stable-sort the array on the digit at that place

Code

#include <cstdlib> // malloc, free
#include <cmath>   // pow
 
#define BASE 10
 
void counting_sort_radix(int*& arr, int k, int size)
{
    int *B, *C;
    C = (int*) malloc(sizeof(int)*BASE);
    B = (int*) malloc(sizeof(int)*(size+1));
 
    for(int i=0;i<BASE; i++) C[i] = 0;
 
    // only the last digit need to be used 
    // for sorting
    for(int i=0;i<size;i++){
        int val = arr[i]/k;
        int idx = val%BASE;
        C[idx]++;
    }
 
    for(int i=1;i<BASE;i++) C[i] += C[i-1];
 
    for(int j=size-1;j>=0;j--) {
        int val = arr[j]/k;
        int idx = val%BASE;
        B[C[idx]] = arr[j];
        C[idx]--;
    }
 
    free(C);
    for(int i=0; i<size; i++) arr[i] = B[i+1];
    free(B);
}
 
void radix_sort(int* arr, int d, int size)
{
    // d = number of digits
    // (ideally calculated from the max value in the array)
    int k = 1; // 10^i: selects the digit place for this pass
    for(int i=0;i<d; i++) {
        counting_sort_radix(arr, k, size);
        k *= BASE; // avoids pow() and its float rounding
    }
}
 
int main()
{
    int A[] = {329,457,657,89,46,720,355};
 
    radix_sort(A, 3, 7);
    // A is now {46,89,329,355,457,657,720}
    return 0;
}

Important points

  • We have reused counting sort as the base sorting algorithm for our radix sort algorithm
  • We go from the 1’s place to the 100’s place in the above code. It runs for 3 digits, so the input can have at most 3 digits.
  • BASE is only tested for 10. It could also be base 2, base 8 (octal), or base 16 (hex).
  • We need to read a different digit in each iteration. That’s why we modified counting sort to take k = 10^i as an argument, used to evaluate arr[j]/k and then val%BASE. Together these extract the exact digit at the place we are looking at.
  • The k = 10^i passed in each iteration decides the place around which counting sort is applied.
  • B[] as usual requires size+1 slots, since the cumulative counts in C[] place elements starting at index 1, never 0. So we copy from B[1..size] back to A[0..size-1] to balance it.
  • THE MOST IMPORTANT BUG FIX IN THIS CODE IS STABILITY!! The last loop, which fills B[], must run from the end of the array down to 0 rather than from 0 upward. If we run from 0 upward, elements with equal digits end up in reverse of their original order.
  • Why does running from 0 to length reverse equal elements? Because the cumulative sum marks the last slot of each digit’s range, and we fill each range backwards from that slot. So we must iterate the array in reverse for equal keys to keep their relative order.
  • Without this stability condition, we will never get a sorted list from radix sort. Stability of the sort applied to the digits is essential to the radix sort method.
  • Any stable sorting method can be used to sort the digits, not just counting sort.