## Saturday, 1 January 2011

### Merge algorithm

“Wishing the blog readers a Very Happy New Year 2011 and also the start of a new Decade” Merge operation

Think about two pile of cards opened above and consider both the piles are sorted. Now, you need to take cards from two pile and sort it to one. How would you do that?

* Take least card from the two decks, it opens up the next card in one deck.

* Put the card selected facing down, this is sorted list

* Continue selecting least cards visible on the top of the two decks, and place it facing down. Each time you do it, you get a sorted deck face down in correct order (unlike insertion sort, where u insert at particular point by shifting)

Since we just check the top 2 cards here, number of steps to get sorted deck = number of cards in both the piles

Merge 2 Sorted list

The merge operation explained above is the basic unit of the merge sort. We took the example of 2 sorted pile of cards, consider pile A and pile B. That is, we take the least card facing across pile A and pile B, and create a new pile C. This continues until we get merged pile C. At each phase, pile C has sorted list across pile A and pile B. so, we can stop the iteration any time, to get a partially sorted list across sorted piles.

For every algorithm, we usually see the loop invariant as explained above to understand what doesn’t change in each iteration

The invariant in merge algorithm is the sorted list getting generated from 2 already sorted list. And also the another truth is that, always the Left list first element and the Right list first element are the smallest elements in the respective lists which is not copied into the sorted list yet. So always the main array contains sorted elements from the two lists at any iteration stage.

Important points

* If there are only 2 numbers to be merged, always the least number goes to the common list first. That is consider, A= {5}, B = { 2} are the two lists to be merged into C = {}. When merging, This the basic step,

1. A = {5}, B = {}, C={2}
2. A={}, B={}, C = {2, 5}

Some bigger example, A = { 1, 3}, B = {2, 6}, C= {}

1. A={3}, B ={2,6}, C={1}
2. A={3}, B ={6}, C={1,2}
3. A={}, B={6}, C={1,2,3}
4. A={}, B={}, C={1,2,3,6}

* The above example proves that, Merge algorithm itself is a kind of sorting but the only truth assumed before merging is that, the lists considered are already sorted. THIS IS VERY IMPORTANT POINT FOR A MERGE ALGORITHM.

* For the algorithm, we give the start, mid and end index with just a single array. That is A and B lists explained above is considered a single list and its divided into two by the algorithm based on the start, mid and end index. These indexes should mark a perfectly sorted list for the merging to work. THIS IS ANOTHER IMPORTANT POINT.

Example: D= { 3, 4, 1, 2, 6}

Then, for merging this D, this is call to merge algorithm, Merge(D, 0, 1,4)

start = 0, mid = 1, end =4

So, 2 lists, A= {3, 4}, B={1,2,6} is created and this is merged to C = { 1, 2, 3, 4, 6}

* From the above example, its clear that, merging is a kind of sorting. If you know the index where sorted order goes out of order, then you can simply apply merge operation to sort the list. If you know all the indexes where the sorted order is disobeyed, you can just apply merge to sort the whole list. This is what is done as part of merge sort algorithm which will be discussed soon.

* The complexity in designing this algorithm is finding the end of the list and faster appending approach. There are multiple ways like using a end bit or just tracking the length of the list or using different data structures other than arrays.

* All the sorting and searching is explained with arrays currently. There is something computer specific in case of algorithms which is called the data structures. (In a cooking perspective, data is the ingredients and data structure is study about how much of the ingredients can mix up to give correct blend). We will use arrays for now so that, we can concentrate only on the algorithms rather than studying the equally complex data structures. Usage of data structures comes with experience. Like in cooking, how we decide about, how much salt, how much sugar kind of, which is the speciality of a cook. Code:

Please note that, the below code will be the base code for one more algorithm called merge sort. Collection of merge operation done with selected indices produces merge sort.

`   1:  void merge(int *arr, int start, int mid, int end)`
`   2:  {`
`   3:      if ( mid==0 || end==0)`
`   4:      {`
`   5:          // error`
`   6:          return;`
`   7:      }`
`   8:   `
`   9:      // arr = List D = split into List A and List B`
`  10:      int n1, n2;`
`  11:      n1 = mid - start + 1; // List A[0..n1-1]`
`  12:      n2 = end - mid; // List B[0..n2-1]`
`  13:   `
`  14:      int *la, *lb;`
`  15:      int i,j,k;`
`  16:      la = (int*) malloc(n1); `
`  17:      lb = (int*) malloc(n2);`
`  18:   `
`  19:      for(i=0; i<n1; i++)`
`  20:      {`
`  21:          if(start == 0)`
`  22:              la[i] = arr[i];`
`  23:          else`
`  24:              la[i] = arr[start+i];`
`  25:      }`
`  26:   `
`  27:      for(j=0; j<n2; j++)`
`  28:          lb[j] = arr[mid+j+1];`
`  29:      `
`  30:      la[n1] = INT_MAX;`
`  31:      lb[n2] = INT_MAX;`
`  32:   `
`  33:      i = 0; j =0;`
`  34:      for(k = start; k<end+1; k++)`
`  35:      {`
`  36:          if(la[i] < lb[j])`
`  37:          {`
`  38:              arr[k] = la[i];`
`  39:              i = i + 1;`
`  40:          }`
`  41:          else`
`  42:          {`
`  43:              arr[k] = lb[j];`
`  44:              j = j + 1;`
`  45:          }`
`  46:      }`
`  47:      free(la);`
`  48:      free(lb);`
`  49:  }`
`  50:   `
`  51:  void main()`
`  52:   `
`  53:  {`
`  54:   `
`  55:      int test_arr[] = {3,5,2,6,8,9,0,1};`
`  56:   `
`  57:      merge(test_arr, 2,5,7); // first merge sublist output: {3,5,0,1,2,6,8,9}`
`  58:   `
`  59:      merge(test_arr, 0, 1, 7); // merge full list`
`  60:   `
`  61:   }`

indices used in the above code is with array index starting with 0 and ending with length –1. The End of the array is marked with INT_MAX which is the largest number in the integer family.

Error condition like selecting zero size for mid and end is covered in the above code. Selecting mid or end as zero means, there is no need for a merge operation itself !!. Mark this as an important algorithm to remember for the interviews that’s why i made this entry at the start of the year!! 