Showing posts with label insertion. Show all posts
Showing posts with label insertion. Show all posts

Thursday, 17 February 2011

Merge sort with k way insertion sort

“I believe no one forgot merge sort!! The first place where we saw about trees and divide-conquer algorithms. Recollect all that, to understand the new method of sorting again using divide and conquer to save space”

See Also

http://analgorithmaday.blogspot.com/2011/01/merge-lists-without-using-extra-space.html

http://analgorithmaday.blogspot.com/search/label/merge

Metaphor

   What if you want to sort a list of sorted list using just a merge? Merge is really important in real world computer systems.. You merge code, you merge songs, you mix videos, you do editing. Its like you a list of sorted things, you need merge all of them and create a single sorted list.

  So, there are two things here you need, you need initial sorted things. How you sort them? you can use any sorting algorithm for this. There is no enforcement over this. So, merge itself becomes a sorting algorithm when you provide a sorted list to it. But what does our merge example given already does? It merges 2 sorted arrays into a single sorted array. Think about an external sorting scenario where you have 9 sorted list. Then you need to merge 9 ways. :)

  Even in code comparison tool, you have 3 way merge where you compare with 3 versions of code.. Even this merge is similar to what we do in a 3-way merge.

Concept

  The main concept here is, prepare a list of sorted arrays. You can create sorted arrays using any method. We will select insertion sort for now. quick sort can be studied next as it is a big topic!!.. :) After you sort the arrays, you get say k sorted arrays. How you merge all of em? As explained in the external sort, take the initial element in each array and do a merge into the sorted list.

The number of elements to take from each array should be configurable. There are many more optimization possible in a k-way merge sort. Using quicksort reduces the time taken, organizing the memory in RAM in the form of a tree reduces comparisons (priority queue usage). But it is expected that, the algorithm takes lots of CPU and much lesser RAM.

Code

void insertion_sort(int *arr, int n)
{
    for(int j=1; j<n;j++)
    {
        int key = arr[j];
        int i=j-1;
        while(i>=0) {
            if(arr[i] > key)
            {
                arr[i+1] = arr[i];
                i = i - 1;
            }
            else
                break;
        }
        arr[i+1] = key;
    }
}
 
// old way 2 way merge
void merge(int *arr, int start, int mid, int end)
{
    // arr = List D = split into List A and List B
    int n1, n2;
 
    n1 = mid - start + 1; // List A[0..n1-1]
    n2 = end - mid; // List B[0..n2-1]
 
    int *la, *lb;
    int i,j,k;
    la = (int*) malloc(sizeof(int)*(n1+1)); 
    lb = (int*) malloc(sizeof(int)*(n2+1));
 
    for(i=0; i<n1; i++)
    {
        if(start == 0)
            la[i] = arr[i];
        else
            la[i] = arr[start+i];
    }
 
    for(j=0; j<n2; j++)
        lb[j] = arr[mid+j+1];
 
    la[n1] = INT_MAX;
    lb[n2] = INT_MAX;
 
    i = 0; j =0;
    for(k = start; k<end+1; k++)
    {
        if(la[i] < lb[j])
        {
            arr[k] = la[i];
            i = i + 1;
        }
        else
        {
            arr[k] = lb[j];
            j = j + 1;
        }
    }
    free(la);
    free(lb);
}
 
const int SPLITS = 20;
 
void kmerge(FILE* out_fp,int segsize, int buf_size)
{
    // SPLITS * buf_size = RAM usage of this method
    /*
    1. read buf_size from each splits into a array of size SPLITS*buf_size
    2. perform the above 2-way merge in-place
    3. write the array inside out_fp
    */
    int* merge_arr = (int*)malloc(sizeof(int)*SPLITS*buf_size);
    int buf_offset=0;
    FILE *tmpfile[SPLITS];
    for(int k=0; k<SPLITS;k++) {
        char* filename = (char*) malloc(105);
        char* number=(char*) malloc(100);
        strcpy_s(filename, 105, "tmp");
        _itoa_s(k, number, 100, 10);
        strcat_s(filename, 105, number);
        errno_t e2 = fopen_s(&tmpfile[k], filename, "r");
    }
 
    while(buf_offset < segsize) {
        int j=0;
 
        // read buf_size from SPLITS files and 
        // put it inside merge_arr
        for(int k=0; k<SPLITS;k++) {
            for(int i=0; i<buf_size; i++) {
                char* buffer= (char*)malloc(100);
                if(!fgets(buffer, 100, tmpfile[k])) 
                    break;
                merge_arr[j]= atoi(buffer);
                j++;
                free(buffer);
            }
        }
 
        // merge the array
        merge(merge_arr, 0, SPLITS*buf_size/2,SPLITS*buf_size);
 
 
        // write into output file
        char* buffer=(char*) malloc(100);
        for(int k =0; k < j; k++) {
            _itoa_s(merge_arr[k], buffer, 100, 10);
            buffer[strlen(buffer)+1] = '\0';
            buffer[strlen(buffer)] = '\n';
            fputs(buffer, out_fp);
        }
        buf_offset+=buf_size;
    }
    for(int k=0; k<SPLITS;k++) {
        fclose(&tmpfile[k]);
    }
 
}
 
void external_sort(FILE* fp, int lineno, FILE* out_file)
{
    int l = (int) ceil((float)lineno/(float)SPLITS);
    int k=0;
    FILE* midfile;
    while(!feof(fp)) {
        int i = 0;
        char* buffer = (char*) malloc(100);
        int* p = (int*) malloc(sizeof(int)*l);
        while(i < l) {
            if(!fgets(buffer, 100, fp)) 
                break;
            p[i]= atoi(buffer);
            i++;
        }
        free(buffer);
        insertion_sort(p, i);
        char* filename = (char*) malloc(105);
        char* number=(char*) malloc(100);
        strcpy_s(filename, 105, "tmp");
        _itoa_s(k, number, 100, 10);
        strcat_s(filename, 105, number);
        errno_t e2 = fopen_s(&midfile, filename, "w");
        buffer=(char*) malloc(100);
        for(int j =0; j < i; j++) {
            _itoa_s(p[j], buffer, 100, 10);
            buffer[strlen(buffer)+1] = '\0';
            buffer[strlen(buffer)] = '\n';
            fputs(buffer, midfile);
        }
        free(buffer);
        free(p);
        buffer = 0;
        p=0;
        fclose(midfile);
        k++;
    }
    kmerge(out_file, l, 500);
}
 
void main()
{
    //519670
    FILE *in, *out;
    errno_t e1 = fopen_s(&in, "sample.txt", "r");
    errno_t e2 = fopen_s(&out, "out_sorted.txt", "w");
    external_sort(in, 519670, out)
    fclose(out);
    fclose(in);
}

The above code reused insertion_sort and merge routine which is already written. Some terminology in code,

SPLITS = number of splits to be done over the big file

seg_size = number of lines in the file/SPLITS

kmerge buf_size = the size to read from each sorted file

This is the diagrammatic representation,

sample.txt 

                    seg_size       seg_size*2….

|--------------------|-----------------|---------------------------------|

|           tmp0       |    tmp1          |    tmp2          | ….

|------------------------------------------------------------------------|

tmp0, tmp1, tmp2..unto tmp[SPLITS] files contains sorted parts of sample.txt.

All tmp files are merged into out_sorted.txt and a single huge sorted list is created.

While merging this is diagrammatic representation,

tmp0                  seg_size         

| -----|---------------|                  

|       |                   |

| -----|---------------|

       buf_size(500)

From all files, tmp0..tmpN, buf_size data is read into an array (RAM) and then merge operation is done over it. This programs RAM usage is not profiled correctly!! but it will be lesser.

Important points

  • You cannot use a single file to do sorting + merging. You need to split into different files and sort it
  • The files created are not deleted. just for the reference. Line number in the file is hardcoded.
  • insertion sort is used which is O(n2).. Should use better sorting based on the number of elements. Heap sort,quick sort are better alternatives.
  • NOT AT ALL TESTED!!!… USE AT UR OWN RISK!!..
  • BETER USE K-WAY SORT FROM THE BELOW LINKS RATHER THAN MY IMPLEMENTATION

Reference & must to use

http://code.google.com/p/kway/

http://cis.stvincent.edu/html/tutorials/swd/extsort/extsort.html

Wednesday, 12 January 2011

Insertion sort–Recursive Method

Replacing loops with recursive requires functional knowledge of the code. In Insertion sort, we assume the first number as sorted and take the next subsequent number, compare with first number and move or swap it depending on the data structure.

Metaphor

     Another metaphor for insertion sort will be arranging the books based on their sizes. When you find a medium sized book, you insert it in the middle and reorder the full list to accommodate the middle book being inserted. This is what is done in a array kind of data structure. Even in linked list based insertion sort, we need to traverse the full sorted list to find the insertion point. But linked list way is easily visualized.

Concept

   Both iterative or recursive gives same performance. Recursive procedure blindly converts all the loops into recursive functions. Note that, in this recursive method, we will see a double-level recursion which is necessary in case of loop within a loop with different looping conditions.

Code

Some points:

  • In recursive way, we take first( 0 )–> last (size –1) and take size-1 when it reaches condition first<last, that is last becomes 1 finally. In case of iteration, we do like 1 –> list size and assume arr[0] as sorted list. This means, recursive algorithm goes in a reverse way while normal loops go in a forward way.
  • In the iterative version, there is a inner loop which shifts the elements to get a whole to insert the element in the sorted list. In recursive version, this is taken care by insertInOrder function.
  • Note that, insertInOrder function has 3 operations
    • Check whether the element is less than the last element in the sorted list. Then just append the element to the sorted list by doing,  a[last+1] =element
    • if on a recursion to shift elements, reaches the end (that is, first < last). Then start returning. Note that, if element >= a[last] will occur on every call and only when it becomes false, recursion comes up. In each recursion, there are in turn 2 checks
      • element > a[last]
      • element < a[last]
    • When the element<a[last] in a recursive function return, we will have the correct index in the last variable. We just put the element at that position and shift. Note that, shifts inside the array happens always except when element >= a[last]!!
   1:  void insertInOrder( int element,int *a, int first, int last)
   2:  {
   3:      if (element >= a[last])
   4:          a[last+1] = element;
   5:      else if (first < last)
   6:      {
   7:          a[last+1] = a[last];
   8:          insertInOrder(element, a, first, last-1);
   9:      }
  10:      else // first == last and element < a[last]
  11:      {
  12:          a[last+1] = a[last];
  13:          a[last] = element;
  14:      }
  15:  }
  16:   
  17:   
  18:  void insertion_sort_recur(int *arr, int first, int last)
  19:  {
  20:      if(first < last)
  21:      {
  22:          insertion_sort_recur(arr, first, last-1); // avoids looping thru arr[0..last-1]
  23:          insertInOrder(arr[last], arr, first, last-1); // considers arr[last] as the first element in the unsorted list
  24:      }
  25:  }
  26:   
  27:  void main()
  28:  {
  29:      int A[]={5,3,2,4,6,1};
  30:      insertion_sort_recur(A,0,5);
  31:  }

 

Important points

  • By seeing the above code, it looks clear that, there is direct conversion of iterative procedure to a recursive one. But note that, when you make some function recursive, there should be a recursion end condition along with other conditions.
  • If the iterative version of the algorithm is nested, then the recursive usually looks like the above code.
  • The above code is taken from: http://www.cs.siu.edu/~mengxia/Courses%20PPT/220/Chapter11.ppt
  • This is an exercise problem in “Introduction to Algorithm, Corman-Lewis” book.

Friday, 7 January 2011

Insertion sort with Linked List–in-place

 

Introduction

Yesterday we saw about insertion sort using Linked list. Linked list are really interesting, if you do things the correct way. Let me explain about insertion sort using linked list in-place version which is available in most of the C++ standard libraries nowadays.

My version may be little bit complex. Please bear with me with the number of temporaries used. It could be minimized a lot!!.

Let us directly dive into the code.

Code

  • head is the start of a dynamic linked list. We assume head has sorted list initially
  • unsort is the start of the unsorted linked list any time in the whole code
  • prev, iter is used to mark the position to insert the element in the sorted list under head.
  • when iter and key comes to a same position, key lies in the end of the sorted list. So, we continue the loop
  • We move iter each time to the end of the sorted list and connect it with the unsorted list. This is a very critical step and can be optimized one.

 

   1:   
   2:  List* insertion_sort_inplace(List *head)
   3:  {
   4:      if(head == 0) return head;
   5:   
   6:      // assume head is sorted
   7:      List *unsort = head->next;
   8:      while(unsort != 0)
   9:      {
  10:          // take key as an element in the unsorted list.
  11:          List *prev = 0;
  12:          List *iter = head;
  13:          List *key = unsort;
  14:   
  15:          // iterate within the sorted list and find the position
  16:          while(iter != 0)
  17:          {
  18:              if(iter->data < key->data)
  19:              {
  20:                  prev = iter;
  21:                  iter = iter->next;
  22:              }
  23:              else
  24:                  break;
  25:          }
  26:          unsort = unsort->next;
  27:          // if reached the end of sorted list, continue
  28:          if(iter == key) 
  29:              continue;
  30:   
  31:          // note down the position to place
  32:          List *place = iter;
  33:   
  34:          //move iter to end of the sorted list and connect with unsort
  35:          while(iter->next != key) iter=iter->next;
  36:          iter->next = unsort;
  37:   
  38:          // delete the key and place it in sorted list
  39:          if(prev == 0) {
  40:              head = key;
  41:          } else {
  42:              prev->next = key;
  43:          }
  44:          key->next = place;
  45:   
  46:      }
  47:      return head;
  48:  }
  49:   
  50:  void main()
  51:  {
  52:      List *head = createNode(5);
  53:      append(head, createNode(2));
  54:      append(head, createNode(7));
  55:      append(head, createNode(6));
  56:      append(head, createNode(4));
  57:      append(head, createNode(2));
  58:      append(head, createNode(10));
  59:      append(head, createNode(1));
  60:      append(head, createNode(3));
  61:   
  62:      List *sorted = insertion_sort_inplace(head);
  63:  }

 

Important points

  • Note that the above algorithm doesn’t require any size input. It can process any large infinite numbers which are dynamically created.
  • The important concept on computers is that, when you create something dynamically, the memory is created on the heap which is on RAM. So, this in-place version requires huge RAM space when the list is huge.
  • Again and Again, Linked list cannot be accessed based on index. Address stored in temporaries plays important role. If you need to clean up temporaries, Recursion is the solution which we are afraid of currently.
  • Another important demerit in the above code is, we traverse again to reach the end of the sorted list. But still it equates to O(n2).