Showing posts with label datastructures. Show all posts
Showing posts with label datastructures. Show all posts

Sunday, 15 September 2013

Remove duplicates in a Linked List

 

Introduction

   Removing duplicates from a linked list is a popular interview question. Although its simple, some people may ask the implement the linked list as such along with this. Yes, you need to add items to the linked list, remove an item from the linked list to do the removeDuplicates function.

Problem

  Move step-by-step.  (The questions to ask before you start)

  1. Is the linked list sorted?
  2. How large is the linked list? Is the size of the linked list fixed?

The above questions are because of the following reasons,

  1. sorted list means duplicates are found in single iteration. (just a single scan and remove)
  2. huge list means – doing a easy sorting would help. Dynamic list means – no sorting, need to try other approaches like heap. If space allowed, you can use hash map

Always go with the brute force way. 2 for loop approach.

Solution

 
class List
{
    int data;
    List* next;
    
public:
    List(int d)
    {
        this->data = d;
        this->next = NULL;
    }
 
    void append(int item)
    {
        List* iter = this;
        
        while(iter->next != NULL) iter = iter->next;
 
        List* newNode = new List(item);
 
        iter->next = newNode;
    }
 
    void deleteNode(List*& node)
    {
        if(this == node)
        {
            List *tmp = node->next;
            node->data = node->next->data;
            node->next = node->next->next;
            delete tmp;
        } else {
            List *iter = this;
            while(iter && iter->next != node) iter = iter->next;
 
            if(iter != NULL) {
                iter->next = node->next;
                delete node;
                node = iter;
            }
        }
    }
    void remove_dup()
    {
        for(List *iter1 = this; iter1 != NULL; iter1 = iter1->next) {
            for(List *iter2 = iter1->next; iter2 != NULL; iter2 = iter2->next) {
                if(iter1->data == iter2->data) {
                    deleteNode(iter2);
                }
            }
        }
    }
 
    void printList()
    {
        List *iter = this;
        while(iter) {
            std::cout<<iter->data<<"->";
            iter = iter->next;
        }
    }
};
 
 
void main()
{
    List obj(10);
    List *ptr = &obj;
    obj.append(2);obj.append(3);obj.append(5);
    obj.append(3); obj.append(6);
    obj.append(2); 
 
    obj.deleteNode(ptr);
    obj.printList();
    obj.remove_dup();
    std::cout<<std::endl;
    obj.printList();
}

Things found while writing code:

  • this pointer is const!! :) which means List *const this; u cannot change what it points to. But u can change the object inside it. const member function has it like, List const *const this;
  • As we delete node using next pointers in the list, to free the pointer you should take a backup of the pointer.
  • Base test cases must be run to catch errors: 2 nodes, 1 node, first node, last node.
  • Reference to a pointer will help in modifying the pointer in a function.
  • The above code is a O(n2) complexity. Brute force. Best way is to do merge sort. – O(n log n) and when doing the merge, remove duplicates.

References

http://analgorithmaday.blogspot.in/2011/01/merge-lists-without-using-extra-space.html

http://analgorithmaday.blogspot.in/2011/01/merge-algorithm.html

http://analgorithmaday.blogspot.in/2011/01/merge-sort.html

Monday, 17 October 2011

Hash tables & Dictionaries

 

Introduction

   Many normal developers simply think that hash tables are something to store a key-value pair. That’s it. What are its applications are not known. Some people directly correlate with C++ STL data structures map and hash map. :)

   hash map is faster than map. But why is that?

We have seen many sorting algorithms already. For searching there are neat data structures available which does our job. What is necessary for a best searching algorithm?

  • lookup should be O(1)
  • insert should also be O(1)
  • delete should also be O(1)

In the sorting case, this is never possible. :) But sorting becomes the basis for searching like binary search. But even that won’t give you O(1).

That’s where we go for the dictionaries. Dictionaries or Hash tables, both mean the same. Don’t confuse!!.. I did that.. :)

Dictionaries are nothing but hash tables which gives you lookup of O(1). Just an another name. :D

Concept

   To explain this simply, take a normal English dictionary. You know the word you need to search for. Just take the dictionary and go directly to the first letter page then continue to scan for the second letter. This means, dictionary is organized in such a way that, your lookup is fast.

   Even hash tables are designed so that your lookup is fast. But you should know the key to look for. For example, Take a normal array. You know that you need to get the 5th element from it. then, you simply do array[5]. This means, your lookup is just O(1).

  Plainly, hash tables are expected to work like arrays. Just give a key, you get a value in O(1) time. But note that, in worst case, in hash tables you get O(n) as well. This happens when, your key doesn’t directly get you the value.

  So, hash tables or linked list both give the same O(n) worst case. But practically hash tables need to be designed so that, the hash or the key used to access the table is unique and can identify clearly a single slot in the table.

  What we are discussing till now, is direct access table. You can always find a value with the key given. If key is not unique, we may need to search a linked list. That’s it.

My understandings

  • Just remember bucket sort. Hash tables are similar to that.
  • Direct access table is impractical. But the best, if the data set to represent is small. Key’s are mostly embedded part of the table and also the data will be part of the table rather than being a list.
  • If the key is not unique, it perfectly matches your bucket sort.
  • Upcoming articles we will try direct access tables. Hash map has more concepts like collisions, using a cyclic buffer, rehashing etc. But without understanding direct access tables, studying those are useless.. :D

Tuesday, 14 June 2011

Counting number of nodes in a binary tree

 

Introduction

   How would you count the number of nodes in a tree? Traverse all the nodes in a tree. You can do any traversal and just start counting. But there is a concept called tail recursion which is handy and very easily readable.

If you have a tree, if you know the number of nodes in the left and number of nodes in right +1, then, what you get is total number of nodes.

Concept

   You have to do tail recursion with count(root->left) + count(root->right)+1.

After you get the count, how you will find the height of the tree ?

  log2 count_of_nodes. But in C++, you won’t get a function to calculate log base 2. :)

There is a way,

  log10 n/log 10 2 = log 2 n

So, you just need to use the above formula to get log base 2 which yields the height of the tree.

Code

Tree* tobj;
 
int count(Tree* root) {
    if(root == NULL)
        return 0;
    else 
        if(root->left == NULL && root->right == NULL)
            return 1;
        else
            return count(root->left) + count(root->right) + 1;    
}
 
cout<<count(tobj);
  • you are covering 2 base conditions here, root == NULL then 0. root->left & root->right is NULL, then only root node is there, which is 1.
  • This expands to 1 for each node in the left & right.

Monday, 13 June 2011

Level order insert in a binary tree

“We have seen binary search trees, heaps.. how would you normally create a dynamic tree from a list. Simply, you are given an array, you need to convert it into a binary tree in level order.”

Introduction

    Level order traversal needs a level-by-level movement in a tree. But if you want to recreate the tree from an array in a level-order way. It should be easy. Just you need to fill the elements with the correct formula as we already discussed.

       2i             -----> root

2i+1     2i+2    --> left      right

This is the basis for the level order insert.

Concept

   If you check the level order traversal, many uses stacks. But we should avoid using stacks. Instead, we will use recursion for our case.

  Understanding recursion requires a simple base condition which needs to be repeated for different argument.

     for example, if you have an array,

     arr = {0,1,2,3,4,5,6,7,8,9}

the tree becomes like this,

                  0

           1                    2

    3           4           5      6

7     8     9

For 0, we need to create 1 and 2. For 1, 3, 4. For 3, create 7,8. Similarly for 4, create 9.

This means, you need to do recursion left with left index and right with right index. The end condition is when left or right index is exceeding the size of the array.

Code

Tree *tobj;
 
void level_order_insert(Tree* root, int* arr, int start, int size)
{
    int left = 2*start+1;
    int right = 2*start+2;
 
    if(left > size || right > size)
        return;
 
    if(root == NULL) {
        tobj = createNode(arr[start]);
        root = tobj;
    }
 
    if(root->left == NULL && root->right == NULL) {
        if(left < size)
            root->left = createNode(arr[left]);
        if(right < size)
            root->right = createNode(arr[right]);
    }
    level_order_insert(root->left, arr, left, size);
    level_order_insert(root->right, arr, right, size);
 
}
 
main() {
   int A[] = {0,1,2,3,4,5,6,7,8,9}
   level_order_insert(tobj, A, 0, 10);
}
  • Before root->left = createNode. You need to make sure that left is still < size. This is to handle the border case
  • root->left gets start as all left with all odd numbers
  • root->right gets start as all right with all even numbers.
  • Most of the times you get partially balanced tree with this method.

Wednesday, 8 June 2011

Infix, Prefix and Postfix notations

“Notations are created for easy processing of expressions. Polish and Reverse Polish Notations are popular ones. Normal notation is human readable.”

Introduction

  There are 3 kinds of expression notations: infix, prefix and postfix.

  • Infix: is human readable. (2+3)*4
  • Prefix: is readable with stack. Expressions first, values next. *+234
  • Postfix: Values first, expressions next.  23+4*

Read more about the prefix notation here: http://en.wikipedia.org/wiki/Polish_notation

Prefix & postfix doesn’t need to use brackets and can be easily represented using stack or tree.

The main advantage is that we don’t need to care much about the precedence as in Infix notation.

Code

/*
Scan the given prefix expression from right to left
for each symbol
 {
  if operand then
   push onto stack
  if operator then
   {
    operand1=pop stack
    operand2=pop stack
    compute operand1 operator operand2
    push result onto stack
   }
 }
return top of stack as result
*/
 
void main()
{
    char* str = "*-5 6 57";
    StackList opStack;
    int d, sum = 0;
    int result=0;
    int count = 0;
    char* rstr = str;
 
    while(*rstr != '\0') rstr++;
    rstr--;
 
    while(rstr > str-1) {
 
        if(isdigit(*rstr)) {
            d = *rstr - 48;
            if(count == 0) sum = d;
            sum += d*count*10;
            count++;
        } else {
            if(sum != 0) {
                opStack.push(sum);
                sum = 0;
                count = 0;
            }
            switch(*rstr)
            {
                case '+':
                    opStack.push(opStack.pop() + opStack.pop());
                    break;
                case '*':
                    opStack.push(opStack.pop() * opStack.pop());
                    break;
                case '-':
                    opStack.push(opStack.pop() - opStack.pop());
                    break;
                case '/':
                    opStack.push(opStack.pop() / opStack.pop());
                    break;
            }
        }
 
        rstr--;
    }
    cout<<opStack.pop();
}
  • We need to come in reverse and push into stack the integers
  • When we see a operator, we need to operate the top 2 elements in the stack.
  • Note that, we use space as the delimiter between 2 digits. For operators, we don’t need that
  • In the stack we will have the only element which is the processed output of the expression