Monday, 14 March 2011

String of String search

 

Introduction

    You are given a simple question to implement strstr C API.  You should not use extra space, no extra library APIs and performance must be as near to O(n). How would you do that?

Note that O(n) is a must.. :)

How I did this?

   I started a normal approach of comparing each character with the string and came with an algorithm which is O(n2). This is a brute force approach to pattern matching of string

int str_str(char* sstr, char* needle)
{
    char* iter = 0;
    char* stmp = 0;
    char* initial = sstr;
    while(*sstr != '\0') {
        iter = needle;
        stmp = sstr;
        while(*iter != '\0') {
            if(*stmp != *iter) {
                stmp++;
                break;
            }
            else
            {
                iter++;    
                stmp++;
            }
        }
        if(iter-needle == stmp-sstr)
            break;
        sstr++;
    }
    if(*sstr == '\0')
        return 0;
    else
        return sstr-initial;
}
 
void main()
{
    char* search_string="aabaaabbbbbbaaaaabbabaaaaaaaaaa";
    char* str = "aaa";
    std::cout<<str_str(search_string, str);
 
 
}

Important points:

  • sstr, needle is not const char* here unlike the C API
  • we need copy of initial sstr to calculate the index at the last.
  • sstr itself is used as a counter to move inside the string. When some match of iter happens, we continue the loop, if not, we break.
  • we need to reset the iter and current sstr to start the comparisons fresh
  • iter runs for the needle and checks in stmp range, if any initial condition fails, we return
  • stmp++ is required in a failure case for our calculation of equality check in outer loop
  • If sstr is \0, that means we reached end of string and didn’t find the match yet. So we return 0.
  • Note that, even if the string is found at first index, the above code returns 0. :). So, you must take care of the return value
  • The above algorithm is fully based on pointer arithmetic and a bit hacky!! If you are well versed in pointers, you can come with the above solution.
  • The above code works. But gives O(n2). So, this is not the right algorithm interviewer is expecting. :). But if you come up at least with this, you can think about optimizing it after studying further algorithms.
  • The optimization mainly happens in the inner loop. The iter run.. There are many algorithms for pattern matching (from Goodrich)
    • Boyer-Moore Algorithm
    • Knuth-Morris-Pratt
  • There are many text processing algorithms which we will look into soon!!