Tuesday, 24 May 2011

Finding the longest substring in a string without repetition

 

Question

  This is an interesting question shared by my friend. Not so tough. But again, we need to remember the use of bitwise. Read more about here

http://analgorithmaday.blogspot.com/2011/04/unique-items-in-array.html

The question is to find the longest substring in a string which doesn’t have repetition.

The algorithm you write must not use any extra space and should perform with O(n).

Example:

If you are given a string like this,

abcdbbnjkuiokklmnuuurstnvwabcd

to find the longest substring without repetition, you should ignore repetitions in each substring. but must still consider the repeated string to be part of the next substring!!

bb (or) abcdb – is a invalid substring

But, bnjkuio – is a valid substring

Solution

char* longest_substring(char* str)
{
    char* istr = str;
    int checker=0;
    int countmax = 0;
    int maxl = 0;
    char* maxidx = str;
    while(*str != '\0')
    {
        if((checker & (1 << *str)))
        {
            if(maxl < countmax) {
                maxl=countmax;
                maxidx= str; 
            }
            checker=0;
            countmax=1; // include count of the duplicate
        }
        else {
            ++countmax;
        }
        checker |= (1 << *str);
        str++;
    }
    if(maxl < countmax) {
        maxl = countmax;
        maxidx = str;
    }
    char* start = maxidx-maxl;
    if(maxl != 0)     start[maxl] = '\0';
    return start;
}
 
void main()
{
    char* test = strdup("abcdbbnjkuiokklmnuuurstnvwabcd");
    char* str = longest_substring(test);
}
  • In the above code, we just keep track of the max length of the currently found substring in maxl. We also maintain the maxidx (pointer address)
  • We just track the max only when you get repetitions in the substring found till now
  • checker bit variable helps us in detecting the duplicate in a substring. We set it back to 0 when we are done with detecting duplicate in the first substring.
  • after everything is done, we just subtract the maxidx with the substring length to get the start of the substring. Just place a NULL character in maxl location to get the perfect longest substring
  • NOTE that, the input string is destroyed!!.. It doesn’t use extra space either for replicating the string argument or for finding duplicates within the string.
  • We do traversal only once the whole string and the 2 condition checks gives O(2n) performance. All other operations would be negligibly constant.
  • If there is a case like,
    • abcdefghijkklmnopqrstu
  • What will be the output of the code? If there are equal sized substrings, the function just gives the first found substring!!