## Tuesday 24 May 2011

### Finding the longest substring in a string without repetition

Question

This is an interesting question shared by my friend. Not so tough. But again, we need to remember the use of bitwise. Read more about here

The question is to find the longest substring in a string which doesn’t have repetition.

The algorithm you write must not use any extra space and should perform with O(n).

Example:

If you are given a string like this,

abcdbbnjkuiokklmnuuurstnvwabcd

to find the longest substring without repetition, you should ignore repetitions in each substring. but must still consider the repeated string to be part of the next substring!!

bb (or) abcdb – is a invalid substring

But, bnjkuio – is a valid substring

Solution

`char* longest_substring(char* str)`
`{`
`    char* istr = str;`
`    int checker=0;`
`    int countmax = 0;`
`    int maxl = 0;`
`    char* maxidx = str;`
`    while(*str != '\0')`
`    {`
`        if((checker & (1 << *str)))`
`        {`
`            if(maxl < countmax) {`
`                maxl=countmax;`
`                maxidx= str; `
`            }`
`            checker=0;`
`            countmax=1; // include count of the duplicate`
`        }`
`        else {`
`            ++countmax;`
`        }`
`        checker |= (1 << *str);`
`        str++;`
`    }`
`    if(maxl < countmax) {`
`        maxl = countmax;`
`        maxidx = str;`
`    }`
`    char* start = maxidx-maxl;`
`    if(maxl != 0)     start[maxl] = '\0';`
`    return start;`
`}`
` `
`void main()`
`{`
`    char* test = strdup("abcdbbnjkuiokklmnuuurstnvwabcd");`
`    char* str = longest_substring(test);`
`}`
• In the above code, we just keep track of the max length of the currently found substring in maxl. We also maintain the maxidx (pointer address)
• We just track the max only when you get repetitions in the substring found till now
• checker bit variable helps us in detecting the duplicate in a substring. We set it back to 0 when we are done with detecting duplicate in the first substring.
• after everything is done, we just subtract the maxidx with the substring length to get the start of the substring. Just place a NULL character in maxl location to get the perfect longest substring
• NOTE that, the input string is destroyed!!.. It doesn’t use extra space either for replicating the string argument or for finding duplicates within the string.
• We do traversal only once the whole string and the 2 condition checks gives O(2n) performance. All other operations would be negligibly constant.
• If there is a case like,
• abcdefghijkklmnopqrstu
• What will be the output of the code? If there are equal sized substrings, the function just gives the first found substring!!

Swarup said...

I think the code doesn't work as expected.
If the input string changed to -
"abcdbbnjkuiokswbarchlmnuuurstnvwabcd"

then the code returns output -
"kswbarchlmnu"

which is not correct, the correct output should be -
"uiokswbarchlmn"

The reason is - the code ignores the whole string parsed so far and starts parsing with the next position whenever it finds a duplicate character.

vprajan said...

Sorry for checking this very late..

Yeah.. its a blunder in this code.. just now realized it.. :(

will fix this ASAP. thanks for making me aware of this.

Kamal said...

A good working code is at http://www.ritambhara.in/longest-substring-of-unique-characters/