Sunday, September 6, 2009

String tokenizer by hand

I wanted to implement my own "StringTokenizer", and here it is. It runs in O(n) time, where n is the length of the input string, because it visits each character exactly once.

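import java.util.ArrayList;
import java.util.List;

public static List<String> tokenizer(String input) {
    List<String> tokenizedStr = new ArrayList<String>();
    StringBuilder str = new StringBuilder(); // collects the characters of the current token
    int strLength = input.length();
    int index = 0;
    while (index < strLength) {
        char curr = input.charAt(index);
        if (curr != ' ') {
            str.append(curr);
        } else {
            // A space ends the current token
            tokenizedStr.add(str.toString());
            str = new StringBuilder();
        }
        // Add the final token when the string does not end with a space
        if ((index == strLength - 1) && (curr != ' ')) {
            tokenizedStr.add(str.toString());
        }
        index++;
    }
    return tokenizedStr;
}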
The most interesting part of this method is the final condition, which checks whether we have reached the last character of the string and that this character is not a space, so that the last token is not lost.
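As a point of comparison, the last-character check can also be moved out of the loop: whatever is still in the buffer once the loop ends must be the final token. The sketch below is only an illustration of that idea, not the method from this post; the class name FlushAfterLoopTokenizer and the method name tokenize are made up, and it assumes generics and StringBuilder are available.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class and method names, used only for this illustration.
class FlushAfterLoopTokenizer {

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c != ' ') {
                current.append(c);
            } else {
                // A space closes the current token, as in the method above.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        // Anything still buffered is the final token; this replaces the
        // "last character and not a space" check inside the loop.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("one two three")); // prints [one, two, three]
    }
}
```

The buffer is non-empty after the loop only when the string ends with a non-space character, so the result should match the in-loop check while testing the condition once instead of on every iteration.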

UPDATE:
On second thought, the algorithm in this method is not perfect: consecutive spaces (or a leading space) produce empty tokens in the result. A better version of the loop only adds a token when the buffer is not empty:

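while (index < strLength) {
    char curr = input.charAt(index);
    if (curr != ' ') {
        str.append(curr);
    } else if (str.length() != 0) {
        // A space ends the current token; nothing is added if the buffer is empty
        tokenizedStr.add(str.toString());
        str.setLength(0); // reset the buffer for the next token
    }
    // Add the final token when the string does not end with a space
    if ((index == strLength - 1) && (curr != ' ')) {
        tokenizedStr.add(str.toString());
    }
    index++;
}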
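Put together as a complete method, the updated loop might look like the sketch below. The wrapper class WhitespaceSkippingTokenizer is a made-up name for this example, and the use of List<String> and StringBuilder is my assumption about how one would type it; the logic itself follows the loop above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class; the loop follows the updated version above.
class WhitespaceSkippingTokenizer {

    static List<String> tokenizer(String input) {
        List<String> tokenizedStr = new ArrayList<>();
        StringBuilder str = new StringBuilder();
        int strLength = input.length();
        int index = 0;
        while (index < strLength) {
            char curr = input.charAt(index);
            if (curr != ' ') {
                str.append(curr);
            } else {
                // Only keep the token if at least one character was collected.
                if (str.length() != 0) {
                    tokenizedStr.add(str.toString());
                }
                str = new StringBuilder();
            }
            // Add the final token when the input does not end with a space.
            if ((index == strLength - 1) && (curr != ' ')) {
                tokenizedStr.add(str.toString());
            }
            index++;
        }
        return tokenizedStr;
    }

    public static void main(String[] args) {
        System.out.println(tokenizer("  hello   world  ")); // prints [hello, world]
    }
}
```

Leading, trailing and repeated spaces no longer show up as empty tokens, since a token is only added when the buffer holds at least one character.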
A recursive solution to the same problem is in "Tokenizing String Recursively".
