Monday, July 27, 2009

Remove all non-letter characters from a string

Problem: You have a string that may contain various special characters and you need it to contain only letters.
...and you don't want to use regular expressions ;-)

Solution: There are probably a hundred ways to do this, but if you are using C# 3.0 or higher, there is a short solution that costs you just one line of code.
First I will show a slightly longer version of the snippet, which shows better how it works:

using System;
using System.Collections.Generic;

static string RemoveNonLetters(string dirtyString)
{
    string cleanString = String.Empty;

    // Convert the string to a character array so we can load it into a List
    char[] dirtyChars = dirtyString.ToCharArray();

    // Create a character list and load the dirtyChars array
    List<Char> characters = new List<Char>(dirtyChars);

    // Use the List's ForEach method to run an anonymous method on every character
    characters.ForEach(delegate(char c)
    {
        // Keep letters only; digits, punctuation and spaces are skipped
        if (Char.IsLetter(c))
        {
            cleanString += c.ToString();
        }
    });

    return cleanString;
}

// RemoveNonLetters("http://www.dottech.nl/#?this%20is%20dirty!") returns "httpwwwdottechnlthisisdirty"
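Because .NET strings are immutable, every cleanString += c.ToString() creates a brand new string. For short inputs that does not matter, but for long ones the copying adds up. A minimal sketch of the same filtering using System.Text.StringBuilder instead; the names StringCleaner and RemoveNonLettersWithBuilder are made up for this example:

```csharp
using System;
using System.Text;

public static class StringCleaner
{
    // Hypothetical helper: same filtering as the ForEach version,
    // but appends to a StringBuilder instead of concatenating strings.
    public static string RemoveNonLettersWithBuilder(string dirtyString)
    {
        // Pre-size the buffer to the input length; the result can never be longer
        StringBuilder clean = new StringBuilder(dirtyString.Length);

        foreach (char c in dirtyString)
        {
            // Keep letters only, exactly like Char.IsLetter in the original snippet
            if (Char.IsLetter(c))
            {
                clean.Append(c);
            }
        }

        return clean.ToString();
    }
}
```

Called with the sample URL "http://www.dottech.nl/#?this%20is%20dirty!", it returns the same "httpwwwdottechnlthisisdirty".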

When I wrote this, I figured it could be a little more compact.
So I used a lambda expression, a new feature in C# 3.0, and changed the ForEach line to this:

characters.ForEach(c => cleanString += (Char.IsLetter(c)) ? c.ToString() : String.Empty);
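The lambda is shorthand for the anonymous method in the longer version: List<T>.ForEach takes an Action<T>, and both forms convert to one. A small sketch that builds the same Action<char> both ways and runs each through ForEach; the class LambdaVersusDelegate and its method CleanBothWays are hypothetical names for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class LambdaVersusDelegate
{
    // Hypothetical demo: the same Action<char> written once as a C# 2.0
    // anonymous method and once as a C# 3.0 lambda, both run via ForEach.
    public static string[] CleanBothWays(string dirtyString)
    {
        List<char> characters = new List<char>(dirtyString.ToCharArray());

        string viaDelegate = String.Empty;
        Action<char> anonymousMethod = delegate(char c)
        {
            if (Char.IsLetter(c))
            {
                viaDelegate += c.ToString();
            }
        };

        string viaLambda = String.Empty;
        // The lambda captures viaLambda, so every call appends to that local;
        // the conditional operator appends either the letter or an empty string
        Action<char> lambda = c => viaLambda += Char.IsLetter(c) ? c.ToString() : String.Empty;

        characters.ForEach(anonymousMethod);
        characters.ForEach(lambda);

        return new string[] { viaDelegate, viaLambda };
    }
}
```

Both entries of the returned array come out equal: the lambda changes only the syntax, not what ForEach does.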

After some more compacting I ended up with this final snippet:

new List<Char>(dirtyString.ToCharArray()).ForEach(c => cleanString += (Char.IsLetter(c)) ? c.ToString() : String.Empty);

It may look complex to someone who sees it for the first time, but I still think it's a maintainable line of code, as long as you put a comment above it explaining that dirtyString is the input and cleanString the output. Note that cleanString still has to be declared and initialized to String.Empty before this line.
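C# 3.0 also ships LINQ, which allows the same filtering without an outer variable: Enumerable.Where selects the characters and new string(char[]) rebuilds the result. This is a different technique from the ForEach one-liner, sketched here with made-up names (LinqCleaner, LettersOnly, LettersAndDigitsOnly). The second method swaps in Char.IsLetterOrDigit for the case where digits should survive too:

```csharp
using System;
using System.Linq;

public static class LinqCleaner
{
    // Keeps letters only, like the ForEach one-liner
    public static string LettersOnly(string dirtyString)
    {
        return new string(dirtyString.Where(c => Char.IsLetter(c)).ToArray());
    }

    // Variant that keeps letters and digits
    public static string LettersAndDigitsOnly(string dirtyString)
    {
        return new string(dirtyString.Where(c => Char.IsLetterOrDigit(c)).ToArray());
    }
}
```

With the sample URL from the post, LettersOnly returns "httpwwwdottechnlthisisdirty" and LettersAndDigitsOnly returns "httpwwwdottechnlthis20is20dirty".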

/Ruud
