1

Closed

ExtractDigits

description

Browsing through the code, I noticed that I could improve ExtractDigits. Below are two different improved version.
 
ExtractDigits3 is fastest overall, but ExtractDigits2 is easier to read. Both are significantly faster than the current version.
Normalized timings:
ExtractDigits (current) 6.27
ExtractDigits2 1.31
ExtractDigits3 1.00
 
public static string ExtractDigits2(this string value)
{
return string.Join(null, value.Where(Char.IsDigit));
}
public static string ExtractDigits3(this string value)
{
return value.Where(Char.IsDigit).Aggregate(new StringBuilder(value.Length), (sb, c) => sb.Append(c)).ToString();
}
Closed Oct 19, 2013 at 10:30 AM by cgessler
Re-opened by mistake

comments

PatrickLorenz wrote Nov 7, 2011 at 11:14 AM

Hi James,

the feedback is appriated. Are you willing to go ahead and update it directly in the code base?

Regards,

Patrick

JamesCurran wrote Nov 8, 2011 at 3:55 PM

If you're offering to give me developer source access, sure ! (I live to refactor!)

PatrickLorenz wrote Nov 8, 2011 at 4:30 PM

That's no problem at all. You just have to apply for it (see the "People" tab) and I will approve it.

nathankoop wrote Dec 16, 2011 at 2:35 PM

I believe this should be in fixed or closed status

JamesCurran wrote Dec 23, 2011 at 2:03 PM

Fixed in Change Set 66651

cgessler wrote Nov 28, 2012 at 1:15 PM

James, I'm curious why you would use Linq to replace a simple foreach loop. After running some tests, I found that the following runs in roughly 1/2 the time as ExtractDigits3.
    public static string ExtractDigits(this string @this)
    {
        StringBuilder sb = new StringBuilder();
        Char c;
        for (int i = 0; i < @this.Length; i++)
        {
            c = @this[i];
            if (Char.IsDigit(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

JamesCurran wrote Dec 3, 2012 at 11:23 PM

@cgessler

Technically I wasn't "replacing" a loop with the lambda syntax, since the original used the lambda form as well.

I also was expecting the loop form to be that much of a difference. (In my tests, your code takes about 2/3rd the time of mine, but let's not quibble over 17% -- it's still more than I expected).

In theory, the lambda/functional form is "better" as it expresses "what" I want to do, instead of "how" I want to do it. This goal would probably be better served if I had split that into two methods:

public static string BuildStringCharByChar(this IEnumerable<char> chars)
{
return chars.Aggregate(new StringBuilder(), (sb, c) => sb.Append(c)).ToString();
}
public static string ExtractDigits(this string value)
{
return value.Where(Char.IsDigit).BuildStringCharByChar();
}

Here, the very definition of the ExtractDigits method spells out what it does --- you just say "This is what I want it to do", and you leave it to someone else to optimize those commands. If build strings character by character was used enough that it became a bottleneck in the application, then it could be manually optimized (in, say, hand-code IL if necessary), and then everyone using BuildStringCharByChar would gain the benefits. On the other hand, optimizing the loop version of ExtractDigit only helps that one method. Further, for() loop (which have already been hand optimized as much as they can), are limited to what further they can do be optimized as they must work in many different contexts. Optimizations to BuildStringCharByChar could take advantage of it's very specific scope.

However, the THEORY kinda depends of the lambda version and the loop version being closer in time.

In practice, the lambda version needs to use the IEnumerable interface, while the loop version can use the (faster) array interface.

(nit-pick time: "LINQ" stands for "Language integrated query", so unless a statement using the LINQ keywords (from...in...where...select etc), it's technically not "LINQ". The style using the chained extension methods is usually call the "lambda" style or the "functional" style.

Bottom line: It's an improvement -- feel free to change it to your method.

cgessler wrote Dec 8, 2012 at 2:45 PM

Thanks for the explanation. I thought because the .Where extension method lived in the System.Linq namespace that it was considered to be "LINQ". In the past, I have found Linq extensions to be much slower than for/foreach loops which is why I don't use them much, however, I do use Lambda expressions a lot as many newer .NET framework methods are taking advantage of Func<> and Action<>. I'm a bit suprised that the for loop only shaved off 34% though because in the past I've seen improvements well into the thousands.

I doubt that your Linq/Lambda version would ever be a bottleneck by itself, but combined with other Linq type methods and other tiny slow downs, they can (and do) start to add up - especially if some of these methods were to be used inside loops. All of these tiny little improvements could add up to significient performance gains on a larger scale, say in a web app with hundreds of users. I do have to say - your method is quite clever and much faster than I expected - so I'll have to remember this one when a for loop doesn't quite cut it.

wrote Feb 21, 2013 at 11:19 PM

wrote May 16, 2013 at 10:51 AM

wrote May 16, 2013 at 10:51 AM

wrote Jun 14, 2013 at 7:26 AM

wrote Oct 19, 2013 at 10:30 AM