Gist of the Day: Named Capture in Perl Regular Expressions (Briefly)

One of the most common criticisms I see of regular expressions is that they lack readability. Well, named capture was added in Perl 5.10 (http://perldoc.perl.org/perlretut.html), and I think it adds an awful lot of readability to Perl regular expressions.

The Caveats

  • I’m using UTF-8 in this demo. I am not going to go into all the details of working with UTF-8 since it isn’t the point of this gist.
  • There are a number of ways to capture matches in a regular expression. This is one of them; I’m not going to weigh all of the pros and cons of the different methods (especially since most of it comes down to personal preference).
  • This is going to be just an introduction, a very brief run-through.
  • The demo is intended to simulate a plausibly realistic scenario, not an actual real-life scenario. I was trying to come up with a simple scenario where the benefits of this feature would be apparent. I would agree with most arguments that this might not be the best or most common approach to this problem. Keep in mind that I’m demonstrating a specific feature of regular expressions, not trying to come up with the best way to solve this specific problem.

The Demo


We’re going to focus entirely on lines 22 through 33; that’s where the magic happens. Take special note of how the (?<symbol>.) piece and the other (?<...>...) bits name a match. This syntax takes the match and makes it available either as \g{name} for backreferences inside the pattern, or as $+{name} for captures after the match. That’s what you see when I’m assigning things: I’m pulling from the %+ hash.
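
For readers following along without the gist in front of them, here is a minimal sketch of the same idea. It is not the original gist: only the (?<symbol>...) group and the get_price_pieces name come from this post and its comments. The whole and fraction group names, the exact pattern, and the repeated_word helper are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sketch: `whole` and `fraction` are assumed names,
# not necessarily the ones used in the gist.
sub get_price_pieces {
    my ($price) = @_;

    if ( $price =~ m/
            \A
            (?<symbol>   \p{Sc} )    # a single currency symbol
            (?<whole>    \d+    )    # digits before the decimal point
            [.]
            (?<fraction> \d{2}  )    # exactly two digits after it
            \z
        /x )
    {
        # %+ has one key for each named group that took part in the match
        return {
            symbol   => $+{symbol},
            whole    => $+{whole},
            fraction => $+{fraction},
        };
    }

    return;    # no match: undef in scalar context
}

# Hypothetical helper showing a named backreference: \g{word} must match
# the same text that the (?<word>...) group captured.
sub repeated_word {
    my ($text) = @_;
    return $text =~ / (?<word> \w+ ) \s+ \g{word} /x ? $+{word} : undef;
}

my $usd = get_price_pieces('$19.99');
print "symbol=$usd->{symbol} whole=$usd->{whole} fraction=$usd->{fraction}\n";
```

The /x modifier is what allows the whitespace and the trailing comments inside the pattern, which does as much for readability as the names do.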

Why’s This Useful?

This is useful primarily for reasons of readability and maintainability. If you use the traditional $1, then when your pattern changes and you need to add something at the beginning, you have to change your $1 into a $2. If there are more capture variables, you’ll need to update those as well. Named capture really helps in this situation, since you can just name your new capture and you’re good to go.
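
To make the renumbering problem concrete, here is a rough sketch. The patterns, the input string, and the sub names are all made up for illustration and are not taken from the gist.

```perl
use strict;
use warnings;

# Positional, before the change: the currency symbol is $1.
sub symbol_by_number_v1 {
    my ($text) = @_;
    return $text =~ / (\p{Sc}) (\d+) [.] (\d{2}) /x ? $1 : undef;
}

# Positional, after adding a leading group for a currency code:
# the symbol has moved to $2, and every later group shifted as well.
sub symbol_by_number_v2 {
    my ($text) = @_;
    return $text =~ / ([A-Z]{3}) \s+ (\p{Sc}) (\d+) [.] (\d{2}) /x ? $2 : undef;
}

# Named: the new (?<code>...) group sits in front, but $+{symbol}
# is still $+{symbol}, so nothing else has to change.
sub symbol_by_name {
    my ($text) = @_;
    return $text =~ /
        (?<code>     [A-Z]{3} ) \s+
        (?<symbol>   \p{Sc}   )
        (?<whole>    \d+      ) [.]
        (?<fraction> \d{2}    )
    /x ? $+{symbol} : undef;
}

my $line = 'USD $19.99';
print "by number: ", symbol_by_number_v2($line), "\n";
print "by name:   ", symbol_by_name($line), "\n";
```

Both approaches find the same symbol; the difference is how many other lines you have to touch the next time the pattern grows.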

Conclusion

I like named capture, I think it’s useful, it’s easy, and it solves some real problems with regular expressions. Let me know what you think, and let me know if you have any other requests for gists.

4 thoughts on “Gist of the Day: Named Capture in Perl Regular Expressions (Briefly)”

  1. Your gist also shows another great thing improving readability of regexes – /x and comments 🙂
    I don’t see the point of the “// return” parts in the assignments, since that block runs only if all the groups matched.
    What is the point of initializing the keys in $to_return? The names are repeated there, so maintaining the regex would also require updating that map call.
    Changing three assignments to:
    $to_return->{$_} = $+{$_} for keys %+;
    would remove another place with repeated names, so it’s possible to have them only in one place inside get_price_pieces function – in regex.

  2. T, named capture and %+ were the object of the demonstration, so golfing the key assignments wasn’t exactly what I was going for.

  3. That still doesn’t explain why you are unnecessarily creating the full data structure up front, when it would become a hash ref the first time it is used.
    At most I would have my $to_return = {}; to indicate that it is a hash ref that would be returned.
    Instead of copying the elements of %+ individually
    I would actually use:
    return {%+};
    or even just:
    return %+;
    Which would be used:
    my %usd_pieces = get_price_pieces( $usd );
    or
    my $usd_pieces = { get_price_pieces( $usd ) };

  4. There are two things it looks like you’re talking about:

    1. You seem to take issue with my aversion to autovivification. Sorry, this seems a matter of personal preference and well outside the scope of the article.
    2. I am demonstrating how %+ contains the exact same keys as the named capture. I prefer to be more explicit than what you’ve got there, and doing so has worked well for me.

    Keep in mind that in my caveats I did note that I was focused specifically on the named capture, and that I wasn’t trying to take the shortest path to success. I appreciate your enthusiasm in performing a thorough review of my code sample, but it doesn’t seem terribly constructive or relevant.
