Regex style in Ruby
Reading Patrick McKenzie’s excellent practical example of metaprogramming, I came across a line of code I didn’t understand:
That line taught me three new things about Ruby:
- The syntax for the subscript operator [] allows multiple arguments. (It turns out I already knew this in another context: [1,1,2,3,5,7][2,3] => [2,3,5])
- You can subscript a String with a Regexp, returning the first match: "goal"[/[aeiou]/] => "o" (nil is returned if there is no match).
- If you throw in an index n, then you get the nth capturing group of the first match: "xaabb"[/(.)\1/, 1] => "a" (or nil again if no match).
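Those three behaviours can be checked directly in irb. This is only a small sketch; the variable names and the extra "rhythm" and "xyz" inputs are illustrative and not from the original line of code.

```ruby
# Array#[] with (start, length): three elements starting at index 2.
slice = [1, 1, 2, 3, 5, 7][2, 3]

# String#[] with a Regexp: the first match, or nil if nothing matches.
first_vowel = "goal"[/[aeiou]/]
no_vowel    = "rhythm"[/[aeiou]/]

# String#[] with a Regexp and an index: that capture group of the first match.
doubled    = "xaabb"[/(.)\1/, 1]
no_doubled = "xyz"[/(.)\1/, 1]

p slice, first_vowel, no_vowel, doubled, no_doubled
```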
That last one is interesting, because it means there’s a concise way I didn’t previously know about to achieve a common regex task: checking if an input string matches a given format, and if so, extracting part of the format. Say we want to pull out the domain from an email address, but complain if we can’t find it:
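A minimal sketch of that check-and-extract pattern using the subscript form. The method name domain_of, the error class, and the message are assumptions made for illustration; the pattern /@(.*)/ is the one used in the comments below.

```ruby
# Hypothetical helper: return everything after the first "@",
# or raise if the address contains no "@" at all.
def domain_of(email)
  email[/@(.*)/, 1] or raise ArgumentError, "bad email: #{email.inspect}"
end

puts domain_of("foo@example.com")
```

Note that with this pattern an address such as "foo@" yields an empty string rather than an error, since an empty string is truthy in Ruby.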
Before learning this trick I would have either used a temporary match object a la Java, or gritted my teeth and used a global variable Perl-style:
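Roughly, those two older styles might look like the sketch below. The method names and error message are hypothetical, chosen only to keep the two variants side by side.

```ruby
# Style 1: keep the MatchData in a temporary variable.
def domain_via_match(email)
  match = /@(.*)/.match(email)          # MatchData, or nil if no match
  raise ArgumentError, "bad email" unless match
  match[1]                              # first capture group
end

# Style 2: rely on the Perl-style $1 variable that =~ sets.
def domain_via_global(email)
  if email =~ /@(.*)/                   # index of the match, or nil
    $1                                  # first capture group of that match
  else
    raise ArgumentError, "bad email"
  end
end
```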
Both of those seem rather verbose. They can be golfed into one-liners, but the readability starts to suffer:
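One possible way to golf them, as a sketch only; the method names are hypothetical and the exact one-liners in the original may have differed.

```ruby
# Match-object version squeezed onto one line: assign inside the condition.
def domain_golfed_match(email)
  (m = /@(.*)/.match(email)) ? m[1] : raise(ArgumentError, "bad email")
end

# $1 version squeezed onto one line.
def domain_golfed_global(email)
  (email =~ /@(.*)/) ? $1 : raise(ArgumentError, "bad email")
end
```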
So I’m left wondering what’s the most readable and/or idiomatic style for regexes in Ruby. TMTOWTDI indeed! Even now that I know what it means, "xaabb"[/(.)\1/,1] makes me double-take slightly - it’s an unusual way to use [] - but I guess it’s just another Ruby idiosyncrasy I’ll come to know and love.
Comments (archived)
Comments from a previous URL for this post (which now redirects here):
-
Using regexes in the subscript operator is awesome! Even I was confused when I saw that the first time -- one look at the ruby reference and I was awe struck...
–
at Mon, 30 Nov 2009 15:17:00 +0000
-
What about something like: http://gist.github.com/245613
–
at Mon, 30 Nov 2009 18:24:35 +0000 -
Wow, with the subscript operator [] you really get good readability :) Though I first had to think about how this operator even works, once you understand it, it is a really nice idea!
–
at Mon, 30 Nov 2009 19:27:25 +0000 -
This looks pretty easy:
"foo@example.com" =~ /@(.*)/ and $1 or raise "bad email"
–
at Mon, 30 Nov 2009 20:38:18 +0000 -
# Perl - without using global variables, to save you from gritting your teeth :)
if (my ($domain) = 'foo@example.com' =~ /@(.*)/) {
    # do something with $domain
}
else {
    # do whatever when domain is not found
}
And, $domain is a lexical variable, whose scope doesn't extend beyond the if statement.
–
at Mon, 30 Nov 2009 22:10:27 +0000 -
I forgot to add in my previous comment:
Accessing regex matches using the subscript operator syntax is very cool, I must say.
–
at Mon, 30 Nov 2009 22:13:12 +0000 -
Wow, this is neat -- although I find that shorter is not always more readable. Personally it took me a lot longer to mentally parse the subscript version, but maybe that's just lack of familiarity.
Another thing this teaches us is that you've found a case where GitHub's syntax highlighting fails :)
–
at Tue, 01 Dec 2009 10:35:31 +0000 -
Rather than the 'subscript operator' I think it's more accurate to say that you're calling the [] method on the string object.
–
at Tue, 01 Dec 2009 13:20:23 +0000
Perl returns the results of a match in list context:
– JadeNB at Mon, 30 Nov 2009 14:55:43 +0000