Grep is greedy; make it less so

While editing an HTML file in TextWrangler, I needed to find and replace all occurrences of:
<span class="a">text</span>
This regex:
\<span class=\"a\"\>.*\<\/span\>
returned results like:
<span class="a">text</span><span class="b">more text</span>
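The .* quantifier is greedy: it consumes as much as it can, so the match ran on to the last </span> on the line. Putting a question mark after it makes it non-greedy (lazy), so it stops at the first </span> that completes the match. That returned the desired results: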
\<span class=\"a\"\>.*?\<\/span\>
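The difference can be reproduced outside TextWrangler. This is a minimal sketch that uses Python's re module as a stand-in for TextWrangler's grep engine, assuming the two behave the same for these constructs (they do for .* and .*?, but the engines are not identical in every detail). The escapes from the original patterns are dropped because Python does not need them.

```python
import re

# The problem line from above: two adjacent spans on one line.
html = '<span class="a">text</span><span class="b">more text</span>'

# Greedy: .* runs as far as it can, so the match ends at the LAST </span>.
greedy = re.findall(r'<span class="a">.*</span>', html)

# Lazy: .*? stops at the FIRST </span> that lets the whole pattern match.
lazy = re.findall(r'<span class="a">.*?</span>', html)

print(greedy)  # ['<span class="a">text</span><span class="b">more text</span>']
print(lazy)    # ['<span class="a">text</span>']
```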

UPDATE 1: Wikipedia has more on lazy quantification: "... modern regular expression tools allow a quantifier to be specified as lazy (also known as non-greedy, reluctant, minimal, or ungreedy) by putting a question mark after the quantifier (e.g., <.*?>) ..."
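A short sketch of the quoted <.*?> example, again using Python's re module as an assumed stand-in for TextWrangler's engine. It also shows that the trailing question mark works on other quantifiers, not just *. The sample strings are made up for illustration.

```python
import re

s = '<b>bold</b> and <i>italic</i>'

# Lazy <.*?> matches each tag on its own...
lazy_tags = re.findall(r'<.*?>', s)
# ...while greedy <.*> swallows everything from the first < to the last >.
greedy_tags = re.findall(r'<.*>', s)

# The same "?" suffix makes other quantifiers lazy too.
plus_lazy = re.findall(r'a+?', 'aaa')         # one 'a' per match
range_lazy = re.findall(r'a{2,3}?', 'aaaaa')  # prefers 2 repetitions over 3

print(lazy_tags)    # ['<b>', '</b>', '<i>', '</i>']
print(greedy_tags)  # ['<b>bold</b> and <i>italic</i>']
print(plus_lazy)    # ['a', 'a', 'a']
print(range_lazy)   # ['aa', 'aa']
```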

UPDATE 2: To match patterns that span more than one line in TextWrangler's grep, start the expression with (?s). According to the BBEdit-TextWrangler Regular Expression Cheat-Sheet, m (multiline) "allows the grep engine to match at ^ and $ after and before at \r or \n" and s (magic dot) "allows . to match \r and \n".
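A minimal sketch of both inline flags, using Python's re module as a stand-in. One assumption to flag loudly: Python only treats \n as the line break for . and for ^/$, whereas the cheat-sheet describes TextWrangler handling \r as well, so this illustrates the idea rather than TextWrangler's exact behavior. The wrapped span is hypothetical input.

```python
import re

# Hypothetical input: a span whose text wraps onto a second line.
wrapped = '<span class="a">first line\nsecond line</span>'
pattern = r'<span class="a">.*?</span>'

# Without (?s), . stops at the \n, so no match is found.
no_dotall = re.findall(pattern, wrapped)
# With (?s) at the start, . also matches \n and the whole span is found.
dotall = re.findall(r'(?s)' + pattern, wrapped)

# (?m) is the separate "multiline" flag: ^ and $ match at each line break.
lines = 'one\ntwo'
no_multiline = re.findall(r'^\w+$', lines)
multiline = re.findall(r'(?m)^\w+$', lines)

print(no_dotall)     # []
print(dotall)        # ['<span class="a">first line\nsecond line</span>']
print(no_multiline)  # []
print(multiline)     # ['one', 'two']
```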

/mac | Apr 11, 2010

