1. Capturing matched strings to scalars
Capturing parentheses are numbered from left to right in the order of their opening parentheses, so a nested group gets a higher number than the group that encloses it. The following example should help make this clear:
$_ = "fish";
/((\w)(\w))/; # captures as follows:
# $1 = "fi", $2 = "f", $3 = "i"
$_ = "1234567890";
/(\d)+/; # matches every digit, but each repetition overwrites $1,
# so only the last digit matched is kept: $1 = "0"
/(\d+)/; # captures all of 1234567890 in $1
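The numbering rule is easy to check by hand. The following is a minimal sketch rather than part of the original examples; the variable names are made up for illustration. It tests that each match succeeded before reading the capture variables, because a failed match leaves $1 and friends holding whatever the last successful match put there.

```perl
use strict;
use warnings;

my @groups;
my $text = "fish";
if ($text =~ /((\w)(\w))/) {
    # Groups are numbered by the position of their opening parenthesis.
    @groups = ($1, $2, $3);
    print "1=$1 2=$2 3=$3\n";    # prints "1=fi 2=f 3=i"
}

my $last_digit;
if ("1234567890" =~ /(\d)+/) {
    # A quantified group keeps only the text from its final repetition.
    $last_digit = $1;
    print "$last_digit\n";       # prints "0"
}
```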
Evaluating a regular expression in list context is another way to capture information, with parenthesised sub-expressions being returned as a list. We can use this instead of numbered variables if we like:
$_ = "Our server is training.perltraining.com.au.";
my ($full, $host, $domain) = /(([\w-]+)\.([\w.-]+))/;
print "$1\n"; # prints "training.perltraining.com.au."
print "$full\n"; # prints "training.perltraining.com.au."
print "$2 : $3\n"; # prints "training : perltraining.com.au."
print "$host : $domain\n"; # prints "training : perltraining.com.au."
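One practical benefit of the list-context form is that a failed match returns an empty list, so the assignment itself can serve as the success test. The sketch below assumes we only want the host and domain, so the outer group is dropped; the variable names and the second test string are illustrative.

```perl
use strict;
use warnings;

my $line = "Our server is training.perltraining.com.au.";

# A list assignment used as a condition yields the number of values
# on its right-hand side, so a failed match (empty list) is false.
my ($host, $domain);
if (($host, $domain) = $line =~ /([\w-]+)\.([\w.-]+)/) {
    print "$host : $domain\n";   # prints "training : perltraining.com.au."
}

# No dot in this string, so the match fails and nothing is captured.
my @none = "no dots here" =~ /([\w-]+)\.([\w.-]+)/;
print scalar(@none), "\n";       # prints "0"
```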
2. Greediness
Regular expression quantifiers are, by default, "greedy". This means that a quantified pattern, for instance .*, will try to match as much text as it possibly can. Greediness is sometimes referred to as "maximal matching".
Greediness also works from left to right. Each quantified section of the regular expression is as greedy as it can be while still allowing the whole regular expression to match, so earlier sections get first claim on the text. For example,
$_ = "The cat sat on the mat";
/(c.*t)(.*)(m.*t)/;
print $1; # prints "cat sat on t"
print $2; # prints "he "
print $3; # prints "mat"
Another set of matches is possible in this example. The first expression, c.*t, could have matched "cat", leaving " sat on the " to be matched by the second expression, .*. However, to do that, we need to stop c.*t from being so greedy.
To make a regular expression quantifier not greedy, follow it with a question mark, for example .*?. This is sometimes referred to as "minimal matching".
$_ = "The fox is in the box.";
/(f.*x)/; # greedy -- $1 = "fox is in the box"
/(f.*?x)/; # not greedy -- $1 = "fox"
$_ = "abracadabra";
/(a.*a)/; # greedy -- $1 = "abracadabra"
/(a.*?a)/; # not greedy -- $1 = "abra"
/(a.*?a)(.*a)/; # first is not greedy -- $1 = "abra"
# second is greedy -- $2 = "cadabra"
/(a.*a)(.*?a)/; # first is greedy -- $1 = "abracada"
# second is not greedy -- $2 = "bra"
/(a.*?a)(.*?a)/; # first is not greedy -- $1 = "abra"
# second is not greedy -- $2 = "ca"
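Returning to the earlier "cat sat on the mat" sentence, the alternative set of matches described there can be obtained by making only the first quantifier non-greedy. This is a small sketch for comparison; the variable names are illustrative, and list context is used to collect the captures.

```perl
use strict;
use warnings;

my $sentence = "The cat sat on the mat";

# Same pattern twice; only the first quantifier differs.
my @greedy = $sentence =~ /(c.*t)(.*)(m.*t)/;
my @lazy   = $sentence =~ /(c.*?t)(.*)(m.*t)/;

print join("|", @greedy), "\n";  # prints "cat sat on t|he |mat"
print join("|", @lazy), "\n";    # prints "cat| sat on the |mat"
```

With c.*?t stopping at the first "t" it can, the middle group is free to take everything up to "mat".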