Java regular expression Matcher does not find all possible matches
I was looking at some code from TutorialsPoint, and it has been bothering me ever since. Take a look at this code:
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexMatches {
        public static void main(String args[]) {
            // String to be scanned to find the pattern.
            String line = "This order was placed for QT3000! OK?";
            String pattern = "(.*)(\\d+)(.*)";

            // Create a Pattern object
            Pattern r = Pattern.compile(pattern);

            // Now create matcher object.
            Matcher m = r.matcher(line);
            while (m.find()) {
                System.out.println("Found value: " + m.group(1));
                System.out.println("Found value: " + m.group(2));
                System.out.println("Found value: " + m.group(3));
            }
        }
    }
This code prints:
    Found value: This order was placed for QT300
    Found value: 0
    Found value: ! OK?
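But given the regex "(.*)(\\d+)(.*)", why doesn't it also return the other possible results, such as:
    Found value: This order was placed for QT30
    Found value: 00
    Found value: ! OK?
or
    Found value: This order was placed for QT
    Found value: 3000
    Found value: ! OK?
If this code is not suited to that, how can I write code that finds all possible matches?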
This is because of the greediness of * and backtracking.
String:
This order was placed for QT3000! OK?
Regex:
(.*)(\\d+)(.*)
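As we know, .* is greedy and matches as many characters as possible. So the first .* initially matches every character up to and including the final ?, and then the engine backtracks, giving characters back one at a time, so that the rest of the pattern can match. The next part of the regex is \d+, so .* backs off until it reaches a digit. As soon as one digit is available, \d+ matches it, because its condition (one or more digits) is already satisfied. So the first (.*) captures This order was placed for QT300, and the following (\\d+) captures the single digit 0 that sits just before the ! symbol.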
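The last (.*) then captures all the remaining characters, ! OK?. m.group(1) returns the characters captured by group 1, m.group(2) those captured by group 2, and so on. Because this single match already spans the whole line, the next call to find() has nothing left to match, so the loop body runs only once.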
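The backtracking described above can be made visible by printing where each group starts and ends. The following is only a minimal sketch: the class GreedyTrace and the helper describe are hypothetical names introduced for illustration, and only the standard Matcher methods find(), groupCount(), start(int), end(int) and group(int) are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyTrace {
    // Hypothetical helper: reports the boundaries of every group in the first match,
    // then asks the matcher whether a second match exists.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int g = 1; g <= m.groupCount(); g++) {
            sb.append(String.format("group %d [%d,%d): \"%s\"%n",
                    g, m.start(g), m.end(g), m.group(g)));
        }
        // find() resumes after the previous match; with a trailing (.*) that
        // match already consumed the rest of the line.
        sb.append("another match: ").append(m.find());
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        System.out.println(describe("(.*)(\\d+)(.*)", line));
    }
}
```

For the question's input this reports group 1 as [0,31), group 2 as [31,32) and group 3 as [32,37), followed by "another match: false", which is why the loop prints a single set of values.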
To get the output with 00 in the second group, require exactly two digits:
    String line = "This order was placed for QT3000! OK?";
    String pattern = "(.*)(\\d{2})(.*)";

    // Create a Pattern object
    Pattern r = Pattern.compile(pattern);

    // Now create matcher object.
    Matcher m = r.matcher(line);
    while (m.find()) {
        System.out.println("Found value: " + m.group(1));
        System.out.println("Found value: " + m.group(2));
        System.out.println("Found value: " + m.group(3));
    }
Output:
    Found value: This order was placed for QT30
    Found value: 00
    Found value: ! OK?
With (.*)(\\d{2}), the engine backtracks only far enough for \d{2} to match exactly two digits.
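To see how the fixed count moves the boundary between the first two groups, the same pattern can be tried with different digit counts. This is a rough sketch, not part of the original answer: DigitCountDemo and split are hypothetical names, and the loop bounds are chosen only because the input contains four consecutive digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitCountDemo {
    // Hypothetical helper: returns "group1|group2|group3" for (.*)(\d{n})(.*),
    // or null when the input has no run of n digits.
    static String split(String input, int n) {
        Matcher m = Pattern.compile("(.*)(\\d{" + n + "})(.*)").matcher(input);
        return m.find() ? m.group(1) + "|" + m.group(2) + "|" + m.group(3) : null;
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        // n = 5 is included to show the no-match case.
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " -> " + split(line, n));
        }
    }
}
```

Each increase of n forces the greedy (.*) to give back one more digit, so the second group grows from 0 to 00, 000 and 3000; with n = 5 there is no match and the helper returns null.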
Change your pattern to this:
String pattern = "(.*?)(\\d+)(.*)";
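to get this output:
    Found value: This order was placed for QT
    Found value: 3000
    Found value: ! OK?
The ? after the * forces the * to match non-greedily: it takes as few characters as possible, so \d+ gets all four digits.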
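The effect of the ? can be compared side by side. The sketch below assumes nothing beyond java.util.regex; LazyVsGreedy and groups are hypothetical names, and the third pattern (with \d+? as well) is an extra variation added here for contrast, not something from the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Hypothetical helper: joins the three captured groups of the first match with '|'.
    static String groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return m.group(1) + "|" + m.group(2) + "|" + m.group(3);
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        String[] patterns = {
            "(.*)(\\d+)(.*)",    // greedy prefix: leaves one digit for \d+
            "(.*?)(\\d+)(.*)",   // lazy prefix: \d+ takes all the digits
            "(.*?)(\\d+?)(.*)"   // lazy prefix and lazy digits: \d+? takes one digit
        };
        for (String p : patterns) {
            System.out.println(p + " -> " + groups(p, line));
        }
    }
}
```

The second group comes out as 0, 3000 and 3 respectively, which shows that the split depends on which quantifier is allowed to be greedy, not on the input.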
Use extra capturing groups to get both outputs from a single program.
    String line = "This order was placed for QT3000! OK?";
    // Group 1 = prefix plus first two digits, group 2 = prefix only,
    // group 3 = first two digits, group 4 = last two digits, group 5 = the rest.
    String pattern = "((.*?)(\\d{2}))(?:(\\d{2})(.*))";
    Pattern r = Pattern.compile(pattern);
    Matcher m = r.matcher(line);
    while (m.find()) {
        // First split: QT30 / 00 / rest
        System.out.println("Found value: " + m.group(1));
        System.out.println("Found value: " + m.group(4));
        System.out.println("Found value: " + m.group(5));
        // Second split: QT / 3000 / rest
        System.out.println("Found value: " + m.group(2));
        System.out.println("Found value: " + m.group(3) + m.group(4));
        System.out.println("Found value: " + m.group(5));
    }
Output:
    Found value: This order was placed for QT30
    Found value: 00
    Found value: ! OK?
    Found value: This order was placed for QT
    Found value: 3000
    Found value: ! OK?
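The extra groups cover two specific splits. For the more general question of listing every possible split, a single call to find() is not enough, because the matcher reports only one result per starting position. One possible approach, sketched here under the assumption that the input is a single line and that a split means prefix, run of digits, suffix covering the whole line, is to enumerate the candidate digit runs directly. AllSplits and allSplits are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class AllSplits {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: every way to cut the input into {prefix, digits, suffix},
    // i.e. every way (.*)(\d+)(.*) could cover a whole single-line input.
    static List<String[]> allSplits(String input) {
        List<String[]> result = new ArrayList<>();
        for (int start = 0; start < input.length(); start++) {
            for (int end = start + 1; end <= input.length(); end++) {
                String middle = input.substring(start, end);
                if (!DIGITS.matcher(middle).matches()) {
                    break; // a longer middle from this start would contain the same non-digit
                }
                result.add(new String[] {
                    input.substring(0, start), middle, input.substring(end)
                });
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] parts : allSplits("This order was placed for QT3000! OK?")) {
            System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
        }
    }
}
```

For the question's input this yields ten splits, including the QT30 / 00 and QT / 3000 variants asked about. It trades the regex engine's backtracking for an explicit double loop, which is quadratic in the length of the input.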
(.*?)(\\d+)(.*)
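Make the greedy quantifier * non-greedy by writing *?.
Because your first group (.*) is greedy, it captures as much as it can and leaves only a single 0 for \d+ to capture. If you make it non-greedy, it gives you the expected result.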