Regular expression to match sentences

How do I match sentences of the form "Hello world" or "Hello World"? A sentence may contain " – / digit 0-9". Any information would be very helpful. Thanks.

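This one should do a pretty good job. My definition of a sentence: a sentence begins with a non-whitespace character and ends with a period, exclamation mark, or question mark (or the end of the string). The ending punctuation may be followed by a closing quote.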

[^.!?\s][^.!?]*(?:[.!?](?!['"]?\s|$)[^.!?]*)*[.!?]?['"]?(?=\s|$)

    import java.util.regex.*;

    public class TEST {
        public static void main(String[] args) {
            String subjectString =
                "This is a sentence. " +
                "So is \"this\"! And is \"this?\" " +
                "This is 'stackoverflow.com!' " +
                "Hello World";
            Pattern re = Pattern.compile(
                "# Match a sentence ending in punctuation or EOS.\n" +
                "[^.!?\\s]    # First char is non-punct, non-ws\n" +
                "[^.!?]*      # Greedily consume up to punctuation.\n" +
                "(?:          # Group for unrolling the loop.\n" +
                "  [.!?]      # (special) inner punctuation ok if\n" +
                "  (?!['\"]?\\s|$)  # not followed by ws or EOS.\n" +
                "  [^.!?]*    # Greedily consume up to punctuation.\n" +
                ")*           # Zero or more (special normal*)\n" +
                "[.!?]?       # Optional ending punctuation.\n" +
                "['\"]?       # Optional closing quote.\n" +
                "(?=\\s|$)",
                Pattern.MULTILINE | Pattern.COMMENTS);
            Matcher reMatcher = re.matcher(subjectString);
            while (reMatcher.find()) {
                System.out.println(reMatcher.group());
            }
        }
    }

Here is the output:

This is a sentence.
So is "this"!
And is "this?"
This is 'stackoverflow.com!'
Hello World

Matching all of these correctly (including the last sentence, which has no ending punctuation) turns out to be harder than it looks!
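For reuse, the compact one-line regex above can be wrapped in a small helper. This is only a sketch: the class name `SentenceSplitter` and the method `split` are made up for illustration, and the sample text is an assumption chosen to cover the hyphen, slash, and digits mentioned in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceSplitter {
    // The one-line regex from the answer, written as a Java string literal
    // (backslashes and double quotes escaped). MULTILINE mirrors the flag
    // used in the commented version of the pattern.
    private static final Pattern SENTENCE = Pattern.compile(
        "[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
        Pattern.MULTILINE);

    // Hypothetical helper: collects every match of the pattern into a list.
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Hello world. Version 2.0 is out - try it/enjoy it! Hello World";
        for (String s : split(text)) {
            System.out.println(s);
        }
    }
}
```

With this input the helper should yield three sentences: "Hello world.", "Version 2.0 is out - try it/enjoy it!", and "Hello World". The period in "2.0" stays inside its sentence because it is not followed by whitespace or the end of the string.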

If by "sentence" you mean something that ends with a punctuation mark, try this: (.*?)[.?!]

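Explanation:

  • .* matches any sequence of characters (by default, anything except line terminators). Adding ? makes the match non-greedy, so it matches the shortest possible string.
  • [.?!] matches any one of the three punctuation marks.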
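To see how the simple pattern behaves in practice, here is a minimal sketch. The class name `LazySentenceDemo`, the method `simpleMatches`, and the sample text are assumptions for illustration; trimming the matches is also an addition, since the pattern itself keeps the leading space left over from the previous sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazySentenceDemo {
    // The simple pattern: lazily consume characters up to the first
    // '.', '?' or '!'.
    private static final Pattern SIMPLE = Pattern.compile("(.*?)[.?!]");

    // Hypothetical helper: returns every match, trimmed of surrounding whitespace.
    public static List<String> simpleMatches(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SIMPLE.matcher(text);
        while (m.find()) {
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "This is a sentence. Visit stackoverflow.com! Hello World";
        for (String s : simpleMatches(text)) {
            System.out.println(s);
        }
    }
}
```

For this input the matches should be "This is a sentence.", "Visit stackoverflow.", and "com!". The pattern stops at the dot inside "stackoverflow.com" and never matches the trailing "Hello World", which has no ending punctuation. Those two cases are what the longer regex is designed to handle.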