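Splitting a string on spaces in Java, except spaces inside double or single quotes or preceded by a \
I am completely new to regular expressions. I am trying to put together an expression that splits the example string on every space that is not enclosed in single or double quotes and is not preceded by a '\'.
For example: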
He is a "man of his" words\ always
must be split into
He
is
a
"man of his"
words\ always
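I understand that

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> matchList = new ArrayList<>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(StringToBeMatched);
while (regexMatcher.find()) {
    // Each match is one token: a bare word or a quoted run
    matchList.add(regexMatcher.group());
}

splits the example string on all spaces that are not enclosed in single or double quotes.
How do I incorporate the third condition, ignoring a space if it is preceded by a \?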
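To make the gap concrete, here is a small sketch that runs the pattern from the question on the sample string. The class name QuoteOnlyTokenizer and the tokenize helper are made up for this illustration; only the regex comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class QuoteOnlyTokenizer {
    // The pattern from the question: bare words, "double-quoted" runs, or 'single-quoted' runs.
    private static final Pattern TOKEN =
            Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Java source for the text: He is a "man of his" words\ always
        String sample = "He is a \"man of his\" words\\ always";
        for (String token : tokenize(sample)) {
            System.out.println(token);
        }
    }
}
```

With this pattern the quoted part stays together, but words\ and always come out as two separate tokens, which is exactly the case the question asks about.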
You can use this regex:
((["']).*?\2|(?:[^\\ ]+\\\s+)+[^\\ ]+|\S+)
In Java:
Pattern regex = Pattern.compile("(([\"']).*?\\2|(?:[^\\\\ ]+\\\\\\s+)+[^\\\\ ]+|\\S+)");
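Explanation:
This regex is built on alternation:
- First, (["']).*?\2 matches any quoted string (double or single quotes).
- Next, (?:[^\\ ]+\\\s+)+[^\\ ]+ matches any string containing escaped spaces.
- Finally, \S+ matches any word that contains no whitespace.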
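The three alternatives can be tried out with a short sketch like the one below. The class name EscapedSpaceTokenizer and the tokenize method are assumptions made for this example; the regex is the one from this answer, written with Java string escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from this answer.
class EscapedSpaceTokenizer {
    // Alternatives, in order: quoted string | word(s) joined by backslash-space | plain word
    private static final Pattern TOKEN = Pattern.compile(
            "(([\"']).*?\\2|(?:[^\\\\ ]+\\\\\\s+)+[^\\\\ ]+|\\S+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1)); // group 1 wraps the whole alternation
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Java source for the text: He is a "man of his" words\ always
        for (String token : tokenize("He is a \"man of his\" words\\ always")) {
            System.out.println(token);
        }
    }
}
```

The quotes and the backslash are kept in the tokens; stripping them, if needed, would be a separate step.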
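Anubhava's solution is good... I especially like his use of \S+. My solution groups things similarly, except that the third alternative also checks the word boundaries at the start and end of the word...
Regex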
(?i)((?:(['|"]).+\2)|(?:\w+\\\s\w+)+|\b(?=\w)\w+\b(?!\w))
For Java
(?i)((?:(['|\"]).+\\2)|(?:\\w+\\\\\\s\\w+)+|\\b(?=\\w)\\w+\\b(?!\\w))
Example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String subject = "He is a \"man of his\" words\\ always 'and forever'";
Pattern pattern = Pattern.compile(
    "(?i)((?:(['|\"]).+\\2)|(?:\\w+\\\\\\s\\w+)+|\\b(?=\\w)\\w+\\b(?!\\w))");
Matcher matcher = pattern.matcher(subject);
while (matcher.find()) {
    // group(0) is the full text of the current token
    System.out.println(matcher.group(0));
}

Result
He
is
a
"man of his"
words\ always
'and forever'
Detailed explanation
"(?i)" +            // Match the remainder of the regex with the options: case insensitive (i)
"(" +               // Match the regular expression below and capture its match into backreference number 1
                    // Match either the regular expression below (attempting the next alternative only if this one fails)
   "(?:" +          // Match the regular expression below
      "(" +         // Match the regular expression below and capture its match into backreference number 2
         "['|\"]" + // Match a single character present in the list “'|"”
      ")" +
      "." +         // Match any single character that is not a line break character
         "+" +      // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      "\\2" +       // Match the same text as most recently matched by capturing group number 2
   ")" +
   "|" +            // Or match regular expression number 2 below (attempting the next alternative only if this one fails)
   "(?:" +          // Match the regular expression below
      "\\w" +       // Match a single character that is a “word character” (letters, digits, etc.)
         "+" +      // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      "\\\\" +      // Match the character “\” literally
      "\\s" +       // Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
      "\\w" +       // Match a single character that is a “word character” (letters, digits, etc.)
         "+" +      // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ")+" +           // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   "|" +            // Or match regular expression number 3 below (the entire group fails if this one fails to match)
   "\\b" +          // Assert position at a word boundary
   "(?=" +          // Assert that the regex below can be matched, starting at this position (positive lookahead)
      "\\w" +       // Match a single character that is a “word character” (letters, digits, etc.)
   ")" +
   "\\w" +          // Match a single character that is a “word character” (letters, digits, etc.)
      "+" +         // Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   "\\b" +          // Assert position at a word boundary
   "(?!" +          // Assert that it is impossible to match the regex below starting at this position (negative lookahead)
      "\\w" +       // Match a single character that is a “word character” (letters, digits, etc.)
   ")" +
")"
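A regex representing a \ followed by whitespace looks like \\\s, where \\ represents a literal \ and \s represents any whitespace character. The Java string for this regex has to be written as "\\\\\\s", because each \ inside a string literal must itself be escaped by putting another \ in front of it.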
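The two layers of escaping are easy to mix up, so here is a tiny sketch that prints what the regex engine actually receives. BackslashEscapeDemo and isEscapedWhitespace are names invented for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern.matches is real library API here.
class BackslashEscapeDemo {
    // Six backslashes in Java source become three in the actual string: \\\s
    static final String REGEX = "\\\\\\s";

    static boolean isEscapedWhitespace(String text) {
        // True only if the whole text is one backslash followed by one whitespace character
        return Pattern.matches(REGEX, text);
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                         // the regex as the engine sees it
        System.out.println(REGEX.length());                // number of characters in that regex
        System.out.println(isEscapedWhitespace("\\ "));    // backslash + space
        System.out.println(isEscapedWhitespace("\\\t"));   // backslash + tab
        System.out.println(isEscapedWhitespace(" "));      // space without a backslash
    }
}
```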
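So now we want our pattern to find:
- "..." -> "[^"]*"
- or '...' -> '[^']*'
- or non-whitespace characters (\S), plus any whitespace that has a \ before it (\\\s). This one is a bit tricky, because \S can also consume the \ placed before the space, which would prevent \\\s from matching. That is why we want the regex engine to
  - first try \\\s
  - and only then \S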
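So instead of something like (\S|\\\s)+ we need to write this part of the regex as (\\\s|\S)+, because the regex engine tests the alternatives separated by OR (|) from left to right. For example, in a regex like a|ab, the ab branch will never match, because the a is always consumed by the left branch first.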
So your pattern could look like
Pattern regex = Pattern.compile("\"[^\"]*\"|'[^']*'|(\\\\\\s|\\S)+");
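Put together, a minimal runnable sketch using this pattern might look as follows. The class name OrderedAlternationTokenizer and its tokenize method are assumptions for illustration; the sample string is the one used elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern shown above.
class OrderedAlternationTokenizer {
    // "..." | '...' | run of (backslash + whitespace) or non-whitespace characters
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"|'[^']*'|(\\\\\\s|\\S)+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group()); // whole match, quotes and backslashes included
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Java source for the text: He is a "man of his" words\ always 'and forever'
        String sample = "He is a \"man of his\" words\\ always 'and forever'";
        for (String token : tokenize(sample)) {
            System.out.println(token);
        }
    }
}
```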