Regex to replace all \n in a String, but not those inside [code] [/code] tags

I need help replacing all \n (newline) characters with <br /> in a String, but not the \n inside [code] [/code] tags. My brain is burning and I can't solve this on my own :(

Example:

    test test test
    test test test
    test
    test

    [code]some test code [/code]

    more text

Should be:

    test test test<br />
    test test test<br />
    test<br />
    test<br />
    <br />
    [code]some test code [/code]<br />
    <br />
    more text

Thanks for your time. Best regards.

I would suggest a (simple) parser rather than a regular expression. Something like this (bad pseudocode):

    stack elementStack;

    foreach (char in string) {
        if (string-from-char == "[code]") {
            elementStack.push("code");
            string-from-char = "";
        }

        if (string-from-char == "[/code]") {
            elementStack.popTo("code");
            string-from-char = "";
        }

        if (char == "\n" && !elementStack.contains("code")) {
            char = "<br />\n";
        }
    }
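One way the pseudocode above might translate into Java is sketched below. The class name `CodeAwareBreaks` and the method name `addBreaks` are made up for illustration, and the sketch assumes the replacement is `<br />` followed by the original newline. A stack is kept, as in the pseudocode, so nested [code] blocks are tolerated.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class CodeAwareBreaks {
    static final String OPEN = "[code]";
    static final String CLOSE = "[/code]";

    // Inserts "<br />" before every newline that is not inside a [code] block.
    static String addBreaks(String text) {
        Deque<String> elementStack = new ArrayDeque<>();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith(OPEN, i)) {
                // Entering a code block: remember it and copy the tag through.
                elementStack.push("code");
                out.append(OPEN);
                i += OPEN.length();
            } else if (text.startsWith(CLOSE, i)) {
                // Leaving a code block; a stray closing tag is copied through as is.
                if (!elementStack.isEmpty()) {
                    elementStack.pop();
                }
                out.append(CLOSE);
                i += CLOSE.length();
            } else {
                char c = text.charAt(i);
                if (c == '\n' && elementStack.isEmpty()) {
                    out.append("<br />");
                }
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addBreaks("one\ntwo\n[code]a\nb[/code]\nthree"));
    }
}
```

Unbalanced tags are not reported here; an unclosed [code] simply suppresses breaks to the end of the input.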

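You've tagged the question regex, but this may not be the best tool for the job.

You might be better off using basic compiler-building techniques (i.e. a lexer feeding a simple state machine parser).

Your lexer would identify five tokens: ("[code]", "\n", "[/code]", EOF, :all other strings:) and your state machine looks like: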

    state    token      action
    ------------------------------------------
    begin    :none:     --> out
    out      [code]     OUTPUT(token), --> in
    out      \n         OUTPUT(break), OUTPUT(token)
    out      *          OUTPUT(token)
    in       [/code]    OUTPUT(token), --> out
    in       *          OUTPUT(token)
    *        EOF        --> end
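As a rough illustration, the table above could be coded in Java along these lines. The names `BreakStateMachine`, `State`, `TOKEN` and `run` are invented for this sketch, the lexer is just a regex alternation, and "break" is assumed to mean the text `<br />`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BreakStateMachine {
    enum State { OUT, IN }

    // Tokens: "[code]", "[/code]", "\n", and any other text.
    // A lone "[" that starts neither tag is treated as ordinary text.
    // EOF is simply the point where the matcher finds nothing more.
    private static final Pattern TOKEN =
            Pattern.compile("\\[code\\]|\\[/code\\]|\\n|[^\\[\\n]+|\\[");

    static String run(String input) {
        StringBuilder out = new StringBuilder();
        State state = State.OUT;                 // begin --> out
        Matcher lexer = TOKEN.matcher(input);
        while (lexer.find()) {
            String token = lexer.group();
            if (state == State.OUT) {
                if (token.equals("[code]")) {
                    state = State.IN;            // out, [code] --> in
                } else if (token.equals("\n")) {
                    out.append("<br />");        // out, \n: OUTPUT(break)
                }
            } else if (token.equals("[/code]")) {
                state = State.OUT;               // in, [/code] --> out
            }
            out.append(token);                   // every row ends with OUTPUT(token)
        }
        return out.toString();                   // EOF --> end
    }

    public static void main(String[] args) {
        System.out.println(run("a\n[code]b\nc[/code]\nd"));
    }
}
```

With only two states there is no notion of depth, so the first [/code] always ends the block.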

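EDIT: I see other posters discussing the possible need for nested blocks. This state machine won't handle that. For nested blocks, use a recursive descent parser (not quite so simple, but still easy enough and extensible).

EDIT: Axeman points out that this design rules out using "[/code]" inside the code. An escape mechanism can be used to get around this. Something like adding '\' to your tokens and adding: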

    state      token    action
    ------------------------------------------
    in         \        --> esc-in
    esc-in     *        OUTPUT(token), --> in
    out        \        --> esc-out
    esc-out    *        OUTPUT(token), --> out

to the state machine.
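A minimal sketch of the escape rows in Java, under the same assumptions as before ("break" means `<br />`, regex-based lexer). The names `EscapingStateMachine`, `State`, `TOKEN` and `run` are hypothetical. Following the table literally, the backslash itself is not output, and the token after it is copied through without changing state.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapingStateMachine {
    enum State { OUT, IN, ESC_OUT, ESC_IN }

    // Tokens: "[code]", "[/code]", a backslash, "\n", and any other text.
    private static final Pattern TOKEN =
            Pattern.compile("\\[code\\]|\\[/code\\]|\\\\|\\n|[^\\[\\\\\\n]+|\\[");

    static String run(String input) {
        StringBuilder out = new StringBuilder();
        State state = State.OUT;
        Matcher lexer = TOKEN.matcher(input);
        while (lexer.find()) {
            String token = lexer.group();
            switch (state) {
                case ESC_IN:                     // esc-in, *: OUTPUT(token), --> in
                    out.append(token);
                    state = State.IN;
                    break;
                case ESC_OUT:                    // esc-out, *: OUTPUT(token), --> out
                    out.append(token);
                    state = State.OUT;
                    break;
                case IN:
                    if (token.equals("\\")) {    // in, \ --> esc-in
                        state = State.ESC_IN;
                        break;
                    }
                    if (token.equals("[/code]")) {
                        state = State.OUT;
                    }
                    out.append(token);
                    break;
                case OUT:
                    if (token.equals("\\")) {    // out, \ --> esc-out
                        state = State.ESC_OUT;
                        break;
                    }
                    if (token.equals("[code]")) {
                        state = State.IN;
                    } else if (token.equals("\n")) {
                        out.append("<br />");
                    }
                    out.append(token);
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escaped [/code] does not close the block.
        System.out.println(run("[code]a\\[/code]\nb[/code]\nc"));
    }
}
```

One consequence of this design is that a literal backslash in the input has to be written as two backslashes.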

The usual arguments in favor of machine-generated lexers and parsers apply.

This seems to do it:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class StripOutsideCode {
        // This sample strips runs of '*' outside [code] blocks. For the newline
        // case, replace "\n" with "<br />\n" instead.
        private final static String PATTERN = "\\*+";

        public static void main(String args[]) {
            Pattern p = Pattern.compile("(.*?)(\\[/?code\\])", Pattern.DOTALL);
            String s = "test 1 ** [code]test 2**blah[/code] test3 ** blah [code] test * 4 [code] test 5 * [/code] * test 6[/code] asdf **";
            Matcher m = p.matcher(s);
            // note: on Java 8 and earlier the Matcher API only accepts a
            // StringBuffer here, not a StringBuilder
            StringBuffer sb = new StringBuffer();
            int codeDepth = 0;
            while (m.find()) {
                // quoteReplacement keeps '$' and '\' in the text from being
                // read as group references
                if (codeDepth == 0) {
                    m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1).replaceAll(PATTERN, "")));
                } else {
                    m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1)));
                }
                if (m.group(2).equals("[code]")) {
                    codeDepth++;
                } else {
                    codeDepth--;
                }
                sb.append(m.group(2));
            }
            // Handle whatever follows the last tag.
            if (codeDepth == 0) {
                StringBuffer sb2 = new StringBuffer();
                m.appendTail(sb2);
                sb.append(sb2.toString().replaceAll(PATTERN, ""));
            } else {
                m.appendTail(sb);
            }
            System.out.printf("Original: %s%n", s);
            System.out.printf("Processed: %s%n", sb);
        }
    }

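It's not a simple regex, but I don't think you can do what you want with a simple regex. Not when nested elements and so forth have to be handled.

As other posters have mentioned, regular expressions are not the best tool for this job, because their quantifiers are greedy by default in almost every implementation. This means that even if you tried to match code blocks using something like: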

 (\[code\].*\[/code\])

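then the expression will match everything from the first [code] tag to the last [/code] tag, which is clearly not what you want. While there are ways around this, the resulting regular expressions are usually brittle, unintuitive, and downright ugly. Something like the following python code would work much better.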

    def add_brs(text):
        return text.replace('\n', '<br />\n')

    def convert(source):
        output = []
        # the first block will *not* have a matching [/code] tag
        blocks = source.split('[code]')
        output.append(add_brs(blocks[0]))
        # for all the rest of the blocks, only add <br /> tags to
        # the segment after the [/code] tag
        for block in blocks[1:]:
            parts = block.split('[/code]')
            if len(parts) != 2:
                raise ValueError('Too many or few [/code] tags')
            # the segment in the code block is pre, everything
            # after is post
            pre, post = parts
            # split() drops the tags themselves, so put them back
            output.append('[code]' + pre + '[/code]')
            output.append(add_brs(post))
        # finally join all the processed segments together
        return ''.join(output)

Note that the above code is untested; it is only a rough outline of what you would need to do.
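The greedy behaviour described in this answer is easy to see in Java. The sketch below (class and method names are invented for illustration) runs the greedy expression and a reluctant `.*?` variant, which is one of the workarounds alluded to, over the same input. The reluctant form still does not cope with nested blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Collects every match of regex in input, with '.' also matching newlines.
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a [code]x[/code] b [code]y[/code] c";
        // Greedy: a single match running from the first [code] to the last [/code].
        System.out.println(matches("\\[code\\].*\\[/code\\]", input));
        // Reluctant: one match per block.
        System.out.println(matches("\\[code\\].*?\\[/code\\]", input));
    }
}
```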

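To do this properly, you really need to make three passes:

1. Find the [code] blocks and replace each with a unique token + index (saving the original block), e.g. "foo [code] abc [/code] bar [code] efg [/code]" becomes "foo TOKEN-1 bar TOKEN-2".
2. Do the newline replacement.
3. Scan for the escape tokens and restore the original blocks.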

The code looks something like*:

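    // Assumed to be declared elsewhere: escapePattern, newlinePattern,
    // newlineReplacement, the map "escaped", the StringBuffers output1,
    // output2 and finalOutput, and the helper nextKey().

    // Pass 1: swap each [code] block for a token and remember the original.
    Matcher m = escapePattern.matcher(input);
    while (m.find()) {
        String key = nextKey();
        escaped.put(key, m.group());
        m.appendReplacement(output1, "TOKEN-" + key);
    }
    m.appendTail(output1);

    // Pass 2: replace the newlines that are left.
    Matcher m2 = newlinePattern.matcher(output1);
    while (m2.find()) {
        m2.appendReplacement(output2, newlineReplacement);
    }
    m2.appendTail(output2);

    // Pass 3: restore the saved blocks.
    Matcher m3 = Pattern.compile("TOKEN-(\\d+)").matcher(output2);
    while (m3.find()) {
        // quoteReplacement stops '$' and '\' in the block from being interpreted
        m3.appendReplacement(finalOutput, Matcher.quoteReplacement(escaped.get(m3.group(1))));
    }
    m3.appendTail(finalOutput);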

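This is the quick and dirty way. There are more efficient approaches (others have mentioned parsers/lexers), but unless you are processing millions of lines, your code is CPU-bound (rather than I/O-bound, like most webapps), and you have confirmed with a profiler that this is the bottleneck, they are probably not worth it.

* I haven't run it; this is all from memory. Just check the API and you should be able to work it out.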
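For reference, a self-contained version of the three passes might look like the sketch below. Everything named here (`ThreePassBreaks`, `convert`, the `@@CODE<n>@@` placeholder format) is an assumption made for the example, and it relies on the input never containing text that already looks like a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreePassBreaks {
    private static final Pattern CODE_BLOCK =
            Pattern.compile("\\[code\\].*?\\[/code\\]", Pattern.DOTALL);
    // Hypothetical placeholder format; the trailing "@@" keeps the index
    // from running into any digits that follow it in the text.
    private static final Pattern PLACEHOLDER = Pattern.compile("@@CODE(\\d+)@@");

    static String convert(String input) {
        List<String> saved = new ArrayList<>();

        // Pass 1: replace each code block with an indexed placeholder.
        Matcher m1 = CODE_BLOCK.matcher(input);
        StringBuffer pass1 = new StringBuffer();
        while (m1.find()) {
            saved.add(m1.group());
            m1.appendReplacement(pass1, "@@CODE" + (saved.size() - 1) + "@@");
        }
        m1.appendTail(pass1);

        // Pass 2: every remaining newline is outside a code block.
        String pass2 = pass1.toString().replace("\n", "<br />\n");

        // Pass 3: put the saved blocks back.
        Matcher m3 = PLACEHOLDER.matcher(pass2);
        StringBuffer result = new StringBuffer();
        while (m3.find()) {
            String block = saved.get(Integer.parseInt(m3.group(1)));
            m3.appendReplacement(result, Matcher.quoteReplacement(block));
        }
        m3.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("a\n[code]b\nc[/code]\nd"));
    }
}
```

Because the reluctant `.*?` stops at the first [/code], nested blocks are not handled by this sketch.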

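This is hard because, while regexes are good at finding something, they are not so good at matching everything except something... So you have to use a loop; I doubt you can do it in one go.

After searching, I ended up with something close to cletus's solution, except that I assumed code blocks cannot be nested, which leads to simpler code: choose whichever suits your needs.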

    import java.util.regex.*;

    class Test {
        static final String testString = "foo\nbar\n[code]\nprint'';\nprint{'c'};\n[/code]\nbar\nfoo";
        static final String replaceString = "<br />\n";

        public static void main(String args[]) {
            Pattern p = Pattern.compile("(.+?)(\\[code\\].*?\\[/code\\])?", Pattern.DOTALL);
            Matcher m = p.matcher(testString);
            StringBuilder result = new StringBuilder();
            while (m.find()) {
                result.append(m.group(1).replaceAll("\\n", replaceString));
                if (m.group(2) != null) {
                    result.append(m.group(2));
                }
            }
            System.out.println(result.toString());
        }
    }

Crude quick test; you need more (null, empty string, no code tags, multiple ones, etc.).
