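How do I find all comments in source code?

There are two styles of comments, C-style and C++-style. How can I recognize them?

    /* comments */
    // comments

I am free to use any method and any third-party library.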

To reliably find all comments in a Java source file, I wouldn't use regex, but a real lexer (a.k.a. tokenizer).
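To make concrete what a lexer does here that a plain pattern search does not, the following is a minimal hand-written scanner sketch. It is not part of the original answer: the class name `CommentScanner` and its methods are hypothetical, and it deliberately ignores Unicode escapes and Java text blocks. The point it illustrates is that string and character literals must be consumed as units so that `//` or `/*` inside them is never reported as a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; a generated lexer is the more robust choice.
final class CommentScanner {

    /** Returns every comment found in src, in order of appearance. */
    static List<String> findComments(String src) {
        List<String> comments = new ArrayList<>();
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // String or char literal: skip it as a whole.
                i = skipLiteral(src, i);
            } else if (src.startsWith("//", i)) {
                // Single-line comment: runs up to (not including) the line break.
                int end = i;
                while (end < n && src.charAt(end) != '\n' && src.charAt(end) != '\r') {
                    end++;
                }
                comments.add(src.substring(i, end));
                i = end;
            } else if (src.startsWith("/*", i)) {
                // Multi-line comment: runs to the first "*/", or to EOF if unterminated.
                int close = src.indexOf("*/", i + 2);
                int end = (close < 0) ? n : close + 2;
                comments.add(src.substring(i, end));
                i = end;
            } else {
                i++;
            }
        }
        return comments;
    }

    /** Returns the index just past the literal that starts at 'start'. */
    private static int skipLiteral(String src, int start) {
        char quote = src.charAt(start);
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2; // step over the backslash and the escaped character
                continue;
            }
            if (c == quote || c == '\n' || c == '\r') {
                return i + 1; // closing quote, or a broken literal ending at the line break
            }
            i++;
        }
        return src.length();
    }

    public static void main(String[] args) {
        String src = "String t = \"/*\"; // real comment\n"
                   + "char c = '\"'; /* block */ String u = \"*/\";";
        for (String comment : findComments(src)) {
            System.out.println(comment);
        }
    }
}
```

Run on the two-line sample in `main`, this sketch reports `// real comment` and `/* block */` and skips the `/*` and `*/` that sit inside string literals.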

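Two popular lexer generators for Java are:

JFlex: http://jflex.de
ANTLR: http://www.antlr.org

Contrary to popular belief, ANTLR can also be used to create only a lexer, without a parser.

Here's a quick ANTLR demo. You need the following files in the same directory:

antlr-3.2.jar
JavaCommentLexer.g (the grammar)
Main.java
Test.java (a valid (!) Java source file with exotic comments)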

JavaCommentLexer.g

    lexer grammar JavaCommentLexer;

    options {
      filter=true;
    }

    SingleLineComment
      :  FSlash FSlash ~('\r' | '\n')*
      ;

    MultiLineComment
      :  FSlash Star .* Star FSlash
      ;

    StringLiteral
      :  DQuote
         ( (EscapedDQuote)=> EscapedDQuote
         | (EscapedBSlash)=> EscapedBSlash
         | Octal
         | Unicode
         | ~('\\' | '"' | '\r' | '\n')
         )*
         DQuote {skip();}
      ;

    CharLiteral
      :  SQuote
         ( (EscapedSQuote)=> EscapedSQuote
         | (EscapedBSlash)=> EscapedBSlash
         | Octal
         | Unicode
         | ~('\\' | '\'' | '\r' | '\n')
         )
         SQuote {skip();}
      ;

    fragment EscapedDQuote
      :  BSlash DQuote
      ;

    fragment EscapedSQuote
      :  BSlash SQuote
      ;

    fragment EscapedBSlash
      :  BSlash BSlash
      ;

    fragment FSlash
      :  '/' | '\\' ('u002f' | 'u002F')
      ;

    fragment Star
      :  '*' | '\\' ('u002a' | 'u002A')
      ;

    fragment BSlash
      :  '\\' ('u005c' | 'u005C')?
      ;

    fragment DQuote
      :  '"'
      |  '\\u0022'
      ;

    fragment SQuote
      :  '\''
      |  '\\u0027'
      ;

    fragment Unicode
      :  '\\u' Hex Hex Hex Hex
      ;

    fragment Octal
      :  '\\' ('0'..'3' Oct Oct | Oct Oct | Oct)
      ;

    fragment Hex
      :  '0'..'9' | 'a'..'f' | 'A'..'F'
      ;

    fragment Oct
      :  '0'..'7'
      ;

Main.java

    import org.antlr.runtime.*;

    public class Main {

      public static void main(String[] args) throws Exception {
        // Run the generated lexer over Test.java.
        JavaCommentLexer lexer = new JavaCommentLexer(new ANTLRFileStream("Test.java"));
        CommonTokenStream tokens = new CommonTokenStream(lexer);

        // The token list is iterated as Object, so each element is cast to CommonToken.
        for(Object o : tokens.getTokens()) {
          CommonToken t = (CommonToken)o;
          // Print line breaks inside a comment as a literal "\n" so each comment stays on one line.
          if(t.getType() == JavaCommentLexer.SingleLineComment) {
            System.out.println("SingleLineComment :: " + t.getText().replace("\n", "\\n"));
          }
          if(t.getType() == JavaCommentLexer.MultiLineComment) {
            System.out.println("MultiLineComment  :: " + t.getText().replace("\n", "\\n"));
          }
        }
      }
    }

Test.java

    \u002f\u002a <- multi line comment start
    multi
    line
    comment // not a single line comment
    \u002A/
    public class Test {

      // single line "not a string"

      String s = "\u005C" \242 not // a comment \\\" \u002f \u005C\u005C \u0022;

      /*
      regular multi line comment
      */

      char c = \u0027"'; // the " is not the start of a string

      char q1 = '\u005c'';                  // == '\''
      char q2 = '\u005c\u0027';             // == '\''
      char q3 = \u0027\u005c\u0027\u0027;   // == '\''
      char c4 = '\047';

      String t = "/*";
      \u002f\u002f another single line comment
      String u = "*/";
    }

Now, to run the demo, do:

    bart@hades:~/Programming/ANTLR/Demos/JavaComment$ java -cp antlr-3.2.jar org.antlr.Tool JavaCommentLexer.g
    bart@hades:~/Programming/ANTLR/Demos/JavaComment$ javac -cp antlr-3.2.jar *.java
    bart@hades:~/Programming/ANTLR/Demos/JavaComment$ java -cp .:antlr-3.2.jar Main

You'll see the following printed to the console:

    MultiLineComment  :: \u002f\u002a <- multi line comment start\nmulti\nline\ncomment // not a single line comment\n\u002A/
    SingleLineComment :: // single line "not a string"
    SingleLineComment :: // a comment \\\" \u002f \u005C\u005C \u0022;
    MultiLineComment  :: /*\n  regular multi line comment\n  */
    SingleLineComment :: // the " is not the start of a string
    SingleLineComment :: // == '\''
    SingleLineComment :: // == '\''
    SingleLineComment :: // == '\''
    SingleLineComment :: \u002f\u002f another single line comment

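EDIT

Of course, you can also build a sort of lexer yourself with regex. The following demo does not handle Unicode escapes in the source file, however: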

Test2.java

    /* <- multi line comment start
    multi
    line
    comment // not a single line comment
    */
    public class Test2 {

      // single line "not a string"

      String s = "\" \242 not // a comment \\\" ";

      /*
      regular multi line comment
      */

      char c = '"'; // the " is not the start of a string

      char q1 = '\'';                  // == '\''
      char c4 = '\047';

      String t = "/*";
      // another single line comment
      String u = "*/";
    }

Main2.java

    import java.util.*;
    import java.io.*;
    import java.util.regex.*;

    public class Main2 {

      // Reads the whole file into a string, normalizing line breaks to '\n'.
      private static String read(File file) throws IOException {
        StringBuilder b = new StringBuilder();
        Scanner scan = new Scanner(file);
        while(scan.hasNextLine()) {
          String line = scan.nextLine();
          b.append(line).append('\n');
        }
        scan.close();
        return b.toString();
      }

      public static void main(String[] args) throws Exception {
        String contents = read(new File("Test2.java"));

        String slComment = "//[^\r\n]*";
        String mlComment = "/\\*[\\s\\S]*?\\*/";
        String strLit = "\"(?:\\\\.|[^\\\\\"\r\n])*\"";
        String chLit = "'(?:\\\\.|[^\\\\'\r\n])+'";
        String any = "[\\s\\S]";

        // Only the two comment alternatives are captured (groups 1 and 2).
        // String and char literals are matched but not captured, so their
        // contents are consumed without being reported; 'any' consumes
        // every other character one at a time.
        Pattern p = Pattern.compile(
            String.format("(%s)|(%s)|%s|%s|%s", slComment, mlComment, strLit, chLit, any)
        );

        Matcher m = p.matcher(contents);

        while(m.find()) {
          String hit = m.group();
          if(m.group(1) != null) {
            System.out.println("SingleLine :: " + hit.replace("\n", "\\n"));
          }
          if(m.group(2) != null) {
            System.out.println("MultiLine  :: " + hit.replace("\n", "\\n"));
          }
        }
      }
    }

If you run Main2, the following is printed to the console:

    MultiLine  :: /* <- multi line comment start\nmulti\nline\ncomment // not a single line comment\n*/
    SingleLine :: // single line "not a string"
    MultiLine  :: /*\n  regular multi line comment\n  */
    SingleLine :: // the " is not the start of a string
    SingleLine :: // == '\''
    SingleLine :: // another single line comment
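The regex demo does not handle Unicode escapes. One possible workaround, sketched here as an assumption rather than as part of the original answer, is to translate the escapes before matching, since the Java compiler also processes them before tokenizing. The class `UnicodeEscapes` and its method `translate` are hypothetical names. The sketch follows the rule that a backslash starts an escape only when it is preceded by an even number of contiguous backslashes, and that one or more `u` characters may follow it.

```java
// Hypothetical helper: rewrites backslash-u escapes into the characters they denote.
final class UnicodeEscapes {

    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int precedingBackslashes = 0; // contiguous raw backslashes just before position i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\' && precedingBackslashes % 2 == 0) {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // any number of 'u' characters is allowed
                }
                if (j > i + 1 && isHexQuad(src, j)) {
                    out.append((char) Integer.parseInt(src.substring(j, j + 4), 16));
                    i = j + 4;
                    precedingBackslashes = 0; // the last raw character was a hex digit
                    continue;
                }
            }
            precedingBackslashes = (c == '\\') ? precedingBackslashes + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    /** True if four hexadecimal digits start at index 'from'. */
    private static boolean isHexQuad(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The Java string literal below holds the raw text of an escaped comment.
        System.out.println(translate("\\u002f\\u002a hidden comment \\u002A/"));
    }
}
```

With such a helper, `Main2` could call `translate` on the file contents before building the `Matcher`. The trade-off is that the reported comment text and offsets then refer to the translated text, not to the raw file.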

EDIT: I've been searching for a while, but here is the real working regex:

    String regex = "((//[^\n\r]*)|(/\\*(.+?)\\*/))"; // New Regex
    List<String> comments = new ArrayList<String>();
    Pattern p = Pattern.compile(regex, Pattern.DOTALL);
    Matcher m = p.matcher(code);
    // code is the C-style source text in which you want to search
    while (m.find())
    {
        System.out.println(m.group(1));
        comments.add(m.group(1));
    }

With this input:

    import Blah;
    //Comment one//
    line();
    /* Blah */
    line2(); // something weird

    /* Multiline
    another line for the comment
    */

It generates this output:

    //Comment one//
    /* Blah */
    // something weird
    /* Multiline
    another line for the comment
    */

Note that the last three lines of the output come from a single print.

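Have you tried regular expressions? Here is a nice wrap-up with a Java example. ~~It may need some tweaking.~~ Regular expressions alone are not sufficient for more complicated structures (nested comments, "comments" inside strings), but they are a nice start.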
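The limitation with "comments" inside strings is easy to reproduce. The sketch below applies the comment-only pattern from the earlier answer to two inputs where comment delimiters appear inside string literals. The class name `NaiveCommentRegex` and the sample inputs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing where a comment-only regex goes wrong.
final class NaiveCommentRegex {

    // Comment-only pattern: it has no alternative for string or char literals.
    static final Pattern COMMENT =
        Pattern.compile("((//[^\n\r]*)|(/\\*(.+?)\\*/))", Pattern.DOTALL);

    /** Returns everything the pattern reports as a comment. */
    static List<String> find(String code) {
        List<String> hits = new ArrayList<>();
        Matcher m = COMMENT.matcher(code);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        // The "//" inside the URL is taken as the start of a comment.
        System.out.println(find("String url = \"http://example.com\"; // a real comment"));
        // The "/*" and "*/" inside two string literals are paired up as one comment.
        System.out.println(find("String t = \"/*\"; int x = 1; String u = \"*/\";"));
    }
}
```

For the first input the only hit starts in the middle of the string literal (`//example.com"; // a real comment`), and for the second input code between the two literals is reported as a comment. Matching literals as well, as the lexer-based and `Main2` approaches do, avoids both false positives.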
