用于检测类/接口/ etc声明的Java Regular Expression

我正在尝试创建一个检测新类的正则表达式,例如:

public interface IGame { 

要么

 private class Game { 

这是我到目前为止,但它没有检测到:

 (line.matches("(public|protected|private|static|\\s)"+"(class|interface|\\s)"+"(\\w+)")) 

谁能给我一些指示吗?

像下面这样更改你的正则表达式以匹配两种类型的字符串格式。

 line.matches("(?:public|protected|private|static)\\s+(?:class|interface)\\s+\\w+\\s*\\{"); 

例:

 String s1 = "public interface IGame {"; String s2 = "private class Game {"; System.out.println(s1.matches("(?:public|protected|private|static)\\s+(?:class|interface)\\s+\\w+\\s*\\{")); System.out.println(s2.matches("(?:public|protected|private|static)\\s+(?:class|interface)\\s+\\w+\\s*\\{")); 

输出:

 true true 

此正则表达式是Java类和接口声明的不完整规范。 但是,它可以匹配如下声明:

abstract class X>extends java.util.ArrayListimplements java.util.Queue,Serializable{}

public@Deprecated interface K extends Comparable{}

所以它应该足以满足大多数目的。

请阅读代码以在最后生成正则表达式,以确切知道跳过的内容。 总之, ReferenceTypeAnnotation没有完全合并到这个正则表达式中,因为它们需要递归正则表达式才能正确解析。

Java正则表达式在其所有g(l)ory细节中匹配Java类声明:

(?:((?:public|protected|private|abstract|static|final|strictfp|@[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)(?:[ \t\f\r\n]*+(?:(?)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]++extends[ \t\f\r\n]++(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+)*+[ \t\f\r\n]*+>)[ \t\f\r\n]*+)?+(?:(?)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+)[ \t\f\r\n]*+)?+(?:(?)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+)*+[ \t\f\r\n]*+))?+[{]

并且对于Java接口声明:

(?:((?:public|protected|private|abstract|static|strictfp|@[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)(?:[ \t\f\r\n]*+(?:(?)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]++extends[ \t\f\r\n]++(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+)*+[ \t\f\r\n]*+>)[ \t\f\r\n]*+)?+(?:(?)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+)*+)[ \t\f\r\n]*+)?+[{]

它是通过逐块构建图案而生成的。 我为Java SE 7引用了Java语言规范,用于NormalClassDeclarationNormalInterfaceDeclaration的语法。

以下是生成上述怪物的代码,包括对跳过哪种模式的注释:

 // All rules never end with WhiteSpace. This allows us to check for error easier. final static String Identifier = "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*+"; // Heuristic to exclude any position where adding space would split an Identifier or a keyword apart final static String TokenBoundary = "(?:(?"; final static String TypeName = Identifier + "(?:" + WhiteSpaceOpt + "[.]" + WhiteSpaceOpt + Identifier + ")*+"; // Expanded definition of ClassOrInterfaceType = ClassType = InterfaceType // ClassOrInterfaceType -> TypeName TypeArguments // ClassOrInterfaceType -> ClassOrInterfaceType . Identifier TypeArguments final static String ClassType = TypeName + "(?:" + WhiteSpaceOpt + TypeArguments + ")?+" + "(?:" + WhiteSpaceOpt + "[.]" + WhiteSpaceOpt + Identifier + "(?:" + WhiteSpaceOpt + TypeArguments + ")?+" + ")*+"; // Definition of ClassType and InterfaceType are identical final static String InterfaceType = ClassType; final static String ClassOrInterfaceType = ClassType; final static String TypeBound = "extends" + WhiteSpaceCom + "(?:" + ClassOrInterfaceType + "|" + TypeVariable + ")"; final static String TypeParameter = TypeVariable + "(?:" + WhiteSpaceCom + TypeBound + ")?+"; final static String TypeParameterList = TypeParameter + "(?:" + WhiteSpaceOpt + "," + WhiteSpaceOpt + TypeParameter + ")*+"; final static String TypeParameters = "<" + WhiteSpaceOpt + TypeParameterList + WhiteSpaceOpt + ">"; final static String Super = "extends" + WhiteSpaceCom + ClassType; final static String InterfaceTypeList = InterfaceType + "(?:" + WhiteSpaceOpt + "," + WhiteSpaceOpt + InterfaceType + ")*+"; final static String Interfaces = "implements" + WhiteSpaceCom + InterfaceTypeList; final static String NormalClassDeclaration = // Annotation in its fullest form can end in ), so WhiteSpaceOpt is used here for pedantic // It can be changed to WhiteSpaceCom to save the TokenBoundary check, since current definition always end with Identifier character "(?:" + "(" + ClassModifiers + ")" + WhiteSpaceOpt + ")?+" + TokenBoundary + "class" + WhiteSpaceCom + // WhiteSpaceOpt is used here, since TypeParameters starts with < and no whitespace is needed to delimit "(" + Identifier + ")" + WhiteSpaceOpt + "(?:" + "(" + TypeParameters + ")" + WhiteSpaceOpt + ")?+" + // As the result, we need to check for boundary before "extends" and "implements" TokenBoundary + "(?:" + "(" + Super + ")"+ WhiteSpaceOpt + ")?+" + TokenBoundary + "(?:" + "(" + Interfaces + WhiteSpaceOpt + ")" + ")?+[{]"; // ClassBody is skipped, and only opening bracket { is matched final static String InterfaceModifier = "(?:public|protected|private|abstract|static|strictfp|" + Annotation + ")"; // Same as ClassModifiers, except that "final" is no longer a valid modifier final static String InterfaceModifiers = InterfaceModifier + "(?:" + WhiteSpaceOpt + TokenBoundary + InterfaceModifier + ")*+"; final static String ExtendsInterfaces = "extends" + WhiteSpaceCom + InterfaceTypeList; final static String NormalInterfaceDeclaration = "(?:" + "(" + InterfaceModifiers + ")" + WhiteSpaceOpt + ")?+" + TokenBoundary + "interface" + WhiteSpaceCom + // WhiteSpaceOpt is used here, since TypeParameters starts with < and no whitespace is needed to delimit "(" + Identifier + ")" + WhiteSpaceOpt + "(?:" + "(" + TypeParameters + ")" + WhiteSpaceOpt + ")?+" + // As the result, we need to check for boundary before "extends" here TokenBoundary + "(?:" + "(" + ExtendsInterfaces + ")" + WhiteSpaceOpt + ")?+[{]"; 

我很清楚CamelCase以及一些常数因复数而不同的事实是不合适的,但它提供了更好的映射到JLS中定义的语法。

这是关于ideone的演示。