用于检测类/接口/ etc声明的Java Regular Expression
我正在尝试创建一个检测新类的正则表达式,例如:
public interface IGame {
要么
private class Game {
这是我到目前为止,但它没有检测到:
(line.matches("(public|protected|private|static|\\s)"+"(class|interface|\\s)"+"(\\w+)"))
谁能给我一些指示吗?
像下面这样更改你的正则表达式以匹配两种类型的字符串格式。
line.matches("(?:public|protected|private|static)\\s+(?:class|interface)\\s+\\w+\\s*\\{");
例:
String s1 = "public interface IGame {"; String s2 = "private class Game {"; System.out.println(s1.matches("(?:public|protected|private|static)\\s+(?:class|interface)\\s+\\w+\\s*\\{")); System.out.println(s2.matches("(?:public|protected|private|static)\\s+(?:class|interface)\\s+\\w+\\s*\\{"));
输出:
true true
此正则表达式是Java类和接口声明的不完整规范。 但是,它可以匹配如下声明:
abstract class X>extends java.util.ArrayList
implements java.util.Queue ,Serializable{}
public@Deprecated interface K
extends Comparable {}
所以它应该足以满足大多数目的。
请阅读代码以在最后生成正则表达式,以确切知道跳过的内容。 总之, ReferenceType
, Annotation
没有完全合并到这个正则表达式中,因为它们需要递归正则表达式才能正确解析。
Java正则表达式在其所有g(l)ory细节中匹配Java类声明:
(?:((?:public|protected|private|abstract|static|final|strictfp|@[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)(?:[ \t\f\r\n]*+(?:(?)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]++extends[ \t\f\r\n]++(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+)*+[ \t\f\r\n]*+>)[ \t\f\r\n]*+)?+(?:(?)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+)[ \t\f\r\n]*+)?+(?:(?)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+)*+[ \t\f\r\n]*+))?+[{]
并且对于Java接口声明:
(?:((?:public|protected|private|abstract|static|strictfp|@[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)(?:[ \t\f\r\n]*+(?:(?)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]++extends[ \t\f\r\n]++(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+|\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+))?+)*+[ \t\f\r\n]*+>)[ \t\f\r\n]*+)?+(?:(?)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+(?:[ \t\f\r\n]*+[.][ \t\f\r\n]*+\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+(?:[ \t\f\r\n]*+<[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+)(?:[ \t\f\r\n]*+,[ \t\f\r\n]*+(?:\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+|[?][ \t\f\r\n]*+(?:(?:extends|super)[ \t\f\r\n]++\p{javaJavaIdentifierStart}\p{javaJavaIdentifierPart}*+)?+))*+[ \t\f\r\n]*+>)?+)*+)*+)[ \t\f\r\n]*+)?+[{]
它是通过逐块构建图案而生成的。 我为Java SE 7引用了Java语言规范,用于NormalClassDeclaration
和NormalInterfaceDeclaration
的语法。
以下是生成上述怪物的代码,包括对跳过哪种模式的注释:
// All rules never end with WhiteSpace. This allows us to check for error easier. final static String Identifier = "\\p{javaJavaIdentifierStart}\\p{javaJavaIdentifierPart}*+"; // Heuristic to exclude any position where adding space would split an Identifier or a keyword apart final static String TokenBoundary = "(?:(?"; final static String TypeName = Identifier + "(?:" + WhiteSpaceOpt + "[.]" + WhiteSpaceOpt + Identifier + ")*+"; // Expanded definition of ClassOrInterfaceType = ClassType = InterfaceType // ClassOrInterfaceType -> TypeName TypeArguments // ClassOrInterfaceType -> ClassOrInterfaceType . Identifier TypeArguments final static String ClassType = TypeName + "(?:" + WhiteSpaceOpt + TypeArguments + ")?+" + "(?:" + WhiteSpaceOpt + "[.]" + WhiteSpaceOpt + Identifier + "(?:" + WhiteSpaceOpt + TypeArguments + ")?+" + ")*+"; // Definition of ClassType and InterfaceType are identical final static String InterfaceType = ClassType; final static String ClassOrInterfaceType = ClassType; final static String TypeBound = "extends" + WhiteSpaceCom + "(?:" + ClassOrInterfaceType + "|" + TypeVariable + ")"; final static String TypeParameter = TypeVariable + "(?:" + WhiteSpaceCom + TypeBound + ")?+"; final static String TypeParameterList = TypeParameter + "(?:" + WhiteSpaceOpt + "," + WhiteSpaceOpt + TypeParameter + ")*+"; final static String TypeParameters = "<" + WhiteSpaceOpt + TypeParameterList + WhiteSpaceOpt + ">"; final static String Super = "extends" + WhiteSpaceCom + ClassType; final static String InterfaceTypeList = InterfaceType + "(?:" + WhiteSpaceOpt + "," + WhiteSpaceOpt + InterfaceType + ")*+"; final static String Interfaces = "implements" + WhiteSpaceCom + InterfaceTypeList; final static String NormalClassDeclaration = // Annotation in its fullest form can end in ), so WhiteSpaceOpt is used here for pedantic // It can be changed to WhiteSpaceCom to save the TokenBoundary check, since current definition always end with Identifier character "(?:" + "(" + ClassModifiers + ")" + WhiteSpaceOpt + ")?+" + TokenBoundary + "class" + WhiteSpaceCom + // WhiteSpaceOpt is used here, since TypeParameters starts with < and no whitespace is needed to delimit "(" + Identifier + ")" + WhiteSpaceOpt + "(?:" + "(" + TypeParameters + ")" + WhiteSpaceOpt + ")?+" + // As the result, we need to check for boundary before "extends" and "implements" TokenBoundary + "(?:" + "(" + Super + ")"+ WhiteSpaceOpt + ")?+" + TokenBoundary + "(?:" + "(" + Interfaces + WhiteSpaceOpt + ")" + ")?+[{]"; // ClassBody is skipped, and only opening bracket { is matched final static String InterfaceModifier = "(?:public|protected|private|abstract|static|strictfp|" + Annotation + ")"; // Same as ClassModifiers, except that "final" is no longer a valid modifier final static String InterfaceModifiers = InterfaceModifier + "(?:" + WhiteSpaceOpt + TokenBoundary + InterfaceModifier + ")*+"; final static String ExtendsInterfaces = "extends" + WhiteSpaceCom + InterfaceTypeList; final static String NormalInterfaceDeclaration = "(?:" + "(" + InterfaceModifiers + ")" + WhiteSpaceOpt + ")?+" + TokenBoundary + "interface" + WhiteSpaceCom + // WhiteSpaceOpt is used here, since TypeParameters starts with < and no whitespace is needed to delimit "(" + Identifier + ")" + WhiteSpaceOpt + "(?:" + "(" + TypeParameters + ")" + WhiteSpaceOpt + ")?+" + // As the result, we need to check for boundary before "extends" here TokenBoundary + "(?:" + "(" + ExtendsInterfaces + ")" + WhiteSpaceOpt + ")?+[{]";
我很清楚CamelCase以及一些常数因复数而不同的事实是不合适的,但它提供了更好的映射到JLS中定义的语法。
这是关于ideone的演示。