Using a regex to get the bracket contents from a list of values

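I'm trying to find a regex (ColdFusion or Java) that gets me the contents between the square brackets for each (param \d+). I've tried dozens of different regexes, and the closest I've come is this one: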

\(param \d+\) = \[(type='[^']*', class='[^']*', value='(?:[^']|'')*', sqltype='[^']*')\] 

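If the string returned from CF escaped single quotes inside the value parameter, this would be perfect. But it doesn't, so the pattern fails. Going the negative-lookahead route, like this: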

 \[(type='[^']*', class='[^']*', value='(?:(?!', sqltype).)*', sqltype='[^']*')\] 

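works great, unless for some inexplicable reason a piece of code has a value containing the exact text ", sqltype". I find it hard to believe that I can't simply tell a regex to scoop out the contents of every opening and closing bracket it finds, but then again, I don't know enough regex to know its limits.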

Here's a sample string of what I'm trying to parse:

 (param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar'] 
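To make the failure concrete, here is a minimal Java sketch that runs the first pattern above against this sample. The class and method names (FirstAttemptDemo, bracketContents) are made up for illustration, and nothing beyond java.util.regex is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical harness; the names are illustrative only.
public class FirstAttemptDemo {

    // The first pattern from the question. It expects quotes inside a value
    // to be doubled (''), which the CF output does not do.
    static final Pattern FIRST_ATTEMPT = Pattern.compile(
        "\\(param \\d+\\) = \\[(type='[^']*', class='[^']*', "
        + "value='(?:[^']|'')*', sqltype='[^']*')\\]");

    // The sample string from the question, without surrounding whitespace.
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', "
        + "sqltype='cf_sql_integer'] , "
        + "(param 2) = [type='IN', class='java.lang.String', "
        + "value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , "
        + "(param 3) = [type='IN', class='java.lang.String', "
        + "value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', "
        + "sqltype='cf_sql_varchar']";

    // Returns the bracket contents of every param the pattern manages to match.
    static List<String> bracketContents(String input) {
        List<String> found = new ArrayList<String>();
        Matcher m = FIRST_ATTEMPT.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        for (String contents : bracketContents(SAMPLE)) {
            System.out.println(contents);
        }
    }
}
```

Only param 1 comes back. Params 2 and 3 are skipped because their values contain lone single quotes, which neither [^'] nor '' can consume.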

For the curious, this is a sub-question of Copyable Coldfusion SQLexception.

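EDIT

Here's my attempt at implementing @Mena's answer in CF9.1. Sadly, it doesn't finish processing the string. I had to replace \\ with \ just to get it to run in the first place, but my implementation may still be at fault.

Here's the given string (the pipes only mark the boundaries):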

 | (param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly], really?', sqltype='cf_sql_varchar'] , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype ', sqltype='cf_sql_varchar'] | 

Here's my implementation:

     
()
|__ -->

Here's what it printed:

    Start
    param 1 ( type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer' )
    |__ type --> IN
    |__ class --> java.lang.Integer
    |__ value --> 47
    param 2 ( type='IN', class='java.lang.String', value='asf , O'Reilly )
    |__ type --> IN
    |__ class --> java.lang.String
    End

Here's a Java regex pattern that works for your example input.

    (?x)
    # lookbehind to check for start of string or previous param
    # java lookbehinds must have max length, so limits sqltype
    (?<=^|sqltype='cf_sql_[a-z]{1,16}']\ ,\ )
    # capture the full string for replacing in the orig sql
    # and just the position to verify against the match position
    (\(param\ (\d+)\))
    \ =\ \[
    # type and class wont contain quotes
    type='([^']++)'
    ,\ class='([^']++)'
    # match any non-quote, then lazily keep going
    ,\ value='([^']++.*?)'
    # sqltype is always alphanumeric
    ,\ sqltype='cf_sql_[a-z]+'
    \]
    # lookahead to check for end of string or next param
    (?=$|\ ,\ \(param\ \d+\)\ =\ \[)

(The (?x) flag enables comment mode, which ignores unescaped whitespace and everything between a hash and the end of the line.)
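As a rough illustration of how the commented pattern could be used from plain Java, the sketch below compiles it with the inline (?x) flag and collects the captured value of each param. The class and method names (CommentedPatternDemo, values) are hypothetical, the comments inside the pattern are shortened, and the sample is assumed to have no leading or trailing whitespace, since the pattern anchors on the start and end of the string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical harness; the names are illustrative only.
public class CommentedPatternDemo {

    // Each Java string ends in \n so that a # comment stops at the end of its line.
    // Literal spaces are written as "\ " because (?x) ignores bare whitespace.
    static final Pattern PARAM = Pattern.compile(
          "(?x)\n"
        + "# start of string, or the tail of the previous param\n"
        + "(?<=^|sqltype='cf_sql_[a-z]{1,16}']\\ ,\\ )\n"
        + "(\\(param\\ (\\d+)\\))    # group 1: label, group 2: position\n"
        + "\\ =\\ \\[\n"
        + "type='([^']++)'          # group 3: type\n"
        + ",\\ class='([^']++)'     # group 4: class\n"
        + ",\\ value='([^']++.*?)'  # group 5: value, lazily extended\n"
        + ",\\ sqltype='cf_sql_[a-z]+'\n"
        + "\\]\n"
        + "# end of string, or the head of the next param\n"
        + "(?=$|\\ ,\\ \\(param\\ \\d+\\)\\ =\\ \\[)");

    // The sample string from the question, without surrounding whitespace.
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', "
        + "sqltype='cf_sql_integer'] , "
        + "(param 2) = [type='IN', class='java.lang.String', "
        + "value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , "
        + "(param 3) = [type='IN', class='java.lang.String', "
        + "value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', "
        + "sqltype='cf_sql_varchar']";

    // Returns the captured value (group 5) of every param that matches.
    static List<String> values(String input) {
        List<String> result = new ArrayList<String>();
        Matcher m = PARAM.matcher(input);
        while (m.find()) {
            result.add(m.group(5));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String value : values(SAMPLE)) {
            System.out.println("[" + value + "]");
        }
    }
}
```

The same pattern could also be compiled without the leading (?x) by passing Pattern.COMMENTS as the second argument to Pattern.compile.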

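Here's the pattern implemented in CFML (tested on CF9,0,1,274733). It uses cfRegex (a library that makes it easier to work with Java regex in CFML) to get the results for the pattern, then does a couple of checks to make sure the expected number of params were found.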

    (param 1) = [type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , (param 2) = [type='IN', class='java.lang.String', value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , (param 3) = [type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', sqltype='cf_sql_varchar']

    (?x)
    # lookbehind to check for start or previous param
    # java lookbehinds must have max length, so limits sqltype
    (?<=^|sqltype='cf_sql_[a-z]{1,16}']\ ,\ )
    # capture the full string for replacing in the orig sql
    # and just the position to verify against the match position
    (\(param\ (\d+)\))
    \ =\ \[
    # type and class wont contain quotes
    type='([^']++)'
    ,\ class='([^']++)'
    # match any non-quote, then lazily keep going if needed
    ,\ value='([^']++.*?)'
    # sqltype is always alphanumeric
    ,\ sqltype='cf_sql_[a-z]+'
    \]
    # lookahead to check for end or next param
    (?=$|\ ,\ \(param\ \d+\)\ =\ \[)

    Something went wrong!
    All seems fine

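This will actually still work if you embed one entire param inside the value of another. If you try two (or more), it will fail, but at least the checks should detect that failure.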
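The kind of check described above might look roughly like this in Java. This is only a sketch under assumptions: it is not the CFML check itself, the names (ParamCountCheck, matchesExpected) are hypothetical, and the expected number of params is assumed to be known to the caller from elsewhere. It uses the same pattern written without (?x), keeping only the position as a capture group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the check used in the CFML implementation.
public class ParamCountCheck {

    // Same pattern as above without comment mode; group 1 is the param position.
    static final Pattern PARAM = Pattern.compile(
          "(?<=^|sqltype='cf_sql_[a-z]{1,16}'\\] , )"
        + "\\(param (\\d+)\\) = \\["
        + "type='[^']++', class='[^']++', value='[^']++.*?', "
        + "sqltype='cf_sql_[a-z]+'\\]"
        + "(?=$| , \\(param \\d+\\) = \\[)");

    // The sample string from the question, without surrounding whitespace.
    static final String SAMPLE =
        "(param 1) = [type='IN', class='java.lang.Integer', value='47', "
        + "sqltype='cf_sql_integer'] , "
        + "(param 2) = [type='IN', class='java.lang.String', "
        + "value='asf , O'Reilly, really?', sqltype='cf_sql_varchar'] , "
        + "(param 3) = [type='IN', class='java.lang.String', "
        + "value='Th[is]is Ev'ery'thing That , []can break it ', sqltype= ', "
        + "sqltype='cf_sql_varchar']";

    // True only if the n-th match carries position n and exactly
    // expectedCount params were found in total.
    static boolean matchesExpected(String input, int expectedCount) {
        Matcher m = PARAM.matcher(input);
        int found = 0;
        while (m.find()) {
            found++;
            if (Integer.parseInt(m.group(1)) != found) {
                return false; // a param was skipped or swallowed by a value
            }
        }
        return found == expectedCount;
    }

    public static void main(String[] args) {
        System.out.println(matchesExpected(SAMPLE, 3));
        System.out.println(matchesExpected(SAMPLE, 4));
    }
}
```

Comparing each captured position with the running match count catches a param that was silently swallowed into a neighbouring value, and the final count comparison catches params that were missed altogether.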

Here's what the dump output should look like:

[Image: dump output]

Hopefully everything here makes sense - let me know if you have any questions.

I would probably use a dedicated parser, but here's an example of how to do it with two Patterns and a nested loop:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // wrapper class added so the snippet compiles; the name is arbitrary
    public class ParamParser {
        public static void main(String[] args) {
            // the input String
            String input = "(param 1) = "
                + "[type='IN', class='java.lang.Integer', value='47', sqltype='cf_sql_integer'] , "
                + "(param 2) = "
                + "[type='IN', class='java.lang.String', value='asf , O'Reilly, really?', "
                + "sqltype='cf_sql_varchar'] , "
                + "(param 3) = "
                + "[type='IN', class='java.lang.String', value='Th[is]is Ev'ery'thing That , "
                + "[]can break it ', sqltype= ', sqltype='cf_sql_varchar']";
            // the Pattern defining the round-bracket expression and the following
            // square-bracket list. Both values within the brackets are grouped for back-reference
            // note that what prevents the 3rd case from breaking is that the closing square bracket
            // is expected to be either followed by optional space + comma, or end of input
            Pattern outer = Pattern.compile("\\((.+?)\\)\\s?\\=\\s?\\[(.+?)\\](\\s?,|$)");
            // the Pattern defining the key-value pairs within the square-bracket groups
            // note that both key and value are grouped for back-reference
            Pattern inner = Pattern.compile("(.+?)\\s?\\=\\s?'(.+?)'\\s?,\\s?");
            Matcher outerMatcher = outer.matcher(input);
            // iterating over the outer Pattern (type x) = [myKey = myValue, ad lib.], or end of input
            while (outerMatcher.find()) {
                System.out.println(outerMatcher.group(1));
                Matcher innerMatcher = inner.matcher(outerMatcher.group(2));
                // iterating over the inner Pattern myKey = myValue
                while (innerMatcher.find()) {
                    System.out.println("\t" + innerMatcher.group(1) + " --> " + innerMatcher.group(2));
                }
            }
        }
    }

Output:

    param 1
        type --> IN
        class --> java.lang.Integer
        value --> 47
    param 2
        type --> IN
        class --> java.lang.String
        value --> asf , O'Reilly, really?
    param 3
        type --> IN
        class --> java.lang.String
        value --> Th[is]is Ev'ery'thing That , []can break it