How can I efficiently search backward with a regular expression?

I am searching an array of strings with a regular expression, like this:

    for (int j = line; j < lines.length; j++) {
        if (lines[j] == null || lines[j].isEmpty()) {
            continue;
        }
        matcher = pattern.matcher(lines[j]);
        if (matcher.find(offset)) {
            offset = matcher.end();
            line = j;
            System.out.println("found \""+matcher.group()+"\" at line "+line+" ["+matcher.start()+","+offset+"]");
            return true;
        }
        offset = 0;
    }
    return false;

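Note that in the implementation above I save line and offset so that successive searches continue from the last match.

Anyway, now I want to search backward from [line, offset].

My question: is there a way to search backward with a regular expression efficiently? If not, what could be an alternative?

Clarification: by backward I mean finding the previous match.
For example, say I am searching for "dana" in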

 "dana nama? dana kama! lama dana kama?" 

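and I am at the second match. If I call matcher.find() again, I search forward and get the third match. But I want to search backward and get to the first match.
The code above should then output something like: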

    found "dana" at line 0 [0,4]   // fwd
    found "dana" at line 0 [11,15] // fwd
    found "dana" at line 0 [0,4]   // bwd

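Java's regular expression engine cannot search backward. In fact, the only regex engine I know of that can do this is the one in .NET.

Instead of searching backward, iterate over all the matches in a loop (searching forward). If a match is before the position you want, remember it. If a match is after the position you want, exit the loop. In pseudocode (my Java is a bit rusty):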

    storedmatch = ""
    while matcher.find {
        if matcher.end < offset {
            storedmatch = matcher.group()
        } else {
            return storedmatch
        }
    }
    // no match reaches the offset: the last stored match is the answer
    return storedmatch
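The pseudocode above might look roughly like this in Java. It is only a minimal sketch: the class name PreviousMatch and the method lastMatchBefore are made up for illustration, and it follows the pseudocode in treating a match as "before" the position only when it ends strictly before the offset.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PreviousMatch {
    /**
     * Hypothetical helper: returns the last match that ends strictly before
     * offset, or null if there is no such match.
     */
    static MatchResult lastMatchBefore(Pattern pattern, CharSequence text, int offset) {
        Matcher matcher = pattern.matcher(text);
        MatchResult stored = null;
        while (matcher.find()) {
            if (matcher.end() < offset) {
                // still before the offset: remember it, a later match may also qualify
                stored = matcher.toMatchResult();
            } else {
                // this match reaches the offset, so the stored one is the previous match
                break;
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        String text = "dana nama? dana kama! lama dana kama?";
        // 15 is the end of the second "dana" (Matcher.end() is exclusive),
        // which is the offset the question's code saves after a hit
        MatchResult prev = lastMatchBefore(Pattern.compile("dana"), text, 15);
        System.out.println(prev.group() + " [" + prev.start() + "," + prev.end() + "]");
    }
}
```

Every backward step rescans the text from the beginning, so the cost grows with the distance from the start of the input. That is probably fine for single lines, as in the question, and less so for very long texts.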

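The class below searches both backward and forward (of course).

I use this class in an application where users can search for strings in a long text (like the search function in a web browser). So it is tested and works well for practical use cases.

It uses an approach similar to the one Jan Goyvaerts describes. It selects a block of text before the start position and searches it forward, returning the last match if there is one. If there is no match, it selects a new block of text before that block and searches it in the same way.

Use it like this: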

    Search s = new Search("Big long text here to be searched [...]");
    s.setPattern("some regexp");

    // search backwards or forward as many times as you like,
    // the class keeps track where the last match was
    MatchResult where = s.searchBackward();
    where = s.searchBackward(); // next match
    where = s.searchBackward(); // next match

    //or search forward
    where = s.searchForward();
    where = s.searchForward();

And the class:

    import java.util.regex.MatchResult;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /*
     * Search regular expressions or simple text forward and backward in a CharSequence
     *
     *
     * To simulate the backward search (that Java class doesn't have) the input data
     * is divided into chunks and each chunk is searched from last to first until a
     * match is found (inter-chunk matches are returned from last to first too).
     *
     * The search can fail if the pattern/match you look for is longer than the chunk
     * size, but you can set the chunk size to a sensible size depending on the specific
     * application.
     *
     * Also, because the match could span between two adjacent chunks, the chunks are
     * partially overlapping. Again, this overlapping size should be set to a sensible
     * size.
     *
     * A typical application where the user search for some words in a document will
     * work perfectly fine with default values. The matches are expected to be between
     * 10-15 chars, so any chunk size and overlapping size bigger than this expected
     * length will be fine.
     *
     * */
    public class Search {

        private int BACKWARD_BLOCK_SIZE = 200;
        private int BACKWARD_OVERLAPPING = 20;

        private Matcher myFwdMatcher;
        private Matcher myBkwMatcher;
        private String mySearchPattern;
        private int myCurrOffset;
        private boolean myRegexp;
        private CharSequence mySearchData;

        public Search(CharSequence searchData) {
            mySearchData = searchData;
            mySearchPattern = "";
            myCurrOffset = 0;
            myRegexp = true;
            clear();
        }

        public void clear() {
            myFwdMatcher = null;
            myBkwMatcher = null;
        }

        public String getPattern() {
            return mySearchPattern;
        }

        public void setPattern(String toSearch) {
            if ( !mySearchPattern.equals(toSearch) ) {
                mySearchPattern = toSearch;
                clear();
            }
        }

        public CharSequence getText() {
            return mySearchData;
        }

        public void setText(CharSequence searchData) {
            mySearchData = searchData;
            clear();
        }

        public void setSearchOffset(int startOffset) {
            if (myCurrOffset != startOffset) {
                myCurrOffset = startOffset;
                clear();
            }
        }

        public boolean isRegexp() {
            return myRegexp;
        }

        public void setRegexp(boolean regexp) {
            if (myRegexp != regexp) {
                myRegexp = regexp;
                clear();
            }
        }

        public MatchResult searchForward() {
            if (mySearchData != null) {
                boolean found;
                if (myFwdMatcher == null) {
                    // if it's a new search, start from beginning
                    String searchPattern = myRegexp ? mySearchPattern : Pattern.quote(mySearchPattern);
                    myFwdMatcher = Pattern.compile(searchPattern, Pattern.CASE_INSENSITIVE).matcher(mySearchData);
                    try {
                        found = myFwdMatcher.find(myCurrOffset);
                    } catch (IndexOutOfBoundsException e) {
                        found = false;
                    }
                } else {
                    // continue searching
                    found = myFwdMatcher.hitEnd() ? false : myFwdMatcher.find();
                }
                if (found) {
                    MatchResult result = myFwdMatcher.toMatchResult();
                    return onMatchResult(result);
                }
            }
            return onMatchResult(null);
        }

        public MatchResult searchBackward() {
            if (mySearchData != null) {
                myFwdMatcher = null;
                if (myBkwMatcher == null) {
                    // if it's a new search, create a new matcher
                    String searchPattern = myRegexp ? mySearchPattern : Pattern.quote(mySearchPattern);
                    myBkwMatcher = Pattern.compile(searchPattern, Pattern.CASE_INSENSITIVE).matcher(mySearchData);
                }

                MatchResult result = null;
                boolean startOfInput = false;
                int start = myCurrOffset;
                int end = start;
                while (result == null && !startOfInput) {
                    start -= BACKWARD_BLOCK_SIZE;
                    if (start < 0) {
                        start = 0;
                        startOfInput = true;
                    }
                    try {
                        myBkwMatcher.region(start, end);
                    } catch (IndexOutOfBoundsException e) {
                        break;
                    }
                    while ( myBkwMatcher.find() ) {
                        result = myBkwMatcher.toMatchResult();
                    }
                    end = start + BACKWARD_OVERLAPPING;
                    // depending on the size of the pattern this could not be enough
                    // but how can you know the size of a regexp match beforehand?
                }
                return onMatchResult(result);
            }
            return onMatchResult(null);
        }

        private MatchResult onMatchResult(MatchResult result) {
            if (result != null) {
                myCurrOffset = result.start();
            }
            return result;
        }
    }

If you want to test this class, here is a usage example:


    import java.awt.*;
    import java.awt.event.*;
    import javax.swing.*;
    import javax.swing.event.*;
    import java.util.regex.MatchResult;
    import javax.swing.text.DefaultHighlighter;
    import javax.swing.text.BadLocationException;

    public class SearchTest extends JPanel implements ActionListener {

        protected JScrollPane scrollPane;
        protected JTextArea textArea;
        protected boolean docChanged = true;
        protected Search searcher;

        public SearchTest() {
            super(new BorderLayout());

            searcher = new Search("");

            JButton backButton = new JButton("Search backward");
            JButton fwdButton = new JButton("Search forward");

            JPanel buttonPanel = new JPanel(new BorderLayout());
            buttonPanel.add(fwdButton, BorderLayout.EAST);
            buttonPanel.add(backButton, BorderLayout.WEST);

            textArea = new JTextArea("Big long text here to be searched...", 20, 40);
            textArea.setEditable(true);
            scrollPane = new JScrollPane(textArea);

            final JTextField textField = new JTextField(40);

            //Add Components to this panel.
            add(buttonPanel, BorderLayout.NORTH);
            add(scrollPane, BorderLayout.CENTER);
            add(textField, BorderLayout.SOUTH);

            //Add actions
            backButton.setActionCommand("back");
            fwdButton.setActionCommand("fwd");
            backButton.addActionListener(this);
            fwdButton.addActionListener(this);

            textField.addActionListener( new ActionListener() {
                public void actionPerformed(ActionEvent e) {
                    final String pattern = textField.getText();
                    searcher.setPattern(pattern);
                }
            } );

            textArea.getDocument().addDocumentListener( new DocumentListener() {
                public void insertUpdate(DocumentEvent e) { docChanged = true; }
                public void removeUpdate(DocumentEvent e) { docChanged = true; }
                public void changedUpdate(DocumentEvent e) { docChanged = true; }
            });
        }

        public void actionPerformed(ActionEvent e) {
            if ( docChanged ) {
                final String newDocument = textArea.getText();
                searcher.setText(newDocument);
                docChanged = false;
            }

            MatchResult where = null;

            if ("back".equals(e.getActionCommand())) {
                where = searcher.searchBackward();
            } else if ("fwd".equals(e.getActionCommand())) {
                where = searcher.searchForward();
            }

            textArea.getHighlighter().removeAllHighlights();

            if (where != null) {
                final int start = where.start();
                final int end = where.end();
                // highlight result and scroll
                try {
                    textArea.getHighlighter().addHighlight(start, end, new DefaultHighlighter.DefaultHighlightPainter(Color.yellow));
                } catch (BadLocationException excp) {}
                textArea.scrollRectToVisible(new Rectangle(0, 0, scrollPane.getViewport().getWidth(), scrollPane.getViewport().getHeight()));
                SwingUtilities.invokeLater(new Runnable() {
                    @Override
                    public void run() { textArea.setCaretPosition(start); }
                });
            } else if (where == null) {
                // no match, so let's wrap around
                if ("back".equals(e.getActionCommand())) {
                    searcher.setSearchOffset( searcher.getText().length() -1 );
                } else if ("fwd".equals(e.getActionCommand())) {
                    searcher.setSearchOffset(0);
                }
            }
        }

        private static void createAndShowGUI() {
            //Create and set up the window.
            JFrame frame = new JFrame("SearchTest");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

            //Add contents to the window.
            frame.add(new SearchTest());

            //Display the window.
            frame.pack();
            frame.setVisible(true);
        }

        public static void main(String[] args) {
            //Schedule a job for the event dispatch thread:
            //creating and showing this application's GUI.
            javax.swing.SwingUtilities.invokeLater(new Runnable() {
                public void run() {
                    createAndShowGUI();
                }
            });
        }
    }

I use the following simple class to search backward in Java:

    import java.util.Stack;
    import java.util.regex.MatchResult;
    import java.util.regex.Matcher;

    public class ReverseMatcher {
        private final Matcher _matcher;
        // matches collected by the forward scan; the top of the stack is the current match
        private final Stack<MatchResult> _results = new Stack<>();

        public ReverseMatcher(Matcher matcher){
            _matcher = matcher;
        }

        public boolean find(){
            return find(_matcher.regionEnd());
        }

        public boolean find(int start){
            // matches already collected: step back to the one before the current match
            if (_results.size() > 0){
                _results.pop();
                return _results.size() > 0;
            }
            // first call: scan forward and collect every match that ends at or before start
            boolean res = false;
            while (_matcher.find()){
                if (_matcher.end() > start)
                    break;
                res = true;
                _results.push(_matcher.toMatchResult());
            }
            return res;
        }

        public String group(int group){
            return _results.peek().group(group);
        }

        public String group(){
            return _results.peek().group();
        }

        public int start(){
            return _results.peek().start();
        }

        public int end(){
            return _results.peek().end();
        }
    }

Usage:

    String srcString = "1 2 3 4 5 6 7 8 9";
    String pattern = "\\b[0-9]+\\b"; // "+" rather than "*", so empty matches at word boundaries are not reported
    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(srcString);
    ReverseMatcher rm = new ReverseMatcher(m);
    while (rm.find())
        System.out.print(rm.group() + " ");

Output: 9 8 7 6 5 4 3 2 1

Or, starting again with a fresh ReverseMatcher:

    while (rm.find(9))
        System.out.print(rm.group() + " ");

Output: 5 4 3 2 1

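Does the search string have to be a strict regular expression (the full, rich syntax)? Because if not: for (int j = line; j >= 0; j--), reverse the line, reverse the search string, and search forward ;)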
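For a plain (non-regex) search string, the reversal trick could be sketched as below. The class name ReverseLiteralSearch and the method findBackward are invented for this illustration, and the sketch assumes fromIndex is the position to search backward from within a single line.

```java
class ReverseLiteralSearch {
    /**
     * Hypothetical helper: finds the last occurrence of a literal needle that
     * ends at or before fromIndex, by reversing both strings and searching
     * forward. Returns the start index in the original line, or -1.
     */
    static int findBackward(String line, String needle, int fromIndex) {
        String reversedLine = new StringBuilder(line).reverse().toString();
        String reversedNeedle = new StringBuilder(needle).reverse().toString();
        // index fromIndex in line corresponds to index (length - fromIndex) in the reversed line
        int hit = reversedLine.indexOf(reversedNeedle, line.length() - fromIndex);
        if (hit < 0) {
            return -1;
        }
        // map the start of the reversed hit back to the start of the original occurrence
        return line.length() - hit - needle.length();
    }
}
```

For literal strings, line.lastIndexOf(needle, fromIndex - needle.length()) should give the same result without any reversing. The reversal only becomes interesting if the pattern itself can be reversed, which is not possible for arbitrary regular expressions.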

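If the previous match is something you have already matched, why not build a list of match positions while searching forward and then simply use it to jump back, instead of searching backward?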
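One way to keep such a list is sketched below. MatchHistory and its methods next and previous are hypothetical names, and the sketch assumes the text does not change between calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchHistory {
    private final Matcher matcher;
    private final List<MatchResult> seen = new ArrayList<>(); // every match found so far, in order
    private int current = -1;          // index in seen of the match we are currently at
    private boolean exhausted = false; // true once the matcher has run out of matches

    MatchHistory(Pattern pattern, CharSequence text) {
        matcher = pattern.matcher(text);
    }

    /** Moves forward: replays a recorded match if there is one, else asks the matcher. */
    MatchResult next() {
        if (current + 1 < seen.size()) {
            return seen.get(++current);
        }
        if (!exhausted && matcher.find()) {
            seen.add(matcher.toMatchResult());
            return seen.get(++current);
        }
        exhausted = true;
        return null;
    }

    /** Jumps back to the previously visited match, or returns null at the first one. */
    MatchResult previous() {
        if (current <= 0) {
            return null;
        }
        return seen.get(--current);
    }
}
```

Jumping back then costs only a list lookup, but it only works for matches that were already visited while searching forward, and the list has to be discarded whenever the text or the pattern changes.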