Lucene: searching by URL

I am storing a Document that has a URL field:

    Document doc = new Document();
    doc.add(new Field("url", url, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("html", CompressionTools.compressString(html), Field.Store.YES));

I would like to be able to find a Document by its URL, but I get 0 results:

    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    Query query = new QueryParser(LUCENE_VERSION, "url", analyzer).parse(url);
    IndexSearcher searcher = new IndexSearcher(index, true);
    TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
    searcher.search(query, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    // Display results
    for (ScoreDoc hit : hits) {
        System.out.println("FOUND A MATCH");
    }
    searcher.close();

What can I do so that I can store an HTML document and find it by its URL?

You can rewrite the query so that it builds a term query on the "url" field directly instead of parsing the URL, something like this:

    Query query = new TermQuery(new Term("url", url));

Suggestion:

I would suggest using a BooleanQuery, since it performs well and is optimized internally.

    TermQuery tq = new TermQuery(new Term("url", url));
    // BooleanClause.Occur.SHOULD: "Use this operator for clauses that should appear in the matching documents."
    BooleanQuery bq = new BooleanQuery();
    bq.add(tq, BooleanClause.Occur.SHOULD); // add() returns void here, so it cannot be chained
    IndexSearcher searcher = new IndexSearcher(index, true);
    TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
    searcher.search(bq, collector); // search with the BooleanQuery built above

I see you index the URL field as NOT_ANALYZED, which IMO is a good choice for this kind of search: since no analyzer is applied, the value is indexed as a single term.
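To make the "single term" point concrete, here is a small stdlib-only sketch (no Lucene involved). `crudeTokens` is a hypothetical, deliberately crude stand-in for an analyzer: it lowercases and splits on anything that is not a letter or digit, which is not what StandardAnalyzer really does, but it shows why an analyzed query string will not equal a URL that was indexed whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SingleTermVsTokens {
    // Hypothetical, crude stand-in for an analyzer (NOT StandardAnalyzer's real rules):
    // lowercase the value and split it on runs of non-letter, non-digit characters.
    static List<String> crudeTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String url = "http://example.com/page?id=1";
        // A NOT_ANALYZED field keeps the whole URL as one term; an analyzed query does not.
        System.out.println("indexed term : " + url);
        System.out.println("query tokens : " + crudeTokens(url));
        System.out.println("any token equals the term? " + crudeTokens(url).contains(url));
    }
}
```

None of the pieces equals the full URL, so a query made of those pieces cannot hit the single indexed term.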

Now, if your business case is "I will give you a URL, find the EXACT match in the Lucene index", then try querying your index with a different analyzer (KeywordAnalyzer, etc.).

Lucene's QueryParser interprets some URL characters as part of the query parser syntax. You can use a TermQuery instead, as follows:

 TermQuery query = new TermQuery(new Term("url", url));
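If you do want to keep going through a parser, the colliding characters have to be escaped first. Lucene ships `QueryParser.escape(String)` for that; the stdlib-only sketch below just illustrates the idea so you can see which parts of a URL are affected. The class name and the exact character set are my assumptions (modelled on the classic query syntax of that era; newer versions, as far as I know, also treat `/` as special), so treat it as an illustration rather than a replacement for the library call.

```java
final class UrlQueryEscaper {
    // Assumed set of characters with special meaning in the classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every assumed-special character with a backslash.
    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The ':' after the scheme and the '?' before the query string get escaped.
        System.out.println(escape("http://example.com/a?b=1"));
    }
}
```

Note that escaping only deals with the syntax characters. With an analyzer such as StandardAnalyzer the escaped URL would still be tokenized, so against a NOT_ANALYZED field the TermQuery above remains the simpler route.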
