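How do I implement auto-suggest using Lucene's new AnalyzingInfixSuggester API?

I'm new to Lucene and I want to implement auto-suggest like Google's: when I type a character such as 'G', it shows me a list of suggestions.

I have searched all over the web. I couldn't find anyone who has done this, although Lucene now ships some new tools in its suggest package.

But I need an example that shows me how to do it.

Can anyone help?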

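I'll give you a fairly complete example that shows how to use AnalyzingInfixSuggester. In this example we'll pretend that we're Amazon and we want to autocomplete a product search field. We'll take advantage of features of the Lucene suggestion system to implement the following:

Ranked results: we will suggest the most popular matching products first.
Region-restricted results: we will only suggest products that are sold in the customer's country.
Product photos: we will store product photo URLs in the suggestion index so we can display them in the search results without an additional database lookup.

First, I'll define a simple class to hold information about a product in Product.java: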

    import java.util.Set;

    class Product implements java.io.Serializable
    {
        String name;
        String image;
        String[] regions;
        int numberSold;

        public Product(String name, String image, String[] regions,
                       int numberSold) {
            this.name = name;
            this.image = image;
            this.regions = regions;
            this.numberSold = numberSold;
        }
    }

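To index records with AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload and weight of each record.

The key is the text you actually want to search on and autocomplete against. In our example, it will be the name of the product.

The contexts are a set of additional, arbitrary data that you can use to filter records. In our example, the contexts are the set of ISO codes for the countries we ship a particular product to.

The payload is additional, arbitrary data that you want to store in the index for the record. In this example, we serialize each Product instance and store the resulting bytes as the payload. Later, when we do lookups, we can deserialize the payload and access information in the product instance, such as the image URL.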
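The payload round trip can be tried on its own, without Lucene. The following is a minimal sketch using only java.io; the Item class and the PayloadRoundTrip helper are hypothetical names invented for this illustration and are not part of the example in this answer. It reads from an explicit offset and length because a byte array handed back by an index is not guaranteed to start at position 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for Product, kept small so the sketch is self-contained.
class Item implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final String image;

    Item(String name, String image) {
        this.name = name;
        this.image = image;
    }
}

class PayloadRoundTrip {
    // Serialize an object to the bytes that would be stored as a payload.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        }
        return bos.toByteArray();
    }

    // Rebuild the object from a slice of a byte array (offset and length).
    static Object fromBytes(byte[] bytes, int offset, int length)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes, offset, length))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = toBytes(
            new Item("Electric Guitar",
                     "http://images.example/electric-guitar.jpg"));
        Item copy = (Item) fromBytes(payload, 0, payload.length);
        System.out.println(copy.name + " -> " + copy.image);
    }
}
```

Java serialization is convenient for a demo like this one; any other encoding that produces bytes would work as a payload in the same way.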

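The weight is used to order suggestion results; results with a higher weight are returned first. We'll use the number of sales of a given product as its weight.

Here are the contents of ProductIterator.java: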

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.UnsupportedEncodingException;
    import java.util.Comparator;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;
    import org.apache.lucene.search.suggest.InputIterator;
    import org.apache.lucene.util.BytesRef;

    class ProductIterator implements InputIterator
    {
        private Iterator<Product> productIterator;
        private Product currentProduct;

        ProductIterator(Iterator<Product> productIterator) {
            this.productIterator = productIterator;
        }

        public boolean hasContexts() {
            return true;
        }

        public boolean hasPayloads() {
            return true;
        }

        public Comparator<BytesRef> getComparator() {
            return null;
        }

        // This method needs to return the key for the record; this is the
        // text we'll be autocompleting against.
        public BytesRef next() {
            if (productIterator.hasNext()) {
                currentProduct = productIterator.next();
                try {
                    return new BytesRef(currentProduct.name.getBytes("UTF8"));
                } catch (UnsupportedEncodingException e) {
                    throw new Error("Couldn't convert to UTF-8");
                }
            } else {
                return null;
            }
        }

        // This method returns the payload for the record, which is
        // additional data that can be associated with a record and
        // returned when we do suggestion lookups. In this example the
        // payload is a serialized Java object representing our product.
        public BytesRef payload() {
            try {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                ObjectOutputStream out = new ObjectOutputStream(bos);
                out.writeObject(currentProduct);
                out.close();
                return new BytesRef(bos.toByteArray());
            } catch (IOException e) {
                throw new Error("Well that's unfortunate.");
            }
        }

        // This method returns the contexts for the record, which we can
        // use to restrict suggestions. In this example we use the
        // regions in which a product is sold.
        public Set<BytesRef> contexts() {
            try {
                Set<BytesRef> regions = new HashSet<BytesRef>();
                for (String region : currentProduct.regions) {
                    regions.add(new BytesRef(region.getBytes("UTF8")));
                }
                return regions;
            } catch (UnsupportedEncodingException e) {
                throw new Error("Couldn't convert to UTF-8");
            }
        }

        // This method helps us order our suggestions. In this example we
        // use the number of products of this type that we've sold.
        public long weight() {
            return currentProduct.numberSold;
        }
    }

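In our driver program, we will do the following:

Create an index directory in RAM.
Create a StandardAnalyzer.
Create an AnalyzingInfixSuggester using the RAM directory and the analyzer.
Index a number of products using ProductIterator.
Print the results of some sample lookups.

Here's the driver program, SuggestProducts.java: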

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.UnsupportedEncodingException;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
    import org.apache.lucene.search.suggest.Lookup;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.Version;

    public class SuggestProducts
    {
        // Get suggestions given a prefix and a region.
        private static void lookup(AnalyzingInfixSuggester suggester, String name,
                                   String region) {
            try {
                List<Lookup.LookupResult> results;
                HashSet<BytesRef> contexts = new HashSet<BytesRef>();
                contexts.add(new BytesRef(region.getBytes("UTF8")));
                // Do the actual lookup. We ask for the top 2 results, require
                // all terms to match, and turn highlighting off.
                results = suggester.lookup(name, contexts, 2, true, false);
                System.out.println("-- \"" + name + "\" (" + region + "):");
                for (Lookup.LookupResult result : results) {
                    System.out.println(result.key);
                    Product p = getProduct(result);
                    if (p != null) {
                        System.out.println("  image: " + p.image);
                        System.out.println("  # sold: " + p.numberSold);
                    }
                }
            } catch (IOException e) {
                System.err.println("Error");
            }
        }

        // Deserialize a Product from a LookupResult payload.
        private static Product getProduct(Lookup.LookupResult result) {
            try {
                BytesRef payload = result.payload;
                if (payload != null) {
                    // A BytesRef may be a slice of a larger array, so read
                    // only from offset up to offset + length.
                    ByteArrayInputStream bis = new ByteArrayInputStream(
                        payload.bytes, payload.offset, payload.length);
                    ObjectInputStream in = new ObjectInputStream(bis);
                    Product p = (Product) in.readObject();
                    return p;
                } else {
                    return null;
                }
            } catch (IOException|ClassNotFoundException e) {
                throw new Error("Could not decode payload :(");
            }
        }

        public static void main(String[] args) {
            try {
                RAMDirectory index_dir = new RAMDirectory();
                StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
                AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                    Version.LUCENE_48, index_dir, analyzer);

                // Create our list of products.
                ArrayList<Product> products = new ArrayList<Product>();
                products.add(
                    new Product(
                        "Electric Guitar",
                        "http://images.example/electric-guitar.jpg",
                        new String[]{"US", "CA"},
                        100));
                products.add(
                    new Product(
                        "Electric Train",
                        "http://images.example/train.jpg",
                        new String[]{"US", "CA"},
                        100));
                products.add(
                    new Product(
                        "Acoustic Guitar",
                        "http://images.example/acoustic-guitar.jpg",
                        new String[]{"US", "ZA"},
                        80));
                products.add(
                    new Product(
                        "Guarana Soda",
                        "http://images.example/soda.jpg",
                        new String[]{"ZA", "IE"},
                        130));

                // Index the products with the suggester.
                suggester.build(new ProductIterator(products.iterator()));

                // Do some example lookups.
                lookup(suggester, "Gu", "US");
                lookup(suggester, "Gu", "ZA");
                lookup(suggester, "Gui", "CA");
                lookup(suggester, "Electric guit", "US");
            } catch (IOException e) {
                System.err.println("Error!");
            }
        }
    }

Here is the output from the driver program:

    -- "Gu" (US):
    Electric Guitar
      image: http://images.example/electric-guitar.jpg
      # sold: 100
    Acoustic Guitar
      image: http://images.example/acoustic-guitar.jpg
      # sold: 80
    -- "Gu" (ZA):
    Guarana Soda
      image: http://images.example/soda.jpg
      # sold: 130
    Acoustic Guitar
      image: http://images.example/acoustic-guitar.jpg
      # sold: 80
    -- "Gui" (CA):
    Electric Guitar
      image: http://images.example/electric-guitar.jpg
      # sold: 100
    -- "Electric guit" (US):
    Electric Guitar
      image: http://images.example/electric-guitar.jpg
      # sold: 100
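To see why the results come out in this order, it can help to model the lookup rules in plain Java. The sketch below is a toy stand-in and not how Lucene implements the suggester: ToySuggester and Entry are hypothetical names, whitespace splitting replaces the real analyzer, and the matching rule (every query word but the last must equal a word in the key, the last one only needs to be a prefix of a word) is an approximation of the behavior seen in the output above. Records are filtered by context first, then by the query, and finally sorted by weight.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ToySuggester {
    // Hypothetical record type: a key, the contexts it belongs to, a weight.
    static final class Entry {
        final String key;
        final long weight;
        final Set<String> contexts;

        Entry(String key, long weight, String... contexts) {
            this.key = key;
            this.weight = weight;
            this.contexts = new HashSet<>(Arrays.asList(contexts));
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(Entry e) {
        entries.add(e);
    }

    // Lowercase and split on whitespace: a crude stand-in for an analyzer.
    private static String[] tokens(String text) {
        return text.trim().toLowerCase(Locale.ROOT).split("\\s+");
    }

    // All query tokens but the last must equal a key token;
    // the last query token must be a prefix of some key token.
    private static boolean matches(String key, String query) {
        List<String> keyTokens = Arrays.asList(tokens(key));
        String[] q = tokens(query);
        for (int i = 0; i < q.length; i++) {
            final String t = q[i];
            boolean isLast = (i == q.length - 1);
            boolean ok = isLast
                ? keyTokens.stream().anyMatch(k -> k.startsWith(t))
                : keyTokens.contains(t);
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    // Return up to num matching keys for one context, highest weight first.
    List<String> lookup(String query, String context, int num) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.contexts.contains(context) && matches(e.key, query)) {
                hits.add(e);
            }
        }
        hits.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
        List<String> keys = new ArrayList<>();
        for (Entry e : hits.subList(0, Math.min(num, hits.size()))) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        ToySuggester s = new ToySuggester();
        s.add(new Entry("Electric Guitar", 100, "US", "CA"));
        s.add(new Entry("Electric Train", 100, "US", "CA"));
        s.add(new Entry("Acoustic Guitar", 80, "US", "ZA"));
        s.add(new Entry("Guarana Soda", 130, "ZA", "IE"));
        System.out.println(s.lookup("Gu", "US", 2));
        System.out.println(s.lookup("Gu", "ZA", 2));
    }
}
```

With the same four products, this toy returns the same keys in the same order as the four lookups above. "Gu" matches "Electric Guitar" even though the match is in the middle of the name, which is the "infix" part of AnalyzingInfixSuggester.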

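Addendum

There's a way to avoid writing a full InputIterator that you might find easier. You can write a stub InputIterator that returns null from its next, payload and contexts methods. Pass an instance of it to AnalyzingInfixSuggester's build method: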

    suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));

Then, for each item you want to index, call the AnalyzingInfixSuggester add method:

    suggester.add(text, contexts, weight, payload)

After you've indexed everything, call refresh:

    suggester.refresh();

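If you're indexing large amounts of data, you can significantly speed up indexing by using this approach with multiple threads: call build, then use multiple threads to add items, and finally call refresh.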

[Edited 2015-04-23 to demonstrate deserializing information from the LookupResult payload.]
