How do I implement auto-suggest using Lucene's new AnalyzingInfixSuggester API?

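I'm new to Lucene and I want to implement auto-suggest like Google's: when I type a character such as 'G', it shows me a list of suggestions. You can try it yourself.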

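I have searched all over the web. Nobody seems to have done this, even though Lucene now gives us some new tools in its suggest package.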

But I need an example that shows me how to do it.

Can anyone help?

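I'll give you a fairly complete example that shows how to use AnalyzingInfixSuggester. In this example we'll pretend that we're Amazon and that we want to autocomplete a product search field. We'll take advantage of features of the Lucene suggestion system to implement the following: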

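  1. Ranked results: we will suggest the most popular matching products first.
  2. Region-restricted results: we will only suggest products that are sold in the customer's country.
  3. Product photos: we will store product photo URLs in the suggestion index so that we can display them in the search results without an additional database lookup.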

First, I'll define a simple class in Product.java to hold information about a product:

    import java.util.Set;

    // Serializable so that a whole Product can be stored as a suggestion payload.
    class Product implements java.io.Serializable
    {
        String name;
        String image;
        String[] regions;
        int numberSold;

        public Product(String name, String image, String[] regions,
                       int numberSold) {
            this.name = name;
            this.image = image;
            this.regions = regions;
            this.numberSold = numberSold;
        }
    }

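To index records with AnalyzingInfixSuggester's build method, you need to pass it an object that implements the org.apache.lucene.search.suggest.InputIterator interface. An InputIterator gives access to the key, contexts, payload, and weight of each record.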

The key is the text you actually want to search on and autocomplete against. In our example, it will be the name of the product.

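The contexts are a set of additional, arbitrary data that you can use to filter records. In our example, the contexts are the set of ISO codes for the countries we will ship a particular product to.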

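The payload is additional arbitrary data that you want to store in the index for the record. In this example, we will serialize each Product instance and store the resulting bytes as the payload. Later, when we do lookups, we can deserialize the payload and access the information in the product instance, such as the image URL.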
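To make the payload idea concrete, here is a minimal JDK-only sketch of the round trip: an object is serialized to bytes (what would be stored as the payload) and later rebuilt from those bytes. The names PayloadRoundTrip, DemoProduct, toBytes, and fromBytes are hypothetical stand-ins for illustration; nothing here touches Lucene.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class PayloadRoundTrip {
    // Hypothetical stand-in for the Product class; not part of Lucene.
    static final class DemoProduct implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final String image;
        final int numberSold;

        DemoProduct(String name, String image, int numberSold) {
            this.name = name;
            this.image = image;
            this.numberSold = numberSold;
        }
    }

    // Serialize an object into the raw bytes that would be stored as a payload.
    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before we read the bytes.
        return bos.toByteArray();
    }

    // Rebuild the object from payload bytes, as a lookup result handler would.
    static DemoProduct fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (DemoProduct) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DemoProduct original = new DemoProduct(
            "Electric Guitar", "http://images.example/electric-guitar.jpg", 100);
        DemoProduct copy = fromBytes(toBytes(original));
        System.out.println(copy.name + " / " + copy.image + " / " + copy.numberSold);
    }
}
```

Java serialization keeps the example short, but it ties the stored bytes to the class definition. For a long-lived index, a more compact or versioned encoding may be a better fit.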

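The weight is used to order suggestion results; results with a higher weight are returned first. We will use the number of units sold of a given product as its weight.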
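To see how key, contexts, and weight work together, here is a toy in-memory model written with plain JDK collections. It is not how Lucene implements the suggester. The matching rule is a simplifying assumption on my part (every query token must equal a word of the key, except the last one, which only needs to be a prefix of a word), and the names ToySuggester, Record, and sample are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model of key / contexts / weight. NOT Lucene code.
class ToySuggester {
    static final class Record {
        final String key;            // text we autocomplete against
        final Set<String> contexts;  // e.g. region codes used for filtering
        final long weight;           // higher weight is suggested first

        Record(String key, long weight, String... contexts) {
            this.key = key;
            this.weight = weight;
            this.contexts = new HashSet<>(Arrays.asList(contexts));
        }
    }

    private final List<Record> records = new ArrayList<>();

    void add(Record record) {
        records.add(record);
    }

    // Assumed rule: all query tokens must match; the last may be a word prefix.
    static boolean matches(String key, String query) {
        String[] keyTokens = key.toLowerCase(Locale.ROOT).split("\\s+");
        String[] queryTokens = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < queryTokens.length; i++) {
            boolean last = (i == queryTokens.length - 1);
            boolean found = false;
            for (String k : keyTokens) {
                if (last ? k.startsWith(queryTokens[i]) : k.equals(queryTokens[i])) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Filter by context, match on the key, order by weight, keep the top num.
    List<String> lookup(String query, String context, int num) {
        List<Record> hits = new ArrayList<>();
        for (Record r : records) {
            if (r.contexts.contains(context) && matches(r.key, query)) {
                hits.add(r);
            }
        }
        hits.sort(Comparator.comparingLong((Record r) -> r.weight).reversed());
        List<String> keys = new ArrayList<>();
        for (Record r : hits.subList(0, Math.min(num, hits.size()))) {
            keys.add(r.key);
        }
        return keys;
    }

    // The same four products that the driver program in this answer indexes.
    static ToySuggester sample() {
        ToySuggester s = new ToySuggester();
        s.add(new Record("Electric Guitar", 100, "US", "CA"));
        s.add(new Record("Electric Train", 100, "US", "CA"));
        s.add(new Record("Acoustic Guitar", 80, "US", "ZA"));
        s.add(new Record("Guarana Soda", 130, "ZA", "IE"));
        return s;
    }

    public static void main(String[] args) {
        ToySuggester s = sample();
        System.out.println(s.lookup("Gu", "US", 2));
        System.out.println(s.lookup("Gu", "ZA", 2));
    }
}
```

With this sample data, the toy model returns the same keys in the same order as the driver output shown later in this answer, which is a handy sanity check on how the three fields are meant to be used.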

Here are the contents of ProductIterator.java:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import java.io.UnsupportedEncodingException;
    import java.util.Comparator;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;

    import org.apache.lucene.search.suggest.InputIterator;
    import org.apache.lucene.util.BytesRef;

    class ProductIterator implements InputIterator
    {
        private Iterator<Product> productIterator;
        private Product currentProduct;

        ProductIterator(Iterator<Product> productIterator) {
            this.productIterator = productIterator;
        }

        public boolean hasContexts() {
            return true;
        }

        public boolean hasPayloads() {
            return true;
        }

        public Comparator<BytesRef> getComparator() {
            return null;
        }

        // This method needs to return the key for the record; this is the
        // text we'll be autocompleting against.
        public BytesRef next() {
            if (productIterator.hasNext()) {
                currentProduct = productIterator.next();
                try {
                    return new BytesRef(currentProduct.name.getBytes("UTF8"));
                } catch (UnsupportedEncodingException e) {
                    throw new Error("Couldn't convert to UTF-8");
                }
            } else {
                return null;
            }
        }

        // This method returns the payload for the record, which is
        // additional data that can be associated with a record and
        // returned when we do suggestion lookups.  In this example the
        // payload is a serialized Java object representing our product.
        public BytesRef payload() {
            try {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                ObjectOutputStream out = new ObjectOutputStream(bos);
                out.writeObject(currentProduct);
                out.close();
                return new BytesRef(bos.toByteArray());
            } catch (IOException e) {
                throw new Error("Well that's unfortunate.");
            }
        }

        // This method returns the contexts for the record, which we can
        // use to restrict suggestions.  In this example we use the
        // regions in which a product is sold.
        public Set<BytesRef> contexts() {
            try {
                Set<BytesRef> regions = new HashSet<BytesRef>();
                for (String region : currentProduct.regions) {
                    regions.add(new BytesRef(region.getBytes("UTF8")));
                }
                return regions;
            } catch (UnsupportedEncodingException e) {
                throw new Error("Couldn't convert to UTF-8");
            }
        }

        // This method helps us order our suggestions.  In this example we
        // use the number of products of this type that we've sold.
        public long weight() {
            return currentProduct.numberSold;
        }
    }

In our driver program, we will do the following:

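  1. Create an index directory in RAM.
  2. Create a StandardAnalyzer.
  3. Create an AnalyzingInfixSuggester using the RAM directory and the analyzer.
  4. Index a number of products using ProductIterator.
  5. Print the results of some sample lookups.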

Here is the driver program, SuggestProducts.java:

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.UnsupportedEncodingException;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester;
    import org.apache.lucene.search.suggest.Lookup;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.BytesRef;
    import org.apache.lucene.util.Version;

    public class SuggestProducts
    {
        // Get suggestions given a prefix and a region.
        private static void lookup(AnalyzingInfixSuggester suggester, String name,
                                   String region) {
            try {
                List<Lookup.LookupResult> results;
                HashSet<BytesRef> contexts = new HashSet<BytesRef>();
                contexts.add(new BytesRef(region.getBytes("UTF8")));
                // Do the actual lookup.  We ask for the top 2 results.
                results = suggester.lookup(name, contexts, 2, true, false);
                System.out.println("-- \"" + name + "\" (" + region + "):");
                for (Lookup.LookupResult result : results) {
                    System.out.println(result.key);
                    Product p = getProduct(result);
                    if (p != null) {
                        System.out.println("  image: " + p.image);
                        System.out.println("  # sold: " + p.numberSold);
                    }
                }
            } catch (IOException e) {
                System.err.println("Error");
            }
        }

        // Deserialize a Product from a LookupResult payload.
        private static Product getProduct(Lookup.LookupResult result) {
            try {
                BytesRef payload = result.payload;
                if (payload != null) {
                    // A BytesRef is a slice of a byte array, so honor its
                    // offset and length rather than reading the whole array.
                    ByteArrayInputStream bis = new ByteArrayInputStream(
                        payload.bytes, payload.offset, payload.length);
                    ObjectInputStream in = new ObjectInputStream(bis);
                    Product p = (Product) in.readObject();
                    return p;
                } else {
                    return null;
                }
            } catch (IOException|ClassNotFoundException e) {
                throw new Error("Could not decode payload :(");
            }
        }

        public static void main(String[] args) {
            try {
                RAMDirectory index_dir = new RAMDirectory();
                StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);
                AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(
                    Version.LUCENE_48, index_dir, analyzer);

                // Create our list of products.
                ArrayList<Product> products = new ArrayList<Product>();
                products.add(
                    new Product(
                        "Electric Guitar",
                        "http://images.example/electric-guitar.jpg",
                        new String[]{"US", "CA"},
                        100));
                products.add(
                    new Product(
                        "Electric Train",
                        "http://images.example/train.jpg",
                        new String[]{"US", "CA"},
                        100));
                products.add(
                    new Product(
                        "Acoustic Guitar",
                        "http://images.example/acoustic-guitar.jpg",
                        new String[]{"US", "ZA"},
                        80));
                products.add(
                    new Product(
                        "Guarana Soda",
                        "http://images.example/soda.jpg",
                        new String[]{"ZA", "IE"},
                        130));

                // Index the products with the suggester.
                suggester.build(new ProductIterator(products.iterator()));

                // Do some example lookups.
                lookup(suggester, "Gu", "US");
                lookup(suggester, "Gu", "ZA");
                lookup(suggester, "Gui", "CA");
                lookup(suggester, "Electric guit", "US");
            } catch (IOException e) {
                System.err.println("Error!");
            }
        }
    }

And here is the output from the driver program:

    -- "Gu" (US):
    Electric Guitar
      image: http://images.example/electric-guitar.jpg
      # sold: 100
    Acoustic Guitar
      image: http://images.example/acoustic-guitar.jpg
      # sold: 80
    -- "Gu" (ZA):
    Guarana Soda
      image: http://images.example/soda.jpg
      # sold: 130
    Acoustic Guitar
      image: http://images.example/acoustic-guitar.jpg
      # sold: 80
    -- "Gui" (CA):
    Electric Guitar
      image: http://images.example/electric-guitar.jpg
      # sold: 100
    -- "Electric guit" (US):
    Electric Guitar
      image: http://images.example/electric-guitar.jpg
      # sold: 100

Appendix

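There's a way to avoid writing a full InputIterator that you might find easier. You can write a stub InputIterator that returns null from its next, payload, and contexts methods. Pass an instance of it to AnalyzingInfixSuggester's build method. For example, the ProductIterator above behaves like such a stub when it is given an empty iterator: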

    suggester.build(new ProductIterator(new ArrayList<Product>().iterator()));

Then, for each item that you want to index, call AnalyzingInfixSuggester's add method:

    suggester.add(text, contexts, weight, payload)

After you've indexed everything, call refresh:

    suggester.refresh();

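If you're indexing large amounts of data, you can significantly speed up indexing by using this method with multiple threads: call build, then use multiple threads to add items, and finally call refresh.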
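The threading pattern could look roughly like the sketch below. To keep it runnable with only the JDK, it uses a hypothetical SuggestIndex interface as a stand-in for the suggester, with a simplified add(text, weight) signature; the real add method also takes contexts and a payload. ParallelIndexing, InMemoryIndex, and indexAll are likewise names I made up for illustration. The initial build call with an empty iterator is assumed to have happened before indexAll is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelIndexing {
    // Hypothetical stand-in for the suggester; not a Lucene interface.
    interface SuggestIndex {
        void add(String text, long weight);
        void refresh();
    }

    // Thread-safe fake index that only records what it was asked to do.
    static final class InMemoryIndex implements SuggestIndex {
        final Queue<String> added = new ConcurrentLinkedQueue<>();
        final AtomicInteger refreshCount = new AtomicInteger();

        @Override
        public void add(String text, long weight) {
            added.add(text);
        }

        @Override
        public void refresh() {
            refreshCount.incrementAndGet();
        }
    }

    // Add every item from a pool of worker threads, then refresh exactly once.
    static void indexAll(SuggestIndex index, List<String> texts, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String text : texts) {
                pending.add(pool.submit(() -> index.add(text, 1L)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for each add and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("An add call failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        // Only refresh after all adds have completed successfully.
        index.refresh();
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        indexAll(index, Arrays.asList("Electric Guitar", "Electric Train",
                                      "Acoustic Guitar", "Guarana Soda"), 2);
        System.out.println(index.added.size() + " items added, refresh called "
            + index.refreshCount.get() + " time(s)");
    }
}
```

Waiting on every Future before calling refresh means a failed add surfaces as an exception instead of being silently dropped.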

[Edited 2015-04-23 to demonstrate deserializing information from the LookupResult payload.]