Lazy parsing with Stanford CoreNLP to get the sentiment of specific sentences only

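I am looking for ways to optimize the performance of my Stanford CoreNLP sentiment pipeline. I want the sentiment of sentences, but only of those that contain specific keywords given as input.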

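I have tried two approaches:

Approach 1: StanfordCoreNLP pipeline annotating the entire text with sentiment

I defined a pipeline of annotators: tokenize, ssplit, parse, sentiment. I ran it on the whole article, then looked for the keywords in each sentence and, where they were present, ran a method that returns the value for that sentence (its predicted sentiment class). I was not satisfied, though, because the processing takes a couple of seconds.

Here is the code: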

    import java.util.*;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;

    List<String> keywords = ...;
    String text = ...;
    Map<CoreMap, Integer> sentenceSentiment = new HashMap<>();

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
    props.setProperty("parse.maxlen", "20");
    props.setProperty("tokenize.options", "untokenizable=noneDelete");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Parses and scores every sentence, whether or not it contains a keyword
    Annotation annotation = pipeline.process(text); // takes 2 seconds!!!!
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        if (sentenceContainsKeywords(sentence, keywords)) {
            int sentiment = RNNCoreAnnotations.getPredictedClass(
                    sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class));
            sentenceSentiment.put(sentence, sentiment);
        }
    }

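Approach 2: StanfordCoreNLP pipeline annotating the entire text with sentences only, with separate annotators run on the sentences of interest

Because of the weak performance of the first solution, I defined a second one. I defined a pipeline with the annotators tokenize and ssplit. I looked for the keywords in each sentence and, where they were present, I created an Annotation for that sentence alone and ran the remaining annotators on it: ParserAnnotator, BinarizerAnnotator and SentimentAnnotator.

The results were really unsatisfactory because of ParserAnnotator, even though I initialized it with the same properties. Sometimes it took even longer than the whole pipeline run on the document in Approach 1.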

    import java.util.*;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;

    List<String> keywords = ...;
    String text = ...;
    Map<CoreMap, Integer> sentenceSentiment = new HashMap<>();

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit"); // parsing, sentiment removed
    props.setProperty("parse.maxlen", "20");
    props.setProperty("tokenize.options", "untokenizable=noneDelete");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // initiation of annotators to be run on sentences
    ParserAnnotator parserAnnotator = new ParserAnnotator("pa", props);
    BinarizerAnnotator binarizerAnnotator = new BinarizerAnnotator("ba", props);
    SentimentAnnotator sentimentAnnotator = new SentimentAnnotator("sa", props);

    Annotation annotation = pipeline.process(text); // takes <100 ms
    List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        if (sentenceContainsKeywords(sentence, keywords)) {
            // code required to perform annotation on one sentence
            List<CoreMap> listWithSentence = new ArrayList<>();
            listWithSentence.add(sentence);
            Annotation sentenceAnnotation = new Annotation(listWithSentence);

            parserAnnotator.annotate(sentenceAnnotation); // takes 50 ms up to 2 seconds!!!!
            binarizerAnnotator.annotate(sentenceAnnotation);
            sentimentAnnotator.annotate(sentenceAnnotation);

            CoreMap annotated = sentenceAnnotation.get(CoreAnnotations.SentencesAnnotation.class).get(0);
            int sentiment = RNNCoreAnnotations.getPredictedClass(
                    annotated.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class));
            sentenceSentiment.put(annotated, sentiment);
        }
    }

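Questions

Why is parsing in CoreNLP not "lazy"? (In my example that would mean: performed only when the sentiment of a sentence is requested.) Is it for performance reasons?
Why does the parser take almost as long for one sentence as it does for the entire article (my article had 7 sentences)? Is it possible to configure it so that it runs faster?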

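If you are looking to speed up constituency parsing, the single best improvement is to use the new shift-reduce constituency parser. It is orders of magnitude faster than the default PCFG parser.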
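
To make the switch concrete, here is a minimal sketch of the properties involved. It assumes the English shift-reduce model is on the classpath at the path shown (it ships separately from the default models, so verify the path against your CoreNLP version) and that "pos" is added to the annotator list, since the shift-reduce parser works from POS tags. The class and method names are made up for illustration, and only java.util is used so the snippet runs without CoreNLP.

```java
import java.util.Properties;

final class ShiftReduceConfig {

    // Classpath location of the English shift-reduce model (assumption:
    // the separate shift-reduce models jar has been added to the project).
    static final String SR_MODEL = "edu/stanford/nlp/models/srparser/englishSR.ser.gz";

    // Builds the question's properties, switched to the shift-reduce model.
    static Properties sentimentProps() {
        Properties props = new Properties();
        // "pos" is inserted before "parse" because the shift-reduce parser needs POS tags.
        props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment");
        props.setProperty("parse.model", SR_MODEL);
        props.setProperty("parse.maxlen", "20");
        props.setProperty("tokenize.options", "untokenizable=noneDelete");
        return props;
    }

    public static void main(String[] args) {
        Properties props = sentimentProps();
        // With CoreNLP on the classpath these would be passed to: new StanfordCoreNLP(props)
        props.stringPropertyNames().stream().sorted()
             .forEach(k -> System.out.println(k + " = " + props.getProperty(k)));
    }
}
```

The rest of the question's code can stay as it is; only the properties change.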

Answers to your later questions:

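Why is CoreNLP parsing not lazy? It is certainly possible, but it is not something we have implemented in the pipeline yet. We probably have not seen many use cases in-house where it is necessary. If you are interested in writing a "lazy annotator wrapper", we would happily accept the contribution!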
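
The idea behind such a wrapper can be sketched without CoreNLP at all. The following is not an existing CoreNLP class; `Lazy`, `expensiveSentiment` and `sentimentFor` are hypothetical names, and `expensiveSentiment` is a placeholder standing in for the parse, binarize and sentiment steps. It only shows the memoization pattern: the expensive work is deferred until `get()` is called, and runs at most once per sentence.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class LazySentiment {

    /** Memoizing supplier: runs the delegate at most once, on the first get(). Not thread-safe. */
    static final class Lazy<T> implements Supplier<T> {
        private Supplier<T> delegate;
        private T value;
        private boolean computed;

        Lazy(Supplier<T> delegate) { this.delegate = delegate; }

        @Override
        public T get() {
            if (!computed) {
                value = delegate.get();
                computed = true;
                delegate = null; // release the closure once the value is cached
            }
            return value;
        }
    }

    // Counts how often the expensive step actually runs.
    static final AtomicInteger expensiveCalls = new AtomicInteger();

    // Placeholder for parse + binarize + sentiment; the returned class is made up.
    static int expensiveSentiment(String sentence) {
        expensiveCalls.incrementAndGet();
        return sentence.contains("great") ? 3 : 2;
    }

    // Wraps every sentence lazily, but only forces the sentences that mention a keyword.
    static Map<String, Integer> sentimentFor(List<String> sentences, List<String> keywords) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String s : sentences) {
            Lazy<Integer> sentiment = new Lazy<>(() -> expensiveSentiment(s));
            String lower = s.toLowerCase();
            boolean relevant = keywords.stream().anyMatch(k -> lower.contains(k.toLowerCase()));
            if (relevant) {
                result.put(s, sentiment.get()); // the expensive step runs here, and only here
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("The battery is great.", "It rained.", "Shipping was slow.");
        System.out.println(sentimentFor(sentences, List.of("battery"))); // {The battery is great.=3}
        System.out.println("expensive calls: " + expensiveCalls.get()); // expensive calls: 1
    }
}
```

In a real wrapper the delegate would invoke the parser and sentiment annotators on a single sentence, which is essentially what Approach 2 does by hand.
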
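Why does the parser take almost as long for one sentence as for the entire article? The default Stanford PCFG parser has cubic time complexity with respect to sentence length. This is why we usually recommend restricting the maximum sentence length for performance reasons. The shift-reduce parser, on the other hand, runs in linear time with respect to sentence length.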
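
A little arithmetic shows why one sentence can dominate. The sketch below uses made-up token counts for a seven-sentence article (they are not measurements from the question) and a cost model of length raised to a power, ignoring constant factors. Under a cubic model the single long sentence accounts for almost all of the work; under a linear model it accounts for about half.

```java
import java.util.Locale;

final class ParseCost {

    // Fraction of the total modeled cost (sum of n^exponent) spent on the longest sentence.
    static double shareOfLongest(int[] lengths, int exponent) {
        double total = 0;
        double longest = 0;
        for (int n : lengths) {
            double cost = Math.pow(n, exponent);
            total += cost;
            longest = Math.max(longest, cost);
        }
        return longest / total;
    }

    public static void main(String[] args) {
        // Hypothetical token counts: six short sentences and one 60-token sentence.
        int[] lengths = {10, 12, 8, 15, 9, 11, 60};
        System.out.printf(Locale.ROOT, "cubic:  %.2f%n", shareOfLongest(lengths, 3)); // cubic:  0.96
        System.out.printf(Locale.ROOT, "linear: %.2f%n", shareOfLongest(lengths, 1)); // linear: 0.48
    }
}
```

So if the keyword happens to sit in the longest sentence, parsing that sentence alone costs nearly as much as parsing the whole article with the PCFG parser.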
