How do I use the Stanford Parser to parse languages other than English, in Java rather than from the command line?

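I have been trying to use the Stanford Parser in my Java program to parse some Chinese sentences. Since I am new to both Java and the Stanford Parser, I used 'ParserDemo.java' to practice. The code works fine with English sentences and outputs the right result. However, when I changed the model to 'chinesePCFG.ser.gz' and tried to parse some segmented Chinese sentences, things went wrong.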

Here is my Java code:

    import java.util.Collection;
    import java.util.List;

    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.ling.HasWord;
    import edu.stanford.nlp.ling.Sentence;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.process.DocumentPreprocessor;
    import edu.stanford.nlp.trees.*;

    class ParserDemo {

      public static void main(String[] args) {
        LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz");
        if (args.length > 0) {
          demoDP(lp, args[0]);
        } else {
          demoAPI(lp);
        }
      }

      public static void demoDP(LexicalizedParser lp, String filename) {
        // This option shows loading, sentence-segmenting and tokenizing
        // a file using DocumentPreprocessor
        TreebankLanguagePack tlp = new PennTreebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        // You could also create a tokenizer here (as below) and pass it
        // to DocumentPreprocessor
        for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
          Tree parse = lp.apply(sentence);
          parse.pennPrint();
          System.out.println();

          GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
          Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed(true);
          System.out.println(tdl);
          System.out.println();
        }
      }

      public static void demoAPI(LexicalizedParser lp) {
        // This option shows parsing a list of correctly tokenized words
        String[] sent = { "我", "是", "一名", "学生" };
        List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
        Tree parse = lp.apply(rawWords);
        parse.pennPrint();
        System.out.println();

        TreebankLanguagePack tlp = new PennTreebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
        System.out.println(tdl);
        System.out.println();

        TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
        tp.printTree(parse);
      }

      private ParserDemo() {} // static methods only
    }

It is basically the same as ParserDemo.java, but when I run it I get the following result:

    Loading parser from serialized file edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz ... done [2.2 sec].
    (ROOT (IP (NP (PN 我)) (VP (VC 是) (NP (QP (CD 一名)) (NP (NN 学生))))))

    Exception in thread "main" java.lang.RuntimeException: Cannot invoke public edu.stanford.nlp.trees.EnglishGrammaticalStructure(edu.stanford.nlp.trees.Tree)
        at edu.stanford.nlp.trees.GrammaticalStructureFactory.newGrammaticalStructure(GrammaticalStructureFactory.java:104)
        at parserdemo.ParserDemo.demoAPI(ParserDemo.java:65)
        at parserdemo.ParserDemo.main(ParserDemo.java:23)

The code on line 65 is:

  GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);

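My guess is that chinesePCFG.ser.gz is missing something related to 'edu.stanford.nlp.trees.EnglishGrammaticalStructure'. Since the parser parses Chinese correctly from the command line, there must be something wrong with my own code. I have been searching, but found only a few similar cases. Some of them mention using the right model, but I don't really know how to modify the code to use "the right model". I hope someone can help. I'm new to Java and the Stanford Parser, so please be specific. Thanks!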

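The problem is that the GrammaticalStructureFactory is constructed from PennTreebankLanguagePack, which is for the English Penn Treebank. The parse itself succeeds, which is why the tree is printed; the exception comes only when the English factory is applied to the Chinese tree. You need to use (in two places)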

 TreebankLanguagePack tlp = new ChineseTreebankLanguagePack();

and to import it appropriately:

 import edu.stanford.nlp.trees.international.pennchinese.ChineseTreebankLanguagePack;

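But we also generally recommend using the factored parser for Chinese (since, unlike for English, it works considerably better, though at the cost of more memory and time):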

 LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz");
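Putting both changes together, the API demo might look like the sketch below. It assumes the Stanford Parser jar and its Chinese models are on the classpath, and that the release in use still provides `Sentence.toCoreLabelList` as in the question (I believe newer releases renamed that class to `SentenceUtils`). The class name `ChineseParserSketch`, the `MODEL_PATH` constant and the `chineseDependencies` helper are made up for illustration. The `TreePrint` call from the question is left out here, since as far as I know its one-argument constructor defaults to the English language pack as well.

```java
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.trees.international.pennchinese.ChineseTreebankLanguagePack;

// Hypothetical class name; not part of the Stanford Parser distribution.
public class ChineseParserSketch {

  // Factored Chinese model, as recommended in the answer.
  static final String MODEL_PATH =
      "edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz";

  // Builds typed dependencies with the Chinese language pack
  // instead of PennTreebankLanguagePack.
  static List<TypedDependency> chineseDependencies(Tree parse) {
    TreebankLanguagePack tlp = new ChineseTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    return gs.typedDependenciesCCprocessed();
  }

  public static void main(String[] args) {
    LexicalizedParser lp = LexicalizedParser.loadModel(MODEL_PATH);

    // The input must already be segmented into words.
    String[] sent = { "我", "是", "一名", "学生" };
    List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);

    Tree parse = lp.apply(rawWords);
    parse.pennPrint();
    System.out.println();

    System.out.println(chineseDependencies(parse));
  }
}
```

The same replacement of the language pack applies inside `demoDP` if you parse from a file.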
