How to extract noun phrases using OpenNLP's chunking parser

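I am new to natural language processing. I need to extract noun phrases from text. So far I have used OpenNLP's chunking parser to parse my text and get a tree structure, but I can't extract the noun phrases from that tree structure. Is there any regex pattern in OpenNLP that I can use to extract noun phrases?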

Below is the code I am using:

```java
InputStream is = new FileInputStream("en-parser-chunking.bin");
ParserModel model = new ParserModel(is);
Parser parser = ParserFactory.create(model);
Parse topParses[] = ParserTool.parseLine(line, parser, 1);
for (Parse p : topParses) {
    p.show();
}
```

Here, the output I get is:

(TOP (S (S (ADJP (JJ Welcome) (PP (TO to) (NP (NNP Big) (NNP Data.))))) (S (NP (PRP We)) (VP (VP (VBP are) (VP (VBG working) (PP (IN on) (NP (NNP Natural) (NNP Language) (NNP Processing.can))))) (NP (DT some) (CD one) (NN help)) (NP (PRP us)) (PP (IN in) (S (VP (VBG extracting) (NP (DT the) (NN noun) (NNS phrases)) (PP (IN from) (NP (DT the) (NN tree) (WP structure.))))))))))

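Can someone help me get the noun phrases, such as NP, NNP, NN, etc.? Can you also tell me whether I need to use some other NP chunker to get noun phrases? Is there any regex pattern to achieve the same thing?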

Please help me.

Thanks in advance,

Gouse

A Parse object is a tree; you can navigate it with getParent(), getChildren() and getType().

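```java
import java.util.ArrayList;
import java.util.List;
import opennlp.tools.parser.Parse;

public class NounPhraseCollector {
    // Must be initialised before getNounPhrases() adds to it
    List<Parse> nounPhrases = new ArrayList<>();

    // Recursively collect every node of type "NP" in the parse tree
    public void getNounPhrases(Parse p) {
        if (p.getType().equals("NP")) {
            nounPhrases.add(p);
        }
        for (Parse child : p.getChildren()) {
            getNounPhrases(child);
        }
    }
}
```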
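To see the recursion on its own, here is a minimal, self-contained sketch that runs without the OpenNLP jar. `Node` is a hypothetical stand-in for `Parse` that keeps only a type label, a child list and the token text; with the real class you would call `getType()`, `getChildren()` and `getCoveredText()` instead. Nested NPs are all collected, so an outer NP and any NP inside it both end up in the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for opennlp.tools.parser.Parse: type label,
// children, and (for POS nodes only) the token text.
class Node {
    final String type;
    final String word;          // non-null only for POS (pre-terminal) nodes
    final List<Node> children;

    Node(String type, String word, Node... children) {
        this.type = type;
        this.word = word;
        this.children = Arrays.asList(children);
    }

    // Join the words under this node, similar in spirit to Parse.getCoveredText()
    String text() {
        if (word != null) return word;
        StringBuilder sb = new StringBuilder();
        for (Node c : children) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(c.text());
        }
        return sb.toString();
    }
}

public class NounPhraseWalk {
    // Depth-first walk that records the text of every node labelled "NP"
    public static void collectNounPhrases(Node n, List<String> out) {
        if (n.type.equals("NP")) {
            out.add(n.text());
        }
        for (Node child : n.children) {
            collectNounPhrases(child, out);
        }
    }

    public static void main(String[] args) {
        // (S (NP (PRP We)) (VP (VBG extracting) (NP (DT the) (NN noun) (NNS phrases))))
        Node tree = new Node("S", null,
            new Node("NP", null, new Node("PRP", "We")),
            new Node("VP", null,
                new Node("VBG", "extracting"),
                new Node("NP", null,
                    new Node("DT", "the"), new Node("NN", "noun"), new Node("NNS", "phrases"))));
        List<String> nps = new ArrayList<>();
        collectNounPhrases(tree, nps);
        System.out.println(nps); // [We, the noun phrases]
    }
}
```

If you only want the outermost noun phrases, stop recursing once you have added an NP node.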

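If you only want noun phrases, use the sentence chunker instead of the tree parser. The code looks like this (you need to get the chunker model, en-chunker.bin, from the same place you got the parser model):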

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;

public class NpChunker {
    public void chunk() {
        ChunkerModel model;
        // Load the chunker model; the stream is closed automatically
        try (InputStream modelIn = new FileInputStream("en-chunker.bin")) {
            model = new ChunkerModel(modelIn);
        } catch (IOException e) {
            // Model loading failed; report the error and stop
            e.printStackTrace();
            return;
        }

        // After the model is loaded a Chunker can be instantiated.
        ChunkerME chunker = new ChunkerME(model);

        // The chunker needs the tokens and their POS tags
        String[] sent = new String[]{"Rockwell", "International", "Corp.", "'s", "Tulsa", "unit", "said",
            "it", "signed", "a", "tentative", "agreement", "extending", "its", "contract", "with", "Boeing",
            "Co.", "to", "provide", "structural", "parts", "for", "Boeing", "'s", "747", "jetliners", "."};
        String[] pos = new String[]{"NNP", "NNP", "NNP", "POS", "NNP", "NN", "VBD", "PRP", "VBD", "DT",
            "JJ", "NN", "VBG", "PRP$", "NN", "IN", "NNP", "NNP", "TO", "VB", "JJ", "NNS", "IN", "NNP",
            "POS", "CD", "NNS", "."};

        // One chunk tag per token
        String[] tag = chunker.chunk(sent, pos);
    }
}
```

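Then look in the tag array for the chunk type you want: a noun phrase starts at a token tagged B-NP and continues over the following I-NP tokens.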

 http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.parser.chunking.api 
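Turning the tag array into phrases means grouping each B-NP token with the I-NP tokens that follow it. Below is a minimal plain-Java sketch, assuming the tags use the B-NP/I-NP/O scheme described above; the class and method names are hypothetical, and the tags in `main` are hand-written for illustration, not real en-chunker.bin output. OpenNLP's `ChunkerME` also offers `chunkAsSpans(...)`, which may do this grouping for you; check the API docs for your version.

```java
import java.util.ArrayList;
import java.util.List;

public class NpFromChunkTags {
    // "B-NP" starts a phrase, "I-NP" continues the open phrase,
    // and any other tag closes it. A stray I-NP with no open phrase is skipped.
    public static List<String> nounPhrases(String[] tokens, String[] tags) {
        List<String> phrases = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].equals("B-NP")) {
                if (current != null) phrases.add(current.toString());
                current = new StringBuilder(tokens[i]);
            } else if (tags[i].equals("I-NP") && current != null) {
                current.append(' ').append(tokens[i]);
            } else {
                if (current != null) phrases.add(current.toString());
                current = null;
            }
        }
        if (current != null) phrases.add(current.toString());
        return phrases;
    }

    public static void main(String[] args) {
        // Hand-written tags for illustration only
        String[] tokens = {"We", "are", "extracting", "the", "noun", "phrases", "."};
        String[] tags = {"B-NP", "B-VP", "I-VP", "B-NP", "I-NP", "I-NP", "O"};
        System.out.println(nounPhrases(tokens, tags)); // [We, the noun phrases]
    }
}
```

With the real chunker, pass the `sent` and `tag` arrays from `chunk()` as `tokens` and `tags`.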

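Continuing from your own code: this block prints all the nouns (tokens tagged NN, NNP or NNS) in the sentence. It uses the getTagNodes() method to get the tokens together with their POS types.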

```java
Parse[] topParses = ParserTool.parseLine(line, parser, 1);

// Loop through the parses and get the tag (POS) nodes of each one
for (Parse topParse : topParses) {
    Parse[] words = topParse.getTagNodes(); // one node per token
    for (Parse word : words) {
        // Change the types according to your desired types
        String type = word.getType();
        if (type.equals("NN") || type.equals("NNP") || type.equals("NNS")) {
            System.out.println(word.getCoveredText());
        }
    }
}
```
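The loop above prints individual noun tokens, not multi-word phrases. To catch every Penn Treebank noun tag, including NNPS (plural proper noun), a prefix test on the tag is simpler than a chain of `equals()` calls. Minimal sketch, assuming the tokens and tags have already been copied out of the tag nodes (for example via `getCoveredText()` and `getType()`); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class NounTokenFilter {
    // Keeps tokens whose tag starts with "NN": NN, NNS, NNP and NNPS
    public static List<String> nouns(String[] tokens, String[] posTags) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (posTags[i].startsWith("NN")) {
                result.add(tokens[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example tokens and tags, written by hand
        String[] tokens = {"We", "are", "extracting", "the", "noun", "phrases"};
        String[] tags = {"PRP", "VBP", "VBG", "DT", "NN", "NNS"};
        System.out.println(nouns(tokens, tags)); // [noun, phrases]
    }
}
```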