How to tell whether a sentence is a question (interrogative)?

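Is there an open-source Java library or algorithm for determining whether a given piece of text is a question?
I am working on a question answering system that needs to analyze whether the text entered by the user is a question.
I think the problem can probably be solved with an open-source NLP library, but it is obviously more complicated than simple part-of-speech tagging. So if someone can describe an algorithm that uses an existing open-source NLP library, that would be helpful too.
If you know of a library/toolkit that uses data mining to solve this problem, please let me know. Although it will be difficult to get enough data for training purposes, I will be able to use Stack Exchange data for training.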

In a syntactic parse of a question, the correct structure will be of the following form:

 (SBARQ (WH+ (W+) ...) (SQ ...* (V+) ...*) (?)) 

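So, using any of the available syntactic parsers, a tree with an SBARQ node that has an (optional) embedded SQ is an indicator that the input is a question. The WH+ node (WHNP/WHADVP/WHADJP) contains the question stem (who/what/when/where/why/how), and the SQ holds the inverted phrase.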

For example:

 (SBARQ (WHNP (WP What)) (SQ (VBZ is) (NP (DT the) (NN question))) (. ?)) 

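Of course, a lot of preceding clauses will cause errors in the parse (which can be worked around), as will really poorly written questions. For example, the title of this post, "How to tell if a sentence is a question?", will have an SBARQ but not an SQ.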
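The check described above can be sketched in plain Java, assuming you already have the parser's output as a Penn Treebank style bracketed string. The class and method names here (`QuestionTreeCheck`, `isWhQuestion`, `hasInvertedClause`) are made up for illustration and do not belong to any parser's API. The code also assumes the parser may wrap the tree in a `ROOT` or `TOP` node. Treat it as a rough sketch rather than a finished detector:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: inspects the node labels of a bracketed parse string.
final class QuestionTreeCheck {
    // A node label is the text right after "(" up to whitespace or a parenthesis.
    private static final Pattern LABEL = Pattern.compile("\\(\\s*([^\\s()]+)");

    // Returns all node labels in the order they open in the bracketed string.
    static List<String> labels(String tree) {
        List<String> out = new ArrayList<>();
        Matcher m = LABEL.matcher(tree);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // Returns the top clause label, skipping a ROOT/TOP wrapper (an assumption
    // about the parser's output format); empty string if there is no label.
    static String clauseLabel(String tree) {
        for (String label : labels(tree)) {
            if (!label.equals("ROOT") && !label.equals("TOP")) {
                return label;
            }
        }
        return "";
    }

    // True when the top clause is SBARQ, i.e. the parse looks like a wh-question.
    static boolean isWhQuestion(String tree) {
        return clauseLabel(tree).equals("SBARQ");
    }

    // True when an SQ node (the inverted phrase) appears anywhere in the tree.
    static boolean hasInvertedClause(String tree) {
        return labels(tree).contains("SQ");
    }

    public static void main(String[] args) {
        String tree = "(SBARQ (WHNP (WP What)) (SQ (VBZ is) (NP (DT the) (NN question))) (. ?))";
        System.out.println(isWhQuestion(tree));      // prints "true"
        System.out.println(hasInvertedClause(tree)); // prints "true"
    }
}
```

Because the SQ is optional, `isWhQuestion` alone is the looser test. Requiring `hasInvertedClause` as well is stricter and would reject inputs like the title of this post.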

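Many quasi-questions / requests for information are posed in the grammatical form of statements; for example, "I would like to know who stole my bike".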

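I would abandon all hope of determining from its structure whether the user's input is a question, and simply assume that it is a question unless it is unambiguously not one. You could take an iterative, interactive approach so that the system can refine its "understanding" of the user's input: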

  User: I would like to know how many angels fit on the head of a pin.
  System: Do you have a question about angels?
  User: Yes.
  System: Do you want to know if angels are fit?
  User: No.
  System: Do you want to know if angels have heads?
  User: Possibly.
  System: Do you want to know if angels have pins?
  User: No.
  System: Do you want to know if angels are numerous?
  User: No.
  System: Do you want to know the dimensions of an angel?
  User: Yes.
  System: Do you mean angels in paintings?
  User: No.
  System: Do you mean angels in myth and religious writing?
  User: Yes.
  System: Angels are metaphysical beings.
  User: I hear that Pennsylvania was founded by William Penn. Is that true?
  System: Do you have a question about hearing?
  User: No.
  System: Do you have a question about Pennsylvania?
  User: Yes.
  System: Pennsylvania was founded by William Penn.
  User: When?
  System: 1682.
  User: What does the name mean?
  System: What name?
  User: Pennsylvania!
  System: Do you want to know the meaning of Pennsylvania?
  User: Yes.
  System: Pennsylvania means Penn's Woods.

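Take a look at the Link Grammar Parser. It is a multi-language parser based on the concept of link pairs between related words in a sentence. It is written in C, but also has a Java JNI interface.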