How to get the line number and page number of a specific word in doc and docx files using Apache POI?

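I am trying to create a Java application that searches for a particular word in a selected doc or docx file and generates a report. The report will contain the page number and line number of each occurrence of the searched word. So far I am able to read doc and docx files paragraph by paragraph, but I have not found any way to search for a particular word and get the line and page number where it occurs. I have searched a lot, with no luck so far. I hope someone knows a way to do this.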

Here is my code:

    File file = fc.getSelectedFile();
    // Check the extension only; contains("docx") would also match a folder name.
    if (file.getName().toLowerCase().endsWith(".docx")) {
        try (FileInputStream fis = new FileInputStream(file)) {
            XWPFDocument document = new XWPFDocument(fis);
            // A typed list is required for the for-each loop below to compile.
            List<XWPFParagraph> paragraphs = document.getParagraphs();
            System.out.println("Total no of paragraph " + paragraphs.size());
            for (XWPFParagraph para : paragraphs) {
                System.out.println(para.getText());
            }
        }
    } else {
        try (FileInputStream fis = new FileInputStream(file)) {
            HWPFDocument document = new HWPFDocument(fis);
            WordExtractor extractor = new WordExtractor(document);
            String[] fileData = extractor.getParagraphText();
            for (int i = 0; i < fileData.length; i++) {
                if (fileData[i] != null) {
                    System.out.println(fileData[i]);
                }
            }
            extractor.close();
        }
    }

I am using Swing and Apache POI 3.10.1.

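I'm afraid there is no easy way to do this. Line and page numbers are not stored in the file; they are calculated on the fly from the text layout for a given page size. The page size determines where the text wraps.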
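
To illustrate why the numbers depend on the layout, here is a minimal sketch that assigns a page and line to each occurrence of a word under a deliberately simplified model: fixed-width characters, greedy word wrap, and a fixed number of lines per page. This is not how Word lays out text (fonts, margins, hyphenation and tables are all ignored), and the class name `WrapModelSketch`, the method `locate` and its parameters are hypothetical names chosen for the example. The paragraph strings could come from the paragraph text you already extract with POI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: NOT Word's layout algorithm, only a fixed-width model.
class WrapModelSketch {

    /**
     * Returns one {page, line} pair (both 1-based) per occurrence of word,
     * assuming greedy word wrap at charsPerLine and linesPerPage lines per page.
     */
    static List<int[]> locate(List<String> paragraphs, String word,
                              int charsPerLine, int linesPerPage) {
        List<int[]> hits = new ArrayList<>();
        int lineNo = 0; // running line count over the whole document
        for (String paragraph : paragraphs) {
            lineNo++;    // every paragraph starts on a new line
            int col = 0; // characters already placed on the current line
            for (String token : paragraph.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // empty paragraph
                }
                int needed = (col == 0) ? token.length() : col + 1 + token.length();
                if (col > 0 && needed > charsPerLine) {
                    lineNo++; // token does not fit on this line: wrap
                    col = token.length();
                } else {
                    col = needed;
                }
                // Compare without leading/trailing punctuation, ignoring case.
                String bare = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
                if (bare.equalsIgnoreCase(word)) {
                    hits.add(new int[] {
                        (lineNo - 1) / linesPerPage + 1,
                        (lineNo - 1) % linesPerPage + 1 });
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("the quick brown fox", "jumps over the lazy dog");
        for (int width : new int[] {10, 20}) {
            for (int[] hit : locate(text, "the", width, 2)) {
                System.out.println("width " + width + ": page " + hit[0] + ", line " + hit[1]);
            }
        }
    }
}
```

With these inputs the second "the" is reported on page 2, line 2 at a width of 10 characters, but on page 1, line 2 at a width of 20. The same text gives different answers for different page sizes, which is why the file itself cannot contain these numbers.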

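You can try to implement this by loading the document into a JEditorPane with an appropriate EditorKit. See, for example, the attempt at a DocxEditorKit implementation at http://java-sl.com/docx_editor_kit.html. It provides basic functionality, and you can try to implement your own EditorKit based on its source code and ideas.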

The kit should support pagination in order to render pages (see the articles about pagination at http://java-sl.com/articles.html).

Once pagination is done, you can find the position of the word (its caret offset) and get the row and column (see http://java-sl.com/tip_row_column.html).
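
The row/column step could look roughly like the sketch below. It uses a plain `JTextArea` as a stand-in for a `JEditorPane` with a paginating EditorKit, so it reports visual rows only; turning a row into a page number would depend on how your kit paginates and is not shown. The class `RowColumnSketch` and the method `rowColumnAt` are hypothetical names. `Utilities.getRowStart` needs the component to have a size, because rows exist only after the text has been laid out.

```java
import javax.swing.JTextArea;
import javax.swing.text.BadLocationException;
import javax.swing.text.JTextComponent;
import javax.swing.text.Utilities;

// Hypothetical sketch: a JTextArea stands in for a paginated editor pane.
class RowColumnSketch {

    /**
     * Returns {row, column}, both 1-based, for a model offset,
     * or null if the component has no size and therefore no layout.
     */
    static int[] rowColumnAt(JTextComponent comp, int offset) {
        try {
            int rowStart = Utilities.getRowStart(comp, offset);
            if (rowStart < 0) {
                return null; // no layout available yet
            }
            int column = offset - rowStart + 1;
            int row = 1;
            int pos = rowStart;
            // Walk backwards one visual row at a time until the first row.
            while (pos > 0) {
                pos = Utilities.getRowStart(comp, pos - 1);
                row++;
            }
            return new int[] {row, column};
        } catch (BadLocationException e) {
            throw new IllegalArgumentException("Offset outside document: " + offset, e);
        }
    }

    public static void main(String[] args) throws BadLocationException {
        JTextArea area = new JTextArea("alpha\nbeta\ngamma delta");
        area.setLineWrap(true);
        area.setWrapStyleWord(true);
        area.setSize(400, 300); // a size is required before rows can be computed

        // Search the document text so offsets match the model exactly.
        String text = area.getDocument().getText(0, area.getDocument().getLength());
        int offset = text.indexOf("delta");
        if (offset >= 0) {
            int[] rc = rowColumnAt(area, offset);
            System.out.println(rc == null
                    ? "no layout"
                    : "row " + rc[0] + ", column " + rc[1]);
        }
    }
}
```

Because the rows come from the component's layout, changing the width passed to `setSize` can change the reported row for the same offset once lines start to wrap.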
