Extract text outside of HTML tags

I have the following HTML code:

    <div class="example">Text #1</div> "Another Text 1"
    <div class="example">Text #2</div> "Another Text 2"

I want to extract the text outside the tags: "Another Text 1" and "Another Text 2".

I am using JSoup to do this.

Any ideas?

Thanks!

You can select the next Node (not Element!) after each div tag. In your example, each of them is a TextNode.

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.nodes.TextNode;

    final String html = "<div class=\"example\">Text #1</div> \"Another Text 1\"\n"
            + "<div class=\"example\">Text #2</div> \"Another Text 2\" ";

    Document doc = Jsoup.parse(html);

    for (Element element : doc.select("div.example")) { // Select all the div tags
        TextNode next = (TextNode) element.nextSibling(); // Get the next node of each div as a TextNode
        System.out.println(next.text()); // Print the text of the TextNode
    }

Output:

     "Another Text 1"
     "Another Text 2"

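The cast above works for this exact markup, but nextSibling() returns null when a div is the last node, and it can return another element instead of text. A more defensive sketch might look like the following. The class name TrailingText and the helper textAfter are made up for illustration, and trimming the result is my own choice, not something the question asked for.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class TrailingText {

    // Hypothetical helper: collects the trimmed text that directly follows
    // each element matched by cssQuery. Elements followed by another element,
    // by nothing, or by whitespace only are skipped.
    static List<String> textAfter(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            Node next = element.nextSibling(); // may be null or a non-text node
            if (next instanceof TextNode) {
                String text = ((TextNode) next).text().trim();
                if (!text.isEmpty()) {
                    result.add(text);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<div class=\"example\">Text #1</div> \"Another Text 1\"\n"
                + "<div class=\"example\">Text #2</div> \"Another Text 2\" "
                + "<div class=\"example\">Text #3</div>"; // no text after this one
        System.out.println(textAfter(html, "div.example"));
    }
}
```

The third div is there only to show that an element with nothing after it is skipped instead of causing a NullPointerException.
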
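One solution is to use the ownText() method (see the Jsoup documentation). It returns only the text owned directly by the given element and ignores any text owned by its child elements.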

Using just the HTML you provided, you can extract the own text of the body element that Jsoup wraps around it:

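    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    String html = "<div class='example'>Text #1</div> 'Another Text 1'"
            + "<div class='example'>Text #2</div> 'Another Text 2'";
    Document doc = Jsoup.parse(html);
    System.out.println(doc.body().ownText()); // Text that is a direct child of body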

This will output:

 'Another Text 1' 'Another Text 2'

Note that ownText() can be called on any Element. The documentation contains another example.
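
To illustrate the difference between ownText() and text() on a nested element, here is a small sketch based on the paragraph example from the Jsoup Javadoc. The class name OwnTextDemo and the helper ownTextOf are hypothetical names used only for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OwnTextDemo {

    // Hypothetical helper: returns only the text that is a direct child of
    // the first element matching cssQuery, or "" if nothing matches.
    static String ownTextOf(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element element = doc.select(cssQuery).first(); // null if no match
        return element == null ? "" : element.ownText();
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>there</b> now!</p>";
        Element p = Jsoup.parse(html).select("p").first();
        System.out.println(p.text());    // Hello there now!
        System.out.println(p.ownText()); // Hello now!
    }
}
```

The text inside the b element is left out of ownText() because it belongs to a child element, not to the p element itself.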
