"Content is not allowed in prolog" when parsing an RSS feed with Rome

I get this error when parsing an RSS feed with the Rome API:

    com.sun.syndication.io.ParsingFeedException: Invalid XML
        at com.sun.syndication.io.WireFeedInput.build(WireFeedInput.java:210)

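The code is as follows:

    // Needs java.net.URL, java.util.Iterator, and Rome's
    // com.sun.syndication.feed.synd and com.sun.syndication.io classes.
    public static void main(String[] args) {
        URL url;
        XmlReader reader = null;
        SyndFeed feed;
        try {
            url = new URL("https://www.democracynow.org/podcast.xml");
            reader = new XmlReader(url);
            feed = new SyndFeedInput().build(reader);
            // The raw Iterator returns Object, so each entry has to be cast.
            for (Iterator i = feed.getEntries().iterator(); i.hasNext();) {
                SyndEntry entry = (SyndEntry) i.next();
                System.out.println(entry.getPublishedDate() + " Title " + entry.getTitle());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }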

I have looked at some links, such as:

http://old.nabble.com/Invalid-XML:-Error-on-line-1:-Content-is-not-allowed-in-prolog.-td21258868.html

The problem may be the charset, but I could not work out how to fix it. Any help or guidance would be greatly appreciated.

Thanks and regards,

Vaibhav Goswami

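I am also using Syndication, and I can get the published date and the title.

My code is as follows:

    // Needs java.net.URL, java.util.Iterator, and Rome's
    // com.sun.syndication.feed.synd and com.sun.syndication.io classes.
    URL feedUrl = new URL("http://www.bloomberg.com/tvradio/podcast/cat_markets.xml");
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new XmlReader(feedUrl));
    for (Iterator i = feed.getEntries().iterator(); i.hasNext();) {
        SyndEntry entry = (SyndEntry) i.next();
        // The original string concatenation was broken; this prints title and timestamp.
        System.out.println("title | " + entry.getTitle()
                + " - timeStamp " + entry.getPublishedDate() + "\n");
    }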

This works. I used the Bloomberg URL only because it returns XML.

If your question is about something else, let me know :)
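Since a URL that returns plain XML parses fine, one way to narrow the failure down is to look at the first few bytes the failing URL actually returns. Parsers typically report "Content is not allowed in prolog" when something other than markup comes before the XML declaration. The sketch below uses only the JDK. `LeadingBytes` and `describe` are hypothetical names, not part of Rome, and the handful of byte signatures it checks is an assumption about what might be in the way, not an exhaustive list.

```java
import java.nio.charset.StandardCharsets;

final class LeadingBytes {

    /** Classifies the first bytes of a response body (hypothetical helper). */
    static String describe(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8 BOM";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16 BE BOM";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16 LE BOM";
        if (startsWith(head, 0x1F, 0x8B)) return "gzip";
        if (head.length > 0 && head[0] == '<') return "markup";
        return "other";
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // In-memory samples stand in for the start of a real HTTP response.
        byte[] plain = "<?xml version=\"1.0\"?><rss/>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'r', 's', 's', '/', '>'};
        System.out.println(describe(plain));
        System.out.println(describe(withBom));
    }
}
```

In practice the array would hold the first few bytes read from the feed's `InputStream`. A result other than "markup" suggests the body needs to be cleaned up or decoded before it is handed to `SyndFeedInput`.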

You can use SyndFeed and SyndEntry to parse the XML.

You also need to check that the XML is valid.

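    // Needs java.net.URL, java.util.Iterator, and Rome's
    // com.sun.syndication.feed.synd and com.sun.syndication.io classes.
    URL url = new URL("http://feeds.feedburner.com/javatipsfeed");
    XmlReader reader = null;
    try {
        reader = new XmlReader(url);
        SyndFeed feeder = new SyndFeedInput().build(reader);
        // The original printed getAuthor() under the "Feed Title" label.
        System.out.println("Feed Title: " + feeder.getTitle());
        for (Iterator i = feeder.getEntries().iterator(); i.hasNext();) {
            SyndEntry syndEntry = (SyndEntry) i.next();
            System.out.println(syndEntry.getTitle());
        }
    } finally {
        // Always release the underlying connection.
        if (reader != null)
            reader.close();
    }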
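For the validity check mentioned above, a minimal sketch using only the JDK's built-in parser might look like the following. It tests well-formedness only, not whether the document is a valid RSS or Atom feed. `XmlWellFormedCheck` and `isWellFormed` are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XmlWellFormedCheck {

    /** Returns true if the bytes parse as well-formed XML (hypothetical helper). */
    static boolean isWellFormed(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "<rss version=\"2.0\"><channel/></rss>".getBytes(StandardCharsets.UTF_8);
        // Text before the root element is the kind of input that breaks the prolog.
        byte[] bad = "junk<rss version=\"2.0\"><channel/></rss>".getBytes(StandardCharsets.UTF_8);
        System.out.println(isWellFormed(good));
        System.out.println(isWellFormed(bad));
    }
}
```

Running a check like this on the downloaded bytes before calling `SyndFeedInput.build` separates "the server did not send XML" from problems inside Rome itself.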

This is caused by a byte order mark (BOM) problem. Here is a JUnit test case that demonstrates the problem and the fix:

    package rss;

    import org.xml.sax.InputSource;
    import java.io.*;
    import java.net.*;
    import com.sun.syndication.io.*;
    import org.apache.commons.io.IOUtils;
    import org.apache.commons.io.input.BOMInputStream;
    import org.junit.Test;

    public class RssEncodingTest {

        String url = "http://www.moneydj.com/KMDJ/RssCenter.aspx?svc=NH&fno=1&arg=X0000000";

        // This works because we use an InputSource directly on the
        // URLConnection's InputStream.
        @Test
        public void test01() throws MalformedURLException, IOException,
                IllegalArgumentException, FeedException {
            try (InputStream is = new URL(url).openConnection().getInputStream()) {
                InputSource source = new InputSource(is);
                System.out.println("description: "
                        + new SyndFeedInput().build(source).getDescription());
            }
        }

        // But a String input fails because of the byte order mark.
        @Test
        public void test02() throws MalformedURLException, IOException,
                IllegalArgumentException, FeedException {
            String html = IOUtils.toString(new URL(url).openConnection().getInputStream());
            Reader reader = new StringReader(html);
            System.out.println("description: "
                    + new SyndFeedInput().build(reader).getDescription());
        }

        // We can use Apache Commons IO to strip the byte order mark.
        @Test
        public void test03() throws MalformedURLException, IOException,
                IllegalArgumentException, FeedException {
            String html = IOUtils.toString(new URL(url).openConnection().getInputStream());
            try (BOMInputStream bomIn = new BOMInputStream(IOUtils.toInputStream(html))) {
                String f = IOUtils.toString(bomIn);
                Reader reader = new StringReader(f);
                System.out.println("description: "
                        + new SyndFeedInput().build(reader).getDescription());
            }
        }
    }
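If Apache Commons IO is not available, the same idea can be sketched with the JDK alone. This is an alternative to the `BOMInputStream` approach in the test above, not what that test does: it removes a UTF-8 BOM from raw bytes, or the single U+FEFF character that a UTF-8 BOM turns into once the bytes have been decoded to a `String`. `BomStripper` and its two methods are hypothetical names, and only the UTF-8 case is handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BomStripper {

    /** Drops a leading UTF-8 BOM (EF BB BF) from raw bytes, if present. */
    static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    /** Drops a leading U+FEFF from already-decoded text, if present. */
    static String stripBomChar(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // In-memory sample standing in for a downloaded feed body.
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'r', 's', 's', '/', '>'};
        System.out.println(new String(stripUtf8Bom(raw), StandardCharsets.UTF_8));
        System.out.println(stripBomChar("\uFEFF<rss/>"));
    }
}
```

The cleaned text can then be wrapped in a `StringReader` and passed to `new SyndFeedInput().build(reader)`, as in `test03`.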