URLConnection won't let me access data on HTTP errors (404, 500, etc.)

I'm building a crawler and need to get the data from the stream regardless of whether the response is a 200 or not. cURL does this, as does any standard browser.

The code below doesn't actually fetch the content of the request, even though there is some; instead it throws an exception because of the HTTP error status code. I want the output regardless; is there a way? I'd prefer to use this library because it actually does persistent connections, which is perfect for the kind of crawling I'm doing.

    package test;

    import java.net.*;
    import java.io.*;

    public class Test {
        public static void main(String[] args) {
            try {
                URL url = new URL("http://github.com/XXXXXXXXXXXXXX");
                URLConnection connection = url.openConnection();
                DataInputStream inStream = new DataInputStream(connection.getInputStream());
                String inputLine;
                while ((inputLine = inStream.readLine()) != null) {
                    System.out.println(inputLine);
                }
                inStream.close();
            } catch (MalformedURLException me) {
                System.err.println("MalformedURLException: " + me);
            } catch (IOException ioe) {
                System.err.println("IOException: " + ioe);
            }
        }
    }

Working, thanks. Here's what I came up with, just as a rough proof of concept:

    import java.net.*;
    import java.io.*;

    public class Test {
        public static void main(String[] args) {
            //InputStream error = ((HttpURLConnection) connection).getErrorStream();
            URL url = null;
            URLConnection connection = null;
            String inputLine = "";
            try {
                url = new URL("http://verelo.com/asdfrwdfgdg");
                connection = url.openConnection();
                DataInputStream inStream = new DataInputStream(connection.getInputStream());
                while ((inputLine = inStream.readLine()) != null) {
                    System.out.println(inputLine);
                }
                inStream.close();
            } catch (MalformedURLException me) {
                System.err.println("MalformedURLException: " + me);
            } catch (IOException ioe) {
                System.err.println("IOException: " + ioe);
                InputStream error = ((HttpURLConnection) connection).getErrorStream();
                try {
                    int data = error.read();
                    while (data != -1) {
                        //do something with data...
                        //System.out.println(data);
                        inputLine = inputLine + (char) data;
                        data = error.read();
                        //inputLine = inputLine + (char)data;
                    }
                    error.close();
                } catch (Exception ex) {
                    try {
                        if (error != null) {
                            error.close();
                        }
                    } catch (Exception e) {
                    }
                }
            }
            System.out.println(inputLine);
        }
    }

Simply:

    URLConnection connection = url.openConnection();
    InputStream is = connection.getInputStream();
    if (connection instanceof HttpURLConnection) {
        HttpURLConnection httpConn = (HttpURLConnection) connection;
        int statusCode = httpConn.getResponseCode();
        if (statusCode != 200 /* or statusCode >= 200 && statusCode < 300 */) {
            is = httpConn.getErrorStream();
        }
    }

You can refer to the Javadoc for an explanation. The best way I would handle this is as follows:

    URLConnection connection = url.openConnection();
    InputStream is = null;
    try {
        is = connection.getInputStream();
    } catch (IOException ioe) {
        if (connection instanceof HttpURLConnection) {
            HttpURLConnection httpConn = (HttpURLConnection) connection;
            int statusCode = httpConn.getResponseCode();
            if (statusCode != 200) {
                is = httpConn.getErrorStream();
            }
        }
    }
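Once you have is (whether it came from getInputStream or getErrorStream), it is read the same way in both cases. A minimal sketch, assuming the body is UTF-8 text and the usual java.io imports; a real crawler would take the charset from the Content-Type header instead of hard-coding it:

    if (is != null) {   // getErrorStream() can return null if the server sent no body
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
        StringBuilder body = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            body.append(line).append('\n');   // the normal page and the error page are read identically
        }
        reader.close();
        System.out.println(body);
    }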

You need to do the following after calling openConnection (a combined sketch follows the list):

  1. Cast the URLConnection to HttpURLConnection.

  2. Call getResponseCode.

  3. If the response was a success, use getInputStream; otherwise, use getErrorStream.

(The test for success should be 200 <= code < 300, because there are valid HTTP success codes other than 200.)
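Putting those three steps together, a minimal sketch might look like this. The openBodyStream helper name is made up for illustration, not something from the original code, and getErrorStream can return null if the server sent no body:

    // Sketch: return the body stream for both success and error responses.
    static InputStream openBodyStream(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        if (!(connection instanceof HttpURLConnection)) {
            return connection.getInputStream();                      // e.g. a file: URL
        }
        HttpURLConnection httpConn = (HttpURLConnection) connection; // 1. cast
        int statusCode = httpConn.getResponseCode();                 // 2. get the status code
        if (statusCode >= 200 && statusCode < 300) {                 // 3. pick the right stream
            return httpConn.getInputStream();
        }
        return httpConn.getErrorStream();                            // may be null
    }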


I'm building a crawler and need to get the data from the stream regardless of whether the response is a 200 or not.

Note that if the code is a 4xx or 5xx, then the "data" is most likely an error page of some kind.


A final point: you should always respect the "robots.txt" file... and read the Terms of Service before crawling/scraping the content of a site whose owners might care. Simply firing off GET requests is likely to annoy site owners... unless you've already come to some sort of "arrangement" with them.
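For illustration only (this is not part of the original answer): fetching a site's robots.txt uses exactly the same API. This naive sketch just prints the raw file; a real crawler should parse the User-agent/Disallow rules with a proper robots.txt parser, and example.com is only a placeholder host:

    // Naive sketch: fetch /robots.txt so its rules can be inspected before crawling.
    URL robots = new URL("http://example.com/robots.txt");            // placeholder host
    HttpURLConnection conn = (HttpURLConnection) robots.openConnection();
    if (conn.getResponseCode() == 200) {                              // a 404 is usually taken to mean "no restrictions"
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);                                 // raw User-agent / Disallow directives
        }
        in.close();
    } else {
        conn.disconnect();
    }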