Java HttpURLConnection cutting off HTML
Hey, I'm trying to get the HTML from a Twitter profile page, but HttpURLConnection only returns a small part of the HTML. My code:

    // urls is a List<String> of page addresses, declared elsewhere
    for (int i = 0; i < urls.size(); i++) {
        URL url = new URL(urls.get(i));
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
        System.out.println(connection.getResponseCode());

        String line;
        StringBuilder builder = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        // Read the response line by line until the stream ends
        while ((line = reader.readLine()) != null) {
            builder.append(line);
        }
        String html = builder.toString();
    }

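I always get 200 as the response code for every call. However, the entire HTML document is returned only about a third of the time; the rest of the time I get just the first few hundred lines. The amount returned when the HTML is cut off is not always the same.
Any ideas? Thanks for your help!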
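Additional info: after looking at the headers, I seem to be getting duplicate Content-Length headers. The first one is the full length and the other is much shorter (and probably represents the length I sometimes get). How do I handle duplicate headers?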
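One way to confirm the duplicate-header observation is to list every Content-Length value the connection exposes and compare the largest one with the number of bytes that actually arrive. The following is only a minimal diagnostic sketch, not a fix: the class name `ContentLengthCheck` and its helper methods are made up for illustration, and it assumes that `getHeaderFields()` reports both header values and that the body is not compressed or chunked (in which case Content-Length would not match, or would be absent).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class ContentLengthCheck {

    // Collects every Content-Length value from a header map shaped like the one
    // returned by URLConnection.getHeaderFields(). Header names are compared
    // case-insensitively; the status line is stored under a null key and is skipped.
    static List<Long> declaredLengths(Map<String, List<String>> headers) {
        List<Long> lengths = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            if ("Content-Length".equalsIgnoreCase(entry.getKey())) {
                for (String value : entry.getValue()) {
                    lengths.add(Long.parseLong(value.trim()));
                }
            }
        }
        return lengths;
    }

    // True when fewer bytes arrived than the largest declared length.
    // Returns false when no Content-Length was declared at all.
    static boolean looksTruncated(List<Long> declared, long bytesRead) {
        long largest = -1;
        for (long d : declared) {
            largest = Math.max(largest, d);
        }
        return largest >= 0 && bytesRead < largest;
    }

    // Reads the stream to the end and returns the number of bytes seen.
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ContentLengthCheck <url>");
            return;
        }
        HttpURLConnection connection = (HttpURLConnection) new URL(args[0]).openConnection();
        List<Long> declared = declaredLengths(connection.getHeaderFields());
        long bytesRead;
        try (InputStream in = connection.getInputStream()) {
            bytesRead = countBytes(in);
        }
        System.out.println("Declared Content-Length values: " + declared);
        System.out.println("Bytes actually read: " + bytesRead);
        System.out.println("Looks truncated: " + looksTruncated(declared, bytesRead));
    }
}
```

If the shorter of the two values matches the byte count on the runs that come back cut off, that would support the guess that the second header is the one being honored. What to do about it (retrying, or switching to a different HTTP client) is a separate decision.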
This works fine for me. I added a newline after builder.append(line); to make the output more readable in the console, but other than that it returned all of the HTML for this page:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;

    public class RetrieveHTML {

        public static void main(String[] args) throws IOException {
            List<String> urls = new ArrayList<>();
            urls.add("http://stackoverflow.com/questions/3285077/java-httpurlconnection-cutting-off-html");

            for (int i = 0; i < urls.size(); i++) {
                URL url = new URL(urls.get(i));
                HttpURLConnection connection = (HttpURLConnection) url.openConnection();
                connection.setRequestProperty("User-Agent",
                        "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
                System.out.println(connection.getResponseCode());

                String line;
                StringBuilder builder = new StringBuilder();
                // try-with-resources closes the reader (and the underlying stream) when done
                try (BufferedReader reader =
                        new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
                    while ((line = reader.readLine()) != null) {
                        builder.append(line);
                        builder.append("\n"); // keep line breaks so the console output is readable
                    }
                }
                String html = builder.toString();
                System.out.println("HTML " + html);
            }
        }
    }

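A side note on the read loop above: readLine() drops the original line terminators, and an InputStreamReader created without a charset decodes with the platform default. When chasing a truncation problem it can help to read the raw bytes and decode them once at the end, so the text length is not affected by either of those. Here is a rough sketch of that idea. The class name `StreamText` and the method `readFully` are hypothetical, and UTF-8 is an assumption about the page encoding, not something the question states.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class StreamText {

    // Reads every byte from the stream until end-of-stream, then decodes the
    // whole buffer as UTF-8 (an assumption; a real page may declare another charset).
    // Line terminators are preserved exactly as the server sent them.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In the loop above this could be called as `String html = StreamText.readFully(connection.getInputStream());` in place of the BufferedReader. It does not make a truncated response complete; it only removes two variables when comparing lengths.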