Processing a large file with HTTP calls in Java

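I have a file with millions of lines that I need to process. Each line of the file results in an HTTP call. I'm trying to figure out the best way to attack the problem.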

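I could obviously just read the file and make the calls sequentially, but that would be incredibly slow. I'd like to parallelize the calls, but I'm not sure whether I should read the entire file into memory (something I'm not a great fan of) or try to parallelize the reading of the file as well (which I'm not sure makes sense).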

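I'm just looking for some thoughts on the best way to attack the problem. If there is an existing framework or library that does something similar, I'm happy to use it as well.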

Thanks.

"I'd like to parallelize the calls, but I'm not sure whether I should read the entire file into memory"

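You should use an ExecutorService with a bounded BlockingQueue. As you read in your million lines, you submit jobs to the thread pool until the BlockingQueue is full. This way you can run 100 (or whatever number is optimal) HTTP requests simultaneously without having to read all of the lines of the file beforehand.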

You'll need to set a RejectedExecutionHandler that blocks when the queue is full. This is better than a caller-runs handler.

    import java.util.concurrent.*;

    BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(100);
    // NOTE: you want the min and max thread numbers here to be the same value
    ThreadPoolExecutor threadPool =
        new ThreadPoolExecutor(nThreads, nThreads, 0L, TimeUnit.MILLISECONDS, queue);
    // we need our RejectedExecutionHandler to block if the queue is full
    threadPool.setRejectedExecutionHandler(new RejectedExecutionHandler() {
        @Override
        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            try {
                // this will block the producer until there's room in the queue
                executor.getQueue().put(r);
            } catch (InterruptedException e) {
                // restore the interrupt flag before rejecting the task
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException(
                    "Unexpected InterruptedException", e);
            }
        }
    });
    // now read in the urls (the variable must be declared outside the condition)
    String url;
    while ((url = urlReader.readLine()) != null) {
        // submit them to the thread-pool. this may block.
        threadPool.submit(new DownloadUrlRunnable(url));
    }
    // after we submit we have to shutdown the pool
    threadPool.shutdown();
    // wait for them to complete
    threadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
    ...
    private class DownloadUrlRunnable implements Runnable {
        private final String url;
        public DownloadUrlRunnable(String url) {
            this.url = url;
        }
        @Override
        public void run() {
            // download the URL
        }
    }
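The body of run() is left open above. As a rough sketch of what it might contain, here is one way to make the call with java.net.http.HttpClient. It assumes Java 11 or later, that each line is an absolute http(s) URL, and that a plain GET is enough. The class name DownloadUrlTask, the helper buildRequest, and the 10 and 30 second timeouts are made up for illustration and are not part of the answer above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical task class; a stand-in for DownloadUrlRunnable.
class DownloadUrlTask implements Runnable {
    // HttpClient is thread-safe, so a single shared instance can serve all workers.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10)) // assumed value
            .build();

    private final String url;

    DownloadUrlTask(String url) {
        this.url = url;
    }

    // Builds a GET request for one line of the file (assumed to be a full URL).
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30)) // assumed value
                .GET()
                .build();
    }

    @Override
    public void run() {
        try {
            HttpResponse<String> response =
                    CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() >= 400) {
                System.err.println("HTTP " + response.statusCode() + " for " + url);
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt so the pool can shut down promptly.
            Thread.currentThread().interrupt();
        } catch (Exception e) {
            // Covers I/O failures and malformed URLs; one bad line should not stop the run.
            System.err.println("Failed to fetch " + url + ": " + e);
        }
    }
}
```

Catching and logging inside run() matters here because tasks handed to submit() otherwise swallow their exceptions into a Future that nobody reads.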

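Gray's approach seems fine. Another approach I'd suggest is to split the file into chunks (you'll have to write the logic yourself) and process those chunks with multiple threads.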
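The chunking logic is not spelled out, so the following is only one possible sketch: it cuts the file into byte ranges that each start at the beginning of a line, so every range can be read independently. FileChunker, split, and readLines are hypothetical names. RandomAccessFile.readLine is unbuffered and maps each byte to one char, so this assumes ASCII content such as URLs and is not tuned for speed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a text file into line-aligned byte ranges.
class FileChunker {

    // Returns [start, end) byte offsets; roughly `chunks` ranges, each ending on a line boundary.
    static List<long[]> split(String path, int chunks) throws IOException {
        List<long[]> ranges = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long length = file.length();
            long approx = Math.max(1, length / chunks);
            long start = 0;
            while (start < length) {
                long end = Math.min(length, start + approx);
                if (end < length) {
                    // Move the cut forward to the end of the line it landed in.
                    file.seek(end);
                    file.readLine();
                    end = file.getFilePointer();
                }
                ranges.add(new long[] {start, end});
                start = end;
            }
        }
        return ranges;
    }

    // Reads every line whose first byte lies in [start, end).
    static List<String> readLines(String path, long start, long end) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(start);
            String line;
            while (file.getFilePointer() < end && (line = file.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Each range could then be handed to its own worker thread, which reads its lines and makes the HTTP calls. Since the calls are likely to dominate the run time rather than the file reading, this may not buy much over a single reader feeding a bounded pool.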
