通过Apache POI读取大型Excel文件(xlsx)时出错

我试图通过Apache POI读取大型excel文件xlsx,比如40-50 MB。 我失去了内存exception。 当前堆内存为3GB。

我可以毫无问题地阅读较小的excel文件。 我需要一种方法来读取大型excel文件,然后通过Spring excel视图将它们作为响应返回。

public class FetchExcel extends AbstractView { @Override protected void renderMergedOutputModel( Map model, HttpServletRequest request, HttpServletResponse response) throws Exception { String fileName = "SomeExcel.xlsx"; response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"); OPCPackage pkg = OPCPackage.open("/someDir/SomeExcel.xlsx"); XSSFWorkbook workbook = new XSSFWorkbook(pkg); ServletOutputStream respOut = response.getOutputStream(); pkg.close(); workbook.write(respOut); respOut.flush(); workbook = null; response.setHeader("Content-disposition", "attachment;filename=\"" +fileName+ "\""); } } 

我首先开始使用XSSFWorkbook workbook = new XSSFWorkbook(FileInputStream in); 但是每个Apache POI API的成本很高,所以我切换到OPC封装方式,但效果仍然相同。 我不需要解析或处理文件,只需读取并返回即可。

您没有提及是否需要修改电子表格。

这可能是显而易见的,但是如果您不需要修改电子表格,那么您不需要解析它并将其写回来,您只需从文件中读取字节,然后写出字节,就像您一样,说图像,或任何其他二进制格式。

如果您确实需要在将电子表格发送给用户之前对其进行修改,那么据我所知,您可能需要采取不同的方法。

我知道用Java读取Excel文件的每个库都将整个电子表格读入内存,因此每个可能同时处理的电子表格必须有50MB的内存。 正如其他人指出的那样,这涉及调整VM可用的堆。

如果您需要同时处理大量电子表格,并且无法分配足够的内存,请考虑使用可以流式传输的格式,而不是一次性读取到内存中。 可以通过Excel打开CSV格式,过去通过将内容类型设置为application / vnd.ms-excel,将附件文件名设置为以“.xls”结尾的内容,但实际上返回CSV,我已经取得了良好的效果内容。 我没有在几年内试过这个,所以YMMV。

下面是使用sax解析器读取大型xls文件的示例。

 public void parseExcel(File file) throws IOException { OPCPackage container; try { container = OPCPackage.open(file.getAbsolutePath()); ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(container); XSSFReader xssfReader = new XSSFReader(container); StylesTable styles = xssfReader.getStylesTable(); XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData(); while (iter.hasNext()) { InputStream stream = iter.next(); processSheet(styles, strings, stream); stream.close(); } } catch (InvalidFormatException e) { e.printStackTrace(); } catch (SAXException e) { e.printStackTrace(); } catch (OpenXML4JException e) { e.printStackTrace(); } } protected void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, InputStream sheetInputStream) throws IOException, SAXException { InputSource sheetSource = new InputSource(sheetInputStream); SAXParserFactory saxFactory = SAXParserFactory.newInstance(); try { SAXParser saxParser = saxFactory.newSAXParser(); XMLReader sheetParser = saxParser.getXMLReader(); ContentHandler handler = new XSSFSheetXMLHandler(styles, strings, new SheetContentsHandler() { @Override public void startRow(int rowNum) { } @Override public void endRow() { } @Override public void cell(String cellReference, String formattedValue) { } @Override public void headerFooter(String text, boolean isHeader, String tagName) { } }, false//means result instead of formula ); sheetParser.setContentHandler(handler); sheetParser.parse(sheetSource); } catch (ParserConfigurationException e) { throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage()); } 

在解析xlsx文件时我也遇到了同样的OOM问题……经过两天的挣扎,我终于找到了下面非常完美的代码;

此代码基于sjxlsx。 它读取xlsx并存储在HSSF表中。

  [code=java] // read the xlsx file SimpleXLSXWorkbook = new SimpleXLSXWorkbook(new File("C:/test.xlsx")); HSSFWorkbook hsfWorkbook = new HSSFWorkbook(); org.apache.poi.ss.usermodel.Sheet hsfSheet = hsfWorkbook.createSheet(); Sheet sheetToRead = workbook.getSheet(0, false); SheetRowReader reader = sheetToRead.newReader(); Cell[] row; int rowPos = 0; while ((row = reader.readRow()) != null) { org.apache.poi.ss.usermodel.Row hfsRow = hsfSheet.createRow(rowPos); int cellPos = 0; for (Cell cell : row) { if(cell != null){ org.apache.poi.ss.usermodel.Cell hfsCell = hfsRow.createCell(cellPos); hfsCell.setCellType(org.apache.poi.ss.usermodel.Cell.CELL_TYPE_STRING); hfsCell.setCellValue(cell.getValue()); } cellPos++; } rowPos++; } return hsfSheet;[/code] 

在bellwo示例中,我将添加一个完整的代码,如何将完整的excel文件(对于我60Mo)解析为对象列表而没有任何“ 内存不足 ”的问题并且正常工作:

 import java.util.ArrayList; import java.util.List; class DistinctByProperty { private static OPCPackage xlsxPackage = null; private static PrintStream output= System.out; private static List resultMapping = new ArrayList<>(); public static void main(String[] args) throws IOException { File file = new File("C:\\Users\\aberguig032018\\Downloads\\your_excel.xlsx"); double bytes = file.length(); double kilobytes = (bytes / 1024); double megabytes = (kilobytes / 1024); System.out.println("Size "+megabytes); parseExcel(file); } public static void parseExcel(File file) throws IOException { try { xlsxPackage = OPCPackage.open(file.getAbsolutePath(), PackageAccess.READ); ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(xlsxPackage); XSSFReader xssfReader = new XSSFReader(xlsxPackage); StylesTable styles = xssfReader.getStylesTable(); XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData(); int index = 0; while (iter.hasNext()) { try (InputStream stream = iter.next()) { String sheetName = iter.getSheetName(); output.println(); output.println(sheetName + " [index=" + index + "]:"); processSheet(styles, strings, new MappingFromXml(resultMapping), stream); } ++index; } } catch (InvalidFormatException e) { e.printStackTrace(); } catch (OpenXML4JException e) { e.printStackTrace(); } catch (SAXException e) { e.printStackTrace(); } } private static void processSheet(StylesTable styles, ReadOnlySharedStringsTable strings, MappingFromXml mappingFromXml, InputStream sheetInputStream) throws IOException, SAXException { DataFormatter formatter = new DataFormatter(); InputSource sheetSource = new InputSource(sheetInputStream); try { XMLReader sheetParser = SAXHelper.newXMLReader(); ContentHandler handler = new XSSFSheetXMLHandler( styles, null, strings, mappingFromXml, formatter, false); sheetParser.setContentHandler(handler); sheetParser.parse(sheetSource); System.out.println("Size of Array "+resultMapping.size()); } catch(ParserConfigurationException e) { throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage()); } } } 

你必须添加一个实现的calss

SheetContentsHandler

 import com.sun.org.apache.xpath.internal.operations.Bool; import org.apache.poi.ss.util.CellAddress; import org.apache.poi.ss.util.CellReference; import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.SheetContentsHandler; import org.apache.poi.xssf.usermodel.XSSFComment; import java.io.PrintStream; import java.util.ArrayList; import java.util.List; public class MappingFromXml implements SheetContentsHandler { private List result = new ArrayList<>(); private myObject myObject = null; private int lineNumber = 0; /** * Number of columns to read starting with leftmost */ private int minColumns = 25; /** * Destination for data */ private PrintStream output = System.out; public MappingFromXml(List list) { this.result = list; } @Override public void startRow(int i) { output.println("iii " + i); lineNumber = i; myObject = new myObject(); } @Override public void endRow(int i) { output.println("jjj " + i); result.add(myObject); myObject = null; } @Override public void cell(String cellReference, String formattedValue, XSSFComment comment) { int columnIndex = (new CellReference(cellReference)).getCol(); if(lineNumber > 0){ switch (columnIndex) { case 0: {//Tech id if (formattedValue != null && !formattedValue.isEmpty()) myObject.setId(Integer.parseInt(formattedValue)); } break; //TODO add other cell } } } @Override public void headerFooter(String s, boolean b, String s1) { } } 

有关更多信息,请访问此链接