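PDF to byte array and vice versa

I need to convert a PDF to a byte array and back again.

Can anyone help me?

This is how I am converting it to a byte array: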

    public static byte[] convertDocToByteArray(String sourcePath) {
        byte[] byteArray = null;
        try {
            InputStream inputStream = new FileInputStream(sourcePath);
            String inputStreamToString = inputStream.toString();
            byteArray = inputStreamToString.getBytes();
            inputStream.close();
        } catch (FileNotFoundException e) {
            System.out.println("File Not found" + e);
        } catch (IOException e) {
            System.out.println("IO Ex" + e);
        }
        return byteArray;
    }

If I use the following code to convert it back to a document, the PDF file gets created, but opening it reports 'Bad Format. Not a pdf'.

    public static void convertByteArrayToDoc(byte[] b) {
        OutputStream out;
        try {
            out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
            out.close();
            System.out.println("write success");
        } catch (Exception e) {
            System.out.println(e);
        }
    }

You basically need a helper method to read a stream into memory. This works pretty well:

    public static byte[] readFully(InputStream stream) throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int bytesRead;
        while ((bytesRead = stream.read(buffer)) != -1) {
            baos.write(buffer, 0, bytesRead);
        }
        return baos.toByteArray();
    }

Then you would call it with:

    public static byte[] loadFile(String sourcePath) throws IOException {
        InputStream inputStream = null;
        try {
            inputStream = new FileInputStream(sourcePath);
            return readFully(inputStream);
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
        }
    }

Don't mix up text and binary data; it only leads to tears.
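To make the text-versus-binary point concrete, here is a minimal sketch of what happens when arbitrary bytes are pushed through a String and back. The class name TextVersusBinary and the sample bytes are made up for illustration, and UTF-8 is chosen explicitly as an assumption; other charsets damage the data in other ways.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TextVersusBinary {
    /** Decodes the bytes as UTF-8 text, then encodes that text back to bytes. */
    static byte[] roundTripThroughString(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE are not valid UTF-8, so decoding substitutes a replacement character.
        byte[] binary = {'%', 'P', 'D', 'F', (byte) 0xFF, (byte) 0xFE, 0x00};
        byte[] roundTripped = roundTripThroughString(binary);
        System.out.println("identical after round trip: " + Arrays.equals(binary, roundTripped));
    }
}
```

Pure ASCII data survives this round trip, which is why the problem often goes unnoticed until a real binary file such as a PDF is involved.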

Java 7 introduced Files.readAllBytes(), which can read a PDF into a byte[] like this:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    Path pdfPath = Paths.get("/path/to/file.pdf");
    byte[] pdf = Files.readAllBytes(pdfPath);

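Edit:

Thanks to Farooque for pointing out that this works for reading any kind of file, not just PDFs. All files are ultimately just a bunch of bytes, so any of them can be read into a byte[].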
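For the "vice versa" half of the question, the same java.nio.file.Files class also has Files.write(Path, byte[]). A minimal round-trip sketch follows; the class name PdfRoundTrip, the method names, and the paths input.pdf and copy.pdf are placeholders rather than anything from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class PdfRoundTrip {
    /** Reads the whole file into memory as raw bytes. */
    static byte[] toBytes(Path source) throws IOException {
        return Files.readAllBytes(source);
    }

    /** Writes the bytes unchanged, creating the target file or overwriting it. */
    static void toFile(byte[] bytes, Path target) throws IOException {
        Files.write(target, bytes);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths; replace them with real ones.
        Path source = Paths.get("input.pdf");
        Path target = Paths.get("copy.pdf");
        toFile(toBytes(source), target);
    }
}
```

Because the bytes are never interpreted as text, the copy should be byte-for-byte identical to the source.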

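The problem is that you are calling toString() on the InputStream object itself. This returns a String representation of the InputStream object, not the actual PDF document.

You want to read the PDF purely as bytes, because PDF is a binary format. You will then be able to write out that same byte array, and it will be a valid PDF because it has not been modified.

For example, to read a file as bytes: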

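    File file = new File(sourcePath);
    // file.length() returns a long, so it has to be cast for the array size.
    byte[] bytes = new byte[(int) file.length()];
    try (DataInputStream inputStream = new DataInputStream(new FileInputStream(file))) {
        // readFully keeps reading until the whole array has been filled.
        inputStream.readFully(bytes);
    }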
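To see why calling toString() on a stream cannot produce the file's contents, here is a small sketch. A ByteArrayInputStream stands in for the FileInputStream so that the example needs no file on disk; the class name StreamToStringDemo and the method describe are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

final class StreamToStringDemo {
    /** Returns whatever toString() yields for the stream; the stream's data is never read. */
    static String describe(InputStream stream) {
        return stream.toString();
    }

    public static void main(String[] args) {
        InputStream stream = new ByteArrayInputStream(new byte[] {1, 2, 3});
        // Prints the class name and an identity hash, of the form java.io.ByteArrayInputStream@<hex>.
        System.out.println(describe(stream));
    }
}
```

Calling getBytes() on that string just encodes the class name and hash, which is why the resulting file is a few dozen bytes long and not a PDF.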

You can use Apache Commons IO and not worry about the internal details.

Use org.apache.commons.io.FileUtils.readFileToByteArray(File file), which returns the data as a byte[].

See the Javadoc for details.

Aren't you creating the PDF file but never actually writing the byte array back to it? That is why you cannot open the PDF.

    out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
    out.write(b, 0, b.length);
    out.close();

This is in addition to reading the PDF into the byte array correctly.
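A slightly more defensive way to write the array is sketched below. It uses try-with-resources so the stream is closed even if the write fails, and it adds an optional sanity check: PDF files normally begin with the marker %PDF-, so an array that does not start with it was probably not read correctly. The class name PdfWriter and both method names are hypothetical.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class PdfWriter {
    /** Returns true if the data begins with the "%PDF-" marker that normally opens a PDF file. */
    static boolean looksLikePdf(byte[] data) {
        byte[] marker = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data == null || data.length < marker.length) {
            return false;
        }
        for (int i = 0; i < marker.length; i++) {
            if (data[i] != marker[i]) {
                return false;
            }
        }
        return true;
    }

    /** Writes the bytes to targetPath; try-with-resources closes the stream afterwards. */
    static void write(byte[] data, String targetPath) throws IOException {
        try (OutputStream out = new FileOutputStream(targetPath)) {
            out.write(data);
        }
    }
}
```

With the reading code from the question, looksLikePdf would return false, since the array holds the text of the stream's toString() result instead of the file.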

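Calling toString() on an InputStream does not do what you think it does. Even if it did, a PDF contains binary data, so you would not want to convert it to a string in the first place.

What you need to do is read from the stream, write the results into a ByteArrayOutputStream, and then convert the ByteArrayOutputStream into an actual byte array by calling toByteArray():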

    InputStream inputStream = new FileInputStream(sourcePath);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    int data;
    while ((data = inputStream.read()) >= 0) {
        outputStream.write(data);
    }
    inputStream.close();
    return outputStream.toByteArray();
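If Java 9 or later is available, InputStream.readAllBytes() performs the same read-until-end loop in a single call. A minimal sketch, in which the class name ReadAllBytesDemo and its method names are hypothetical:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadAllBytesDemo {
    /** Reads everything remaining in the stream (requires Java 9 or later). */
    static byte[] read(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    /** Opens the file, reads it completely, and closes the stream. */
    static byte[] readFile(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            return read(in);
        }
    }
}
```

On older Java versions the explicit loop with a ByteArrayOutputStream remains the way to go.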

    public static void main(String[] args) throws FileNotFoundException, IOException {
        File file = new File("java.pdf");
        FileInputStream fis = new FileInputStream(file);
        //System.out.println(file.exists() + "!!");
        //InputStream in = resource.openStream();
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            for (int readNum; (readNum = fis.read(buf)) != -1;) {
                // Writes readNum bytes from buf, starting at offset 0, to the byte array output stream.
                bos.write(buf, 0, readNum);
                System.out.println("read " + readNum + " bytes,");
            }
        } catch (IOException ex) {
            Logger.getLogger(genJpeg.class.getName()).log(Level.SEVERE, null, ex);
        }
        byte[] bytes = bos.toByteArray();

        // Below is the different part: write the bytes back out to a new file.
        File someFile = new File("java2.pdf");
        FileOutputStream fos = new FileOutputStream(someFile);
        fos.write(bytes);
        fos.flush();
        fos.close();
    }

This works for me:

    try (InputStream pdfin = new FileInputStream("input.pdf");
         OutputStream pdfout = new FileOutputStream("output.pdf")) {
        byte[] buffer = new byte[1024];
        int bytesRead;
        while ((bytesRead = pdfin.read(buffer)) != -1) {
            pdfout.write(buffer, 0, bytesRead);
        }
    }

But Jon's answer does not work for me when used in the following way:

    try (InputStream pdfin = new FileInputStream("input.pdf");
         OutputStream pdfout = new FileOutputStream("output.pdf")) {
        int k = readFully(pdfin).length;
        System.out.println(k);
    }

The output is a length of zero. Why is that?

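None of these worked for us, possibly because our input stream was bytes from a REST call and not a locally hosted PDF file. What worked was using RestAssured to read the PDF as an input stream, then parsing it with the Tika PDF reader, and then calling the toString() method on the content handler.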

    import com.jayway.restassured.RestAssured;
    import com.jayway.restassured.response.Response;
    import com.jayway.restassured.response.ResponseBody;
    import org.apache.tika.exception.TikaException;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;
    import org.apache.tika.parser.Parser;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.SAXException;

    // response is the RestAssured Response obtained from the REST call (not shown here).
    InputStream stream = response.asInputStream();
    Parser parser = new AutoDetectParser(); // Should auto-detect!
    ContentHandler handler = new BodyContentHandler();
    Metadata metadata = new Metadata();
    ParseContext context = new ParseContext();
    try {
        parser.parse(stream, handler, metadata, context);
    } finally {
        stream.close();
    }
    for (int i = 0; i < metadata.names().length; i++) {
        String item = metadata.names()[i];
        System.out.println(item + " -- " + metadata.get(item));
    }
    System.out.println("!!Printing pdf content: \n" + handler.toString());
    System.out.println("content type: " + metadata.get(Metadata.CONTENT_TYPE));

To convert a PDF to a byte array:

    public byte[] pdfToByte(String filePath) throws JRException {
        // File has no no-argument constructor, so the path must be passed in.
        File file = new File(filePath);
        FileInputStream fileInputStream;
        byte[] data = null;
        byte[] finalData = null;
        ByteArrayOutputStream byteArrayOutputStream = null;
        try {
            fileInputStream = new FileInputStream(file);
            data = new byte[(int) file.length()];
            finalData = new byte[(int) file.length()];
            byteArrayOutputStream = new ByteArrayOutputStream();
            fileInputStream.read(data);
            byteArrayOutputStream.write(data);
            finalData = byteArrayOutputStream.toByteArray();
            fileInputStream.close();
        } catch (FileNotFoundException e) {
            LOGGER.info("File not found" + e);
        } catch (IOException e) {
            LOGGER.info("IO exception" + e);
        }
        return finalData;
    }

I have implemented similar behavior in my application as well. Below is my version of the code, which is fully functional.

    byte[] getFileInBytes(String filename) {
        File file = new File(filename);
        int length = (int) file.length();
        byte[] bytes = new byte[length];
        try {
            BufferedInputStream reader = new BufferedInputStream(new FileInputStream(file));
            reader.read(bytes, 0, length);
            System.out.println(reader);
            // setFile(bytes);
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return bytes;
    }

PDFs may contain binary data, and chances are it gets mangled when you call toString(). It seems to me that you want this:

    FileInputStream inputStream = new FileInputStream(sourcePath);
    int numberBytes = inputStream.available();
    byte bytearray[] = new byte[numberBytes];
    inputStream.read(bytearray);
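One caveat with sizing the array from available() and filling it with a single read: the Javadoc describes available() as an estimate of the bytes readable without blocking, and a read call is allowed to return fewer bytes than requested. The sketch below simulates this with a made-up stream (trickle) that hands out at most 3 bytes per call, plus a hypothetical helper readFully that loops until the buffer is full. For a local file a single read often happens to work, so treat this as an illustration of the general contract, not a claim about any particular file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class PartialReadDemo {
    /** Builds a stream that returns at most 3 bytes per read call, as a slow source might. */
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    /** Calls read repeatedly until the buffer is full or the stream ends; returns the byte count. */
    static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // end of stream reached before the buffer was full
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        byte[] buffer = new byte[data.length];
        int single = trickle(data).read(buffer, 0, buffer.length);
        int looped = readFully(trickle(data), new byte[data.length]);
        System.out.println("single read: " + single + " bytes, looped read: " + looped + " bytes");
    }
}
```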
