PDFBox and iText extract images with an incorrect dpi

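When I extract images with PDFBox, the images I get have an incorrect dpi. When I extract the images with Photoshop or Acrobat Reader Pro, Windows Photo Viewer shows a dpi of 200 for them, but when I extract them with PDFBox the dpi is 72.

To extract the images I use the code from the following question: Unable to extract images from a PDFA1-format document

When I look at the log, I see an unusual entry: 2015-01-23-main – DEBUG-org.apache.pdfbox.util.TIFFUtil:

I tried Googling, but I could not find out what PDFBox means by this log entry. What does it mean?

You can download a sample PDF showing this problem from the following link: http://myslams.com/test/1.pdf

I even tried iText, but it extracts the images with 96 dpi.

Am I doing something wrong? Or do PDFBox and iText have this limitation?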

After some digging I found your 1.pdf. Thus, …

PDFBox

In a comment on the recent answer by @Tilman you were discussing this older answer, in which @Tilman pointed to the PrintImageLocations PDFBox example. I ran it for your file and got:

    Processing page: 0
    *******************************************************************
    Found image [Im0]
    position = 0.0, 0.0
    size = 1704px, 888px
    size = 613.44, 319.68
    size = 8.52in, 4.44in
    size = 216.408mm, 112.776mm

    Processing page: 1
    *******************************************************************
    Found image [Im0]
    position = 0.0, 0.0
    size = 1704px, 2800px
    size = 613.44, 1008.0
    size = 8.52in, 14.0in
    size = 216.408mm, 355.6mm

    Processing page: 2
    *******************************************************************
    Found image [Im0]
    position = 0.0, 0.0
    size = 1704px, 2800px
    size = 613.44, 1008.0
    size = 8.52in, 14.0in
    size = 216.408mm, 355.6mm

    Processing page: 3
    *******************************************************************
    Found image [Im0]
    position = 0.0, 0.0
    size = 1704px, 1464px
    size = 613.44, 527.04
    size = 8.52in, 7.3199997in
    size = 216.408mm, 185.928mm

On all pages this amounts to 200 dpi in both the x and y directions (1704px / 8.52in = 888px / 4.44in = 2800px / 14.0in = 1464px / 7.32in = 200dpi).
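The arithmetic behind that figure can be sketched in a few lines of Java. This is only an illustration, not part of PDFBox: the class name `ImageDpi` and its method are made up for this example, and it assumes the default PDF user space of 72 units per inch (a page that sets a different `UserUnit` would need another factor).

```java
import java.util.Locale;

/** Hypothetical helper: effective resolution of an image as placed on a PDF page. */
final class ImageDpi {
    /** Assumption: default PDF user space, 72 units per inch. */
    static final double UNITS_PER_INCH = 72.0;

    /** Pixels per inch along one axis: pixel count divided by the displayed length in inches. */
    static double dpi(double pixels, double userUnits) {
        return pixels / (userUnits / UNITS_PER_INCH);
    }

    public static void main(String[] args) {
        // Values taken from the "Processing page: 0" output above.
        double dpiX = dpi(1704, 613.44);
        double dpiY = dpi(888, 319.68);
        // Locale.ROOT keeps the decimal point independent of the system locale.
        System.out.println(String.format(Locale.ROOT, "%.1f x %.1f dpi", dpiX, dpiY));
    }
}
```

Fed with the pixel and user-unit sizes of the other pages, the same method should again come out at about 200.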

So PDFBox gives you the dpi values you are after.

(@Tilman: The current 2.0.0-SNAPSHOT version of that example returns utter nonsense; you might want to fix this.)

iText

A simplified iText version of that PDFBox example would be:

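    // iText 5 imports needed by this snippet
    import java.io.IOException;
    import java.io.InputStream;
    import com.itextpdf.text.pdf.PdfDictionary;
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.ImageRenderInfo;
    import com.itextpdf.text.pdf.parser.Matrix;
    import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
    import com.itextpdf.text.pdf.parser.RenderListener;
    import com.itextpdf.text.pdf.parser.TextRenderInfo;

    public void printImageLocations(InputStream stream) throws IOException
    {
        PdfReader reader = new PdfReader(stream);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        ImageRenderListener listener = new ImageRenderListener();

        // iText page numbers start at 1
        for (int page = 1; page <= reader.getNumberOfPages(); page++)
        {
            System.out.printf("\nPage %s:\n", page);
            parser.processContent(page, listener);
        }
    }

    static class ImageRenderListener implements RenderListener
    {
        public void beginTextBlock() { }
        public void renderText(TextRenderInfo renderInfo) { }
        public void endTextBlock() { }

        public void renderImage(ImageRenderInfo renderInfo)
        {
            try
            {
                // pixel dimensions as stored in the image resource
                PdfDictionary imageDict = renderInfo.getImage().getDictionary();
                float widthPx = imageDict.getAsNumber(PdfName.WIDTH).floatValue();
                float heightPx = imageDict.getAsNumber(PdfName.HEIGHT).floatValue();
                // displayed size in user space units, read from the scaling entries of the CTM
                float widthUu = renderInfo.getImageCTM().get(Matrix.I11);
                float heightUu = renderInfo.getImageCTM().get(Matrix.I22);

                System.out.printf("Image %.0fpx*%.0fpx, %.0fuu*%.0fuu, %.2fin*%.2fin\n",
                        widthPx, heightPx, widthUu, heightUu, widthUu/72, heightUu/72);
            }
            catch (IOException e)
            {
                e.printStackTrace();
            }
        }
    }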

(Beware: I assume unrotated and unskewed images.)

The result for your file:

    Page 1:
    Image 1704px*888px, 613uu*320uu, 8,52in*4,44in

    Page 2:
    Image 1704px*2800px, 613uu*1008uu, 8,52in*14,00in

    Page 3:
    Image 1704px*2800px, 613uu*1008uu, 8,52in*14,00in

    Page 4:
    Image 1704px*1464px, 613uu*527uu, 8,52in*7,32in

Thus, also 200 dpi throughout. So iText, too, gives you the dpi values you are after.

Your code

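Obviously the code you referenced has no chance of reporting a dpi value that is sensible in the context of the PDF, because it merely extracts the images found in the resources and ignores how the respective image resource is used on the page.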

An image resource can be stretched, rotated, skewed, ... in any way the author likes when he uses it in the page content.
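To illustrate why the resource alone cannot tell you the dpi, here is a minimal sketch in which one and the same 1704 pixel wide image is drawn at two different widths. The class name `SameImageTwoPlacements` and the second width (1226.88 units) are invented for this example, and 72 user space units per inch are assumed.

```java
/** Hypothetical demo: one image resource, two placements, two different resolutions. */
final class SameImageTwoPlacements {
    /** Resolution along one axis, assuming 72 user space units per inch. */
    static double dpi(int pixels, double displayedUnits) {
        return pixels * 72.0 / displayedUnits;
    }

    public static void main(String[] args) {
        int widthPx = 1704; // pixel width stored in the image resource

        // Drawn 613.44 units wide, as in the sample file: about 200 dpi.
        System.out.println(Math.round(dpi(widthPx, 613.44)));
        // The very same resource drawn twice as wide (made-up value): about 100 dpi.
        System.out.println(Math.round(dpi(widthPx, 1226.88)));
    }
}
```

The pixel data is identical in both cases; only the placement on the page changes the resolution.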

By the way, a dpi value only makes sense if the author did not skew the image and only rotated it by multiples of 90°.
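A rough sketch of such a check, written without any PDF library: it takes the four linear entries of the matrix that places the image and tests whether the placement is unskewed and rotated by a multiple of 90° before computing a dpi. The class `PlacedImage` and its methods are hypothetical; with iText 5 the values a, b, c, d would presumably come from `getImageCTM()` via `Matrix.I11`, `Matrix.I12`, `Matrix.I21` and `Matrix.I22`, and 72 user space units per inch are assumed.

```java
/** Hypothetical model: an image's pixel size plus the linear part of its placement matrix. */
final class PlacedImage {
    static final double UNITS_PER_INCH = 72.0; // assumption: default PDF user space

    final int widthPx, heightPx;
    // The image's unit square is mapped so that its x axis becomes (a, b)
    // and its y axis becomes (c, d), both in user space units.
    final double a, b, c, d;

    PlacedImage(int widthPx, int heightPx, double a, double b, double c, double d) {
        this.widthPx = widthPx;
        this.heightPx = heightPx;
        this.a = a;
        this.b = b;
        this.c = c;
        this.d = d;
    }

    /** True if the image is unskewed and rotated by a multiple of 90 degrees (mirroring allowed). */
    boolean isAxisAligned(double eps) {
        boolean upright = Math.abs(b) < eps && Math.abs(c) < eps;     // 0 or 180 degrees
        boolean quarterTurn = Math.abs(a) < eps && Math.abs(d) < eps; // 90 or 270 degrees
        return upright || quarterTurn;
    }

    /** Resolution along the image's own width: pixels divided by the displayed length in inches. */
    double dpiAlongWidth() {
        return widthPx / (Math.hypot(a, b) / UNITS_PER_INCH);
    }

    /** Resolution along the image's own height. */
    double dpiAlongHeight() {
        return heightPx / (Math.hypot(c, d) / UNITS_PER_INCH);
    }

    public static void main(String[] args) {
        // The page 1 image of the sample file, imagined as rotated by 90 degrees.
        PlacedImage rotated = new PlacedImage(1704, 888, 0, 613.44, -319.68, 0);
        if (rotated.isAxisAligned(1e-6)) {
            System.out.println(Math.round(rotated.dpiAlongWidth()) + " x "
                    + Math.round(rotated.dpiAlongHeight()) + " dpi");
        }

        // A skewed placement (made-up numbers): no single sensible dpi value.
        PlacedImage skewed = new PlacedImage(1704, 888, 613.44, 0, 100, 319.68);
        System.out.println("skewed image axis-aligned? " + skewed.isAxisAligned(1e-6));
    }
}
```

Using the vector lengths instead of just the two diagonal matrix entries is what lets the calculation survive a 90° rotation, where those diagonal entries are zero.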
