PDFBox: How to "flatten" a PDF form?

How do I "flatten" a PDF form with PDFBox (remove the form fields but keep the field text)?

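The same question was answered here:

A quick way to do this is to remove the fields from the AcroForm.

To do this, you only need to get the document catalog, then the AcroForm, and then remove all fields from that AcroForm.

The graphical representation is linked to the annotations and remains in the document.

So I wrote this code: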

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
    import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
    import org.apache.pdfbox.pdmodel.interactive.form.PDField;

    public class PdfBoxTest {

        public void test() throws Exception {
            PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
            PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
            PDAcroForm acroForm = pdCatalog.getAcroForm();

            if (acroForm == null) {
                System.out.println("No form-field --> stop");
                return;
            }

            @SuppressWarnings("unchecked")
            List<PDField> fields = acroForm.getFields();

            // set the text in the form-field <-- does work
            for (PDField field : fields) {
                if (field.getFullyQualifiedName().equals("formfield1")) {
                    field.setValue("Test-String");
                }
            }

            // remove form-field but keep text ???
            // acroForm.getFields().clear();                 <-- does not work
            // acroForm.setFields(null);                     <-- does not work
            // acroForm.setFields(new ArrayList<PDField>()); <-- does not work
            // ???

            pdDoc.save("E:\\Form-Test-Result.pdf");
            pdDoc.close();
        }
    }

With PDFBox 2 you can now easily "flatten" a PDF form using the new API method PDAcroForm.flatten().

Simplified example code using this method call:

    // Load the document
    PDDocument pDDocument = PDDocument.load(new File("E:\\Form-Test.pdf"));
    PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();

    // Fill the document ...

    // Flatten the document
    pDAcroForm.flatten();

    // Save the document
    pDDocument.save("E:\\Form-Test-Result.pdf");
    pDDocument.close();

Note: dynamic XFA forms cannot be flattened.

To migrate from PDFBox 1.* to 2.0, have a look at the official migration guide.

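This definitely works. I ran into this problem, debugged all night, and finally figured out how to do it :)

This assumes you are able to edit the PDF in some way / have some control over the PDF.

First, edit the forms with Acrobat Pro. Make them hidden and read-only.

Then you need two libraries: PDFBox and PDF Clown.

PDFBox removes the thing that tells Adobe Reader the document is a form; PDF Clown removes the actual fields. PDF Clown must run first, then PDFBox (in that order; the other way around does not work).

Sample code for a single field: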

    // PDF Clown code (File here is org.pdfclown.files.File, not java.io.File)
    File file = new File("Some file path");
    Document document = file.getDocument();
    Form form = document.getForm();
    Fields fields = form.getFields();
    Field field = fields.get("some_field_name");

    PageStamper stamper = new PageStamper();
    FieldWidgets widgets = field.getWidgets();
    Widget widget = widgets.get(0); // usually index 0; experiment to find out
    stamper.setPage(widget.getPage());

    // Write text using the text form field position as pivot.
    PrimitiveComposer composer = stamper.getForeground();
    Font font = Font.get(document, "some_path");
    composer.setFont(font, 10);
    double xCoordinate = widget.getBox().getX();
    double yCoordinate = widget.getBox().getY();
    composer.showText("text i want to display", new Point2D.Double(xCoordinate, yCoordinate));

    // Actually delete the form field!
    field.delete();
    stamper.flush();

    // Create a new buffer to output to...
    Buffer buffer = new Buffer();
    file.save(buffer, SerializationModeEnum.Standard);
    byte[] bytes = buffer.toByteArray();

    // PDFBox code
    InputStream pdfInput = new ByteArrayInputStream(bytes);
    PDDocument pdfDocument = PDDocument.load(pdfInput);

    // Tell Adobe we don't have forms anymore.
    PDDocumentCatalog pdCatalog = pdfDocument.getDocumentCatalog();
    PDAcroForm acroForm = pdCatalog.getAcroForm();
    COSDictionary acroFormDict = acroForm.getDictionary();
    COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
    cosFields.clear();

    // Phew. Finally.
    pdfDocument.save("Some file path");
    pdfDocument.close();

There may be a typo here and there, but this should be enough to get the gist :)

setReadOnly worked for me, like this:

    @SuppressWarnings("unchecked")
    List<PDField> fields = acroForm.getFields();
    for (PDField field : fields) {
        if (field.getFullyQualifiedName().equals("formfield1")) {
            field.setReadOnly(true);
        }
    }

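After reading the PDF reference, I found that you can very easily put an AcroForm field into read-only mode by adding the "Ff" (field flags) key with the value 1. This is what the documentation says:

If set, the user may not change the value of the field. Any associated widget annotations will not interact with the user; that is, they will not respond to mouse clicks or change their appearance in response to mouse motions. This flag is useful for fields whose values are computed or imported from a database.

So the code looks like this (using the PDFBox library):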

    public static void makeAllWidgetsReadOnly(PDDocument pdDoc) throws IOException {
        PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();
        PDAcroForm form = catalog.getAcroForm();

        @SuppressWarnings("unchecked")
        List<PDField> acroFormFields = form.getFields();
        System.out.println(String.format("found %d acroForm fields", acroFormFields.size()));

        for (PDField field : acroFormFields) {
            makeAcroFieldReadOnly(field);
        }
    }

    private static void makeAcroFieldReadOnly(PDField field) {
        // Ff is a bit field and ReadOnly is bit 1: OR it in so the field keeps its other flags
        COSDictionary dict = field.getDictionary();
        dict.setInt("Ff", dict.getInt("Ff", 0) | 1);
    }
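Writing the value 1 into "Ff" replaces the whole flag word, so every other flag the field had is lost; a choice field, for example, would lose its Combo bit and turn from a combo box into a list box. A minimal plain-Java sketch (no PDFBox) of setting only the ReadOnly bit; the class and constant names are hypothetical, while the bit positions are the ones listed in the PDF specification's field-flag tables:

```java
/** Hypothetical helper: field-flag (Ff) arithmetic following the PDF specification's bit numbering. */
final class FieldFlags {
    // PDF numbers flag bits from 1 at the low-order end, so bit n has the value 1 << (n - 1).
    static final int READ_ONLY = 1;       // bit 1, all field types
    static final int REQUIRED = 1 << 1;   // bit 2, all field types
    static final int NO_EXPORT = 1 << 2;  // bit 3, all field types
    static final int MULTILINE = 1 << 12; // bit 13, text fields
    static final int COMBO = 1 << 17;     // bit 18, choice fields

    private FieldFlags() {}

    /** Returns the flags with ReadOnly set and every other bit left unchanged. */
    static int withReadOnly(int ff) {
        return ff | READ_ONLY;
    }

    /** True if the ReadOnly bit is set. */
    static boolean isReadOnly(int ff) {
        return (ff & READ_ONLY) != 0;
    }
}
```

Applied to a COSDictionary, the same idea reads `dict.setInt("Ff", dict.getInt("Ff", 0) | 1)`; note that this only looks at the field's own dictionary, not at flags inherited from a parent field.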

A solution for flattening an AcroForm with PDFBox while keeping the form field values:

See the solution at https://mail-archives.apache.org/mod_mbox/pdfbox-users/201604.mbox/%3C3BC7E352-9447-4458-AAC3-5A9B70B4CCAA@fileaffairs.de%3E

A solution that works with PDFBox 2.0.1:

    File myFile = new File("myFile.pdf");
    PDDocument pdDoc = PDDocument.load(myFile);
    PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
    PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

    // set the NeedAppearances flag to false
    pdAcroForm.setNeedAppearances(false);

    // "myField" is a placeholder: look up the field you want to fill
    PDField field = pdAcroForm.getField("myField");
    field.setValue("new-value");

    pdAcroForm.flatten();
    pdDoc.save("myFlattenedFile.pdf");
    pdDoc.close();

I did not need the following two steps from the solution linked above:

    // correct the missing page link for the annotations
    // Add the missing resources to the form

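I created my PDF form in OpenOffice 4.1.1 and exported it as PDF. The two options I selected in the OpenOffice export dialog were:

"Create PDF form" checked
Submit format set to "PDF": I found this produced a smaller file than "FDF" while still working as a PDF.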

Using PDFBox, I filled in the form fields and created a flattened PDF file that removed the form fields but kept the field values.

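To really "flatten" Acrobat form fields, there seems to be much more to do than it looks at first glance. After examining the PDF standard, I managed to achieve real flattening in three steps:

Save the field values
Remove the widgets
Remove the form fields

All three steps can be done with PDFBox (I used 1.8.5). Below I sketch how I did it. A very helpful tool for understanding what is going on is the PDF Debugger.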

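Save the field values

This is the most complicated of the three steps.

To save a field's value, you have to save its content as PDF page content for each of the field's widgets. The easiest way to do that is to draw each widget's appearance onto the widget's page.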

    // getDocument, getWidgets, getPage, getAppearanceStream and
    // getPositioningTransformation are the author's own helpers (not shown)
    void saveFieldValue(PDField field) throws IOException {
        PDDocument document = getDocument(field);
        // see PDField.getWidget()
        for (PDAnnotationWidget widget : getWidgets(field)) {
            PDPage parentPage = getPage(widget);
            try (PDPageContentStream contentStream = new PDPageContentStream(document, parentPage, true, true)) {
                writeContent(contentStream, widget);
            }
        }
    }

    void writeContent(PDPageContentStream contentStream, PDAnnotationWidget widget) throws IOException {
        PDAppearanceStream appearanceStream = getAppearanceStream(widget);
        PDXObject xobject = new PDXObjectForm(appearanceStream.getStream());
        AffineTransform transformation = getPositioningTransformation(widget.getRectangle());
        contentStream.drawXObject(xobject, transformation);
    }

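The appearance is an XObject stream that contains all of the widget's content (value, font, size, rotation, etc.). You only need to draw it at the right position on the page, which you can extract from the widget's rectangle.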
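The answer does not show `getPositioningTransformation`. As a hedged sketch of what such a helper has to compute, the PDF specification's rule for appearance streams is: transform the form XObject's /BBox by its /Matrix, then scale and translate the resulting box so it fits the annotation's /Rect. The version below uses only `java.awt.geom`; the class name and the extra BBox and Matrix parameters are assumptions (the original helper only receives the rectangle), and zero-width or zero-height boxes are not handled.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/** Hypothetical helper: positions an appearance XObject inside an annotation rectangle. */
final class AppearancePlacement {
    private AppearancePlacement() {}

    /**
     * @param bbox   the appearance stream's /BBox
     * @param matrix the appearance stream's /Matrix (identity if absent)
     * @param rect   the annotation's /Rect (lower-left x, y, width, height)
     * @return the transformation to apply before drawing the XObject
     */
    static AffineTransform placement(Rectangle2D bbox, AffineTransform matrix, Rectangle2D rect) {
        // 1. Transform the BBox by /Matrix and take its axis-aligned bounding box.
        Rectangle2D box = matrix.createTransformedShape(bbox).getBounds2D();
        // 2. A maps that box onto /Rect. With AffineTransform the last call acts on points first:
        //    shift the box to the origin, scale it to the rect size, move it to the rect corner.
        AffineTransform a = new AffineTransform();
        a.translate(rect.getX(), rect.getY());
        a.scale(rect.getWidth() / box.getWidth(), rect.getHeight() / box.getHeight());
        a.translate(-box.getX(), -box.getY());
        // 3. Points go through /Matrix first, then through A.
        a.concatenate(matrix);
        return a;
    }
}
```

When the BBox starts at the origin, has the same size as /Rect and /Matrix is the identity, the result is a plain translation to the rectangle's lower-left corner, which is what the `1 0 0 1 x y cm` in the longer flattening code further down relies on.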

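Remove the widgets

As mentioned above, each field can have several widgets. A widget takes care of how a form field can be edited, of triggers, of how it is displayed when it is not being edited, and so on.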
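The PDF specification lets a field with only one widget merge the field and widget dictionaries into one; otherwise a terminal field lists its widgets under /Kids, and a non-terminal field lists its child fields there. The widgets are therefore the leaves of the field tree, assuming every terminal field has at least one widget. A minimal sketch of collecting them, with a hypothetical `FieldNode` class standing in for PDFBox's field and annotation objects:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a field or widget dictionary; only /T and /Kids are modelled. */
final class FieldNode {
    final String name;                              // partial name (/T), null for a pure widget
    final List<FieldNode> kids = new ArrayList<>(); // /Kids, empty if the key is absent

    FieldNode(String name, FieldNode... kids) {
        this.name = name;
        this.kids.addAll(List.of(kids));
    }

    /** Collects the widget annotations below a field: the leaves of the field tree. */
    static List<FieldNode> widgets(FieldNode field) {
        List<FieldNode> out = new ArrayList<>();
        collect(field, out);
        return out;
    }

    private static void collect(FieldNode node, List<FieldNode> out) {
        if (node.kids.isEmpty()) {
            out.add(node); // a merged field/widget dictionary, or a widget listed under /Kids
        } else {
            for (FieldNode kid : node.kids) {
                collect(kid, out); // child fields or widgets
            }
        }
    }
}
```

For example, a "customer" field whose "address" child is shown twice on the page and whose "zip" child is a merged field/widget yields three widgets.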

To remove a widget, you have to remove it from its page's annotations.

    // getPage and getMatchingCOSObjectable are the author's own helpers (not shown)
    void removeWidget(PDAnnotationWidget widget) throws IOException {
        PDPage widgetPage = getPage(widget);
        List<PDAnnotation> annotations = widgetPage.getAnnotations();
        PDAnnotation deleteCandidate = getMatchingCOSObjectable(annotations, widget);
        if (deleteCandidate != null && annotations.remove(deleteCandidate)) {
            widgetPage.setAnnotations(annotations);
        }
    }

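Note that the annotations list may not contain the exact PDAnnotationWidget object, since it is a kind of wrapper. You have to remove the entry with the matching COSObject.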
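A PDAnnotationWidget and the PDAnnotation returned by `getAnnotations()` can be two different Java objects wrapping the same dictionary; unless the wrapper class overrides `equals`, `List.remove(widget)` compares the wrappers themselves and finds nothing. A stdlib sketch of matching on the wrapped object instead; `Wrapper` is a hypothetical stand-in for the PDFBox wrapper classes, and its plain `Object` plays the role of the COSDictionary:

```java
import java.util.List;

/** Hypothetical wrapper around a low-level object, like a PD* class around its COSDictionary. */
final class Wrapper {
    private final Object cos;

    Wrapper(Object cos) {
        this.cos = cos;
    }

    Object getCOSObject() {
        return cos;
    }

    // equals/hashCode are deliberately not overridden, so List.remove(...) compares wrapper identity.

    /** Removes every entry wrapping the same underlying object as target; true if any was removed. */
    static boolean removeMatching(List<Wrapper> list, Wrapper target) {
        return list.removeIf(w -> w.getCOSObject() == target.getCOSObject());
    }
}
```

With PDFBox the comparison would be on `getCOSObject()` of the annotation and of the widget, which both classes provide through the COSObjectable interface.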

Remove the form fields

The last step is to remove the form field itself. This is not very different from the other posts above.

    // getFields and removeAll are the author's own helpers (not shown)
    void removeFormfield(PDField field) throws IOException {
        PDAcroForm acroForm = field.getAcroForm();
        List<PDField> acroFields = acroForm.getFields();
        List<PDField> removeCandidates = getFields(acroFields, field.getPartialName());
        if (removeAll(acroFields, removeCandidates)) {
            acroForm.setFields(acroFields);
        }
    }

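Note that I used a custom removeAll method here, because removeCandidates.removeAll() did not work as expected.

Sorry that I can't provide all the code here, but with the above you should be able to write it yourself.

This is the code I came up with after combining all the answers I could find on this topic. It handles flattening text boxes, combo boxes, list boxes, checkboxes and radio buttons: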

    public static void flattenPDF(PDDocument doc) throws IOException {
        //
        // Find the fields and their kids (widgets) in the input document.
        // Each child widget is one appearance of the field data on a page; there may be several.
        //
        PDDocumentCatalog catalog = doc.getDocumentCatalog();
        PDAcroForm form = catalog.getAcroForm();
        List<PDField> tmpfields = form.getFields();
        PDResources formresources = form.getDefaultResources();
        Map<String, PDFont> formfonts = formresources.getFonts();
        PDAnnotation ann;

        //
        // For each page of the input document, convert the field annotations on the page
        // into page content
        //
        List<PDPage> pages = catalog.getAllPages();
        Iterator<PDPage> pageiterator = pages.iterator();
        while (pageiterator.hasNext()) {
            PDPage page = pageiterator.next();

            // Add the fonts of the form to this page's resources
            // so the field values are displayed in the proper font
            PDResources pageResources = page.getResources();
            Map<String, PDFont> pageFonts = pageResources.getFonts();
            pageFonts.putAll(formfonts);
            pageResources.setFonts(pageFonts);

            // Create a content stream for appending to the page
            PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);

            // Find the appearance widgets of all fields on this page and insert them into the page content
            for (PDField tmpfield : tmpfields) {
                List<COSObjectable> widgets = tmpfield.getKids();
                if (widgets == null) {
                    // no /Kids: the field dictionary is also its only widget
                    widgets = new ArrayList<COSObjectable>();
                    widgets.add(tmpfield.getWidget());
                }
                Iterator<COSObjectable> widgetiterator = widgets.iterator();
                while (widgetiterator.hasNext()) {
                    COSObjectable next = widgetiterator.next();
                    if (next instanceof PDField) {
                        PDField foundfield = (PDField) next;
                        ann = foundfield.getWidget();
                    } else {
                        ann = (PDAnnotation) next;
                    }
                    // /P is optional, so getPage() may return null
                    if (page.equals(ann.getPage())) {
                        COSDictionary dict = ann.getDictionary();
                        if (dict != null) {
                            if (tmpfield instanceof PDVariableText || tmpfield instanceof PDPushButton) {
                                COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
                                if (ap != null) {
                                    contentStream.appendRawCommands("q\n");
                                    appendTranslation(contentStream, dict);
                                    // normal appearance stream
                                    COSStream stream = (COSStream) ap.getDictionaryObject("N");
                                    if (stream != null) {
                                        appendStream(contentStream, stream);
                                    }
                                    contentStream.appendRawCommands("Q\n");
                                }
                            } else if (tmpfield instanceof PDChoiceButton) {
                                COSDictionary ap = (COSDictionary) dict.getDictionaryObject("AP");
                                if (ap != null) {
                                    contentStream.appendRawCommands("q\n");
                                    appendTranslation(contentStream, dict);
                                    // /AS names the current state, which selects the stream in /D and /N
                                    COSName cbValue = (COSName) dict.getDictionaryObject(COSName.AS);
                                    COSDictionary d = (COSDictionary) ap.getDictionaryObject(COSName.D);
                                    if (d != null) {
                                        COSStream stream = (COSStream) d.getDictionaryObject(cbValue);
                                        // the down appearance is drawn for radio buttons, not for checkboxes
                                        if (stream != null && !(tmpfield instanceof PDCheckbox)) {
                                            appendStream(contentStream, stream);
                                        }
                                    }
                                    COSDictionary n = (COSDictionary) ap.getDictionaryObject(COSName.N);
                                    if (n != null) {
                                        COSStream stream = (COSStream) n.getDictionaryObject(cbValue);
                                        if (stream != null) {
                                            appendStream(contentStream, stream);
                                        }
                                    }
                                    contentStream.appendRawCommands("Q\n");
                                }
                            }
                        }
                    }
                }
            }

            // Delete the field widget annotations; leave the other annotations on the page
            COSArrayList<PDAnnotation> newanns = new COSArrayList<PDAnnotation>();
            for (PDAnnotation annotation : page.getAnnotations()) {
                if (!(annotation instanceof PDAnnotationWidget)) {
                    newanns.add(annotation);
                }
            }
            page.setAnnotations(newanns);
            contentStream.close();
        }

        // Delete all fields and their widgets (kids) from the form
        for (PDField tmpfield : tmpfields) {
            List<COSObjectable> kids = tmpfield.getKids();
            if (kids != null) {
                kids.clear();
            }
        }
        tmpfields.clear();

        // Tell Adobe we don't have forms anymore.
        PDDocumentCatalog pdCatalog = doc.getDocumentCatalog();
        PDAcroForm acroForm = pdCatalog.getAcroForm();
        COSDictionary acroFormDict = acroForm.getDictionary();
        COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
        cosFields.clear();
    }

    // Translate the origin to the lower-left corner of the widget's /Rect
    private static void appendTranslation(PDPageContentStream contentStream, COSDictionary dict) throws IOException {
        COSArray rectarray = (COSArray) dict.getDictionaryObject("Rect");
        if (rectarray != null) {
            float[] rect = rectarray.toFloatArray();
            contentStream.appendRawCommands(" 1 0 0 1 " + rect[0] + " " + rect[1] + " cm\n");
        }
    }

    // Copy a decoded appearance stream into the page content. ISO-8859-1 maps each byte to one
    // char and back (appendRawCommands(String) in PDFBox 1.8 encodes as ISO-8859-1), whereas the
    // platform default charset used by byteArray.toString() can corrupt non-ASCII bytes.
    private static void appendStream(PDPageContentStream contentStream, COSStream stream) throws IOException {
        InputStream ioStream = stream.getUnfilteredStream();
        ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
        try {
            byte[] buffer = new byte[4096];
            int amountRead;
            while ((amountRead = ioStream.read(buffer, 0, buffer.length)) != -1) {
                byteArray.write(buffer, 0, amountRead);
            }
        } finally {
            ioStream.close();
        }
        contentStream.appendRawCommands(byteArray.toString("ISO-8859-1") + "\n");
    }

Full class: https://gist.github.com/jribble/beddf7620536939f88db
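The flattening code above copies each appearance stream into the page between `q` and `Q`, preceded by `1 0 0 1 x y cm`, which moves the origin to the widget rectangle's lower-left corner. Two details matter when building that fragment: the appearance bytes must pass through unchanged (ISO-8859-1 maps every byte to one char and back, while a multi-byte charset such as UTF-8 can replace bytes above 0x7F, which appear for example in string operands of non-ASCII field values), and PDF numbers may not use exponent notation, which `Float.toString` produces for very small or very large values. A stdlib sketch; the class and method names are hypothetical, and NaN or infinite coordinates are not handled:

```java
import java.io.ByteArrayOutputStream;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: wraps an appearance stream so it can be appended to page content. */
final class AppearanceWrapper {
    private AppearanceWrapper() {}

    /** Formats a float in plain decimal notation, since PDF numbers have no exponent form. */
    static String pdfNumber(float value) {
        return new BigDecimal(Float.toString(value)).stripTrailingZeros().toPlainString();
    }

    /** Returns "q", a translation to (llx, lly), the unchanged appearance bytes, then "Q". */
    static byte[] wrapAppearance(float llx, float lly, byte[] appearance) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // q saves the graphics state; cm translates the origin to the widget's corner
        String prefix = "q\n1 0 0 1 " + pdfNumber(llx) + " " + pdfNumber(lly) + " cm\n";
        out.writeBytes(prefix.getBytes(StandardCharsets.ISO_8859_1));
        out.writeBytes(appearance); // copied byte for byte, never decoded
        // Q restores the graphics state so later page content is unaffected
        out.writeBytes("\nQ\n".getBytes(StandardCharsets.ISO_8859_1));
        return out.toByteArray();
    }
}
```

Keeping the appearance as `byte[]` end to end avoids the String round trip altogether; where an API only accepts a String, decoding and encoding with ISO-8859-1 on both sides gives the same bytes back.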

This is Thomas's answer from the PDFBox mailing list:

You need to get the Fields from the COSDictionary. Try this code...

    PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
    PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
    PDAcroForm acroForm = pdCatalog.getAcroForm();
    COSDictionary acroFormDict = acroForm.getDictionary();
    // getDictionaryObject returns a COSBase, so a cast is needed
    COSArray fields = (COSArray) acroFormDict.getDictionaryObject("Fields");
    fields.clear();

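I don't have enough reputation to comment, but SJohnson's answer of setting the field to read-only worked perfectly for me. I use something like this with PDFBox: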

    private void setFieldValueAndFlatten(PDAcroForm form, String fieldName, String fieldValue) throws IOException {
        PDField field = form.getField(fieldName);
        if (field != null) {
            field.setValue(fieldValue);
            field.setReadonly(true);
        }
    }

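This writes your field value, and when you open the PDF after saving, it shows your value and is not editable.

I thought I'd share our approach, which uses PDFBox 2+.

We used the PDAcroForm.flatten() method.

The fields needed some preprocessing; most importantly, the nested field structure had to be traversed and the DV and V values checked.

What finally worked: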

    private static void flattenPDF(String src, String dst) throws IOException {
        PDDocument doc = PDDocument.load(new File(src));
        PDDocumentCatalog catalog = doc.getDocumentCatalog();
        PDAcroForm acroForm = catalog.getAcroForm();

        PDResources resources = new PDResources();
        acroForm.setDefaultResources(resources);

        List<PDField> fields = new ArrayList<>(acroForm.getFields());
        processFields(fields, resources);
        acroForm.flatten();

        doc.save(dst);
        doc.close();
    }

    private static void processFields(List<PDField> fields, PDResources resources) {
        fields.stream().forEach(f -> {
            f.setReadOnly(true);
            COSDictionary cosObject = f.getCOSObject();
            String value = cosObject.getString(COSName.DV) == null
                    ? cosObject.getString(COSName.V)
                    : cosObject.getString(COSName.DV);
            System.out.println("Setting " + f.getFullyQualifiedName() + ": " + value);
            try {
                f.setValue(value);
            } catch (IOException e) {
                if (e.getMessage().matches("Could not find font: /.*")) {
                    String fontName = e.getMessage().replaceAll("^[^/]*/", "");
                    System.out.println("Adding fallback font for: " + fontName);
                    resources.put(COSName.getPDFName(fontName), PDType1Font.HELVETICA);
                    try {
                        f.setValue(value);
                    } catch (IOException e1) {
                        e1.printStackTrace();
                    }
                } else {
                    e.printStackTrace();
                }
            }
            if (f instanceof PDNonTerminalField) {
                processFields(((PDNonTerminalField) f).getChildren(), resources);
            }
        });
    }
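Two details of that preprocessing, pulled out into plain Java so they can be checked in isolation: the value that gets re-applied is /DV whenever it is present and /V otherwise (so the default value wins over the current value when both exist), and the name for the fallback font is cut out of the exception message. The sketch relies on the message wording "Could not find font: /Name" seen in this answer, not on a documented API, and the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers mirroring the preprocessing logic in the answer above. */
final class FieldPrep {
    private static final Pattern MISSING_FONT = Pattern.compile("Could not find font: /(.*)");

    private FieldPrep() {}

    /** Same precedence as the answer: /DV if present, otherwise /V (both may be null). */
    static String valueToApply(String dv, String v) {
        return dv == null ? v : dv;
    }

    /** Extracts the font resource name from a "Could not find font: /Name" message, if it matches. */
    static Optional<String> missingFontName(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = MISSING_FONT.matcher(message);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Inside the original lambda, `missingFontName(e.getMessage())` could replace the `matches`/`replaceAll` pair and also covers exceptions whose message is null.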
