PDFBox 2.0 RC3 - Find and replace text

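How can I find and replace text in a PDF document using PDFBox 2.0? They pulled the old example, and its syntax no longer works, so I am wondering whether this is still possible and, if so, what the best way to do it is. Thanks!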

2016-02-15 Shaun

That old example only really worked for very simple PDFs; it left more complex ones unchanged or (worse) damaged them. – mkl

You can try something like this:

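import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

import org.apache.commons.lang3.StringUtils;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageTree;
import org.apache.pdfbox.pdmodel.common.PDStream;

public static PDDocument replaceText(PDDocument document, String searchString, String replacement) throws IOException {
    if (StringUtils.isEmpty(searchString) || StringUtils.isEmpty(replacement)) {
        return document;
    }
    PDPageTree pages = document.getDocumentCatalog().getPages();
    for (PDPage page : pages) {
        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        List<Object> tokens = parser.getTokens();
        // Start at 1: an operator at index 0 has no preceding operand to edit.
        for (int j = 1; j < tokens.size(); j++) {
            if (!(tokens.get(j) instanceof Operator)) {
                continue;
            }
            String opName = ((Operator) tokens.get(j)).getName();
            Object operand = tokens.get(j - 1);
            // Tj and TJ are the two operators handled here that display strings.
            if (opName.equals("Tj") && operand instanceof COSString) {
                // Tj takes a single operand: the string to display.
                replaceIn((COSString) operand, searchString, replacement);
            } else if (opName.equals("TJ") && operand instanceof COSArray) {
                // TJ takes an array of strings mixed with positioning numbers.
                COSArray array = (COSArray) operand;
                for (int k = 0; k < array.size(); k++) {
                    if (array.getObject(k) instanceof COSString) {
                        replaceIn((COSString) array.getObject(k), searchString, replacement);
                    }
                }
            }
        }
        // Now that the tokens are updated, replace the page content stream.
        PDStream updatedStream = new PDStream(document);
        try (OutputStream out = updatedStream.createOutputStream()) {
            new ContentStreamWriter(out).writeTokens(tokens);
        }
        page.setContents(updatedStream);
    }
    return document;
}

// Literal (non-regex) replacement of the first match inside one string operand.
// Caveat: getBytes() uses the platform charset, which is only right if it
// happens to match the encoding of the font that draws this string.
private static void replaceIn(COSString cosString, String searchString, String replacement) {
    String replaced = StringUtils.replaceOnce(cosString.getString(), searchString, replacement);
    cosString.setValue(replaced.getBytes());
}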


2016-04-04 13:42:09 mourphy

This code only works for very simple PDF files; it leaves more complex files unchanged or (worse) damages them. – mkl
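One common reason a per-string replacement leaves a PDF unchanged is that a TJ array often splits a visible word into several fragments with kerning numbers in between. The sketch below models that situation with plain Java lists. It is a toy stand-in, not PDFBox code, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a TJ operand array: String elements are text
// fragments, Integer elements are kerning adjustments. Not PDFBox API.
class TjSplitDemo {

    // Mimics the per-element strategy: search and replace inside each
    // string fragment separately, leaving the numbers untouched.
    static List<Object> replacePerFragment(List<Object> tjArray, String search, String replacement) {
        List<Object> result = new ArrayList<>();
        for (Object element : tjArray) {
            if (element instanceof String) {
                result.add(((String) element).replace(search, replacement));
            } else {
                result.add(element);
            }
        }
        return result;
    }

    // Joins the string fragments, i.e. the text a reader sees on the page.
    static String visibleText(List<Object> tjArray) {
        StringBuilder sb = new StringBuilder();
        for (Object element : tjArray) {
            if (element instanceof String) {
                sb.append(element);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Object> simple = Arrays.<Object>asList("Hello World");
        List<Object> kerned = Arrays.<Object>asList("He", -20, "llo Wor", 15, "ld");

        // Both arrays render the same visible text...
        System.out.println(visibleText(simple).equals(visibleText(kerned)));
        // ...but only the unsplit one is changed by a per-fragment replacement.
        System.out.println(visibleText(replacePerFragment(simple, "World", "PDFBox")));
        System.out.println(visibleText(replacePerFragment(kerned, "World", "PDFBox")));
    }
}
```

In the kerned case no single fragment contains "World", so nothing is replaced even though the page visibly shows the word.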

https://pdfbox.apache.org/2.0/migration.html Why was the ReplaceText example removed? –

This is explained in the last section of the link you mentioned: https://pdfbox.apache.org/2.0/migration.html#why-was-the-replacetext-example-removed It is mainly due to character encoding and font issues. – maxxyme
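To make the encoding and font point concrete, here is a small sketch of how an embedded subset font can defeat a plain string search. It is plain Java, not PDFBox, and the code assignment scheme (codes handed out in order of first use) is an assumption chosen for illustration; real fonts use many different schemes.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy model of an embedded subset font. Hypothetical code, not PDFBox API.
class SubsetFontDemo {

    // Assigns byte codes 1, 2, 3, ... to characters in order of first use,
    // so the font only knows the glyphs the original document needed.
    static Map<Character, Byte> buildSubsetEncoding(String textInDocument) {
        Map<Character, Byte> encoding = new HashMap<>();
        byte nextCode = 1;
        for (char c : textInDocument.toCharArray()) {
            if (!encoding.containsKey(c)) {
                encoding.put(c, nextCode++);
            }
        }
        return encoding;
    }

    // Encodes text with the subset font; fails for characters it lacks.
    static byte[] encode(String text, Map<Character, Byte> encoding) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            Byte code = encoding.get(text.charAt(i));
            if (code == null) {
                throw new IllegalArgumentException("No glyph for '" + text.charAt(i) + "' in subset font");
            }
            out[i] = code;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Character, Byte> encoding = buildSubsetEncoding("Hello");
        byte[] stored = encode("Hello", encoding); // bytes 1, 2, 3, 3, 4

        // Decoding the stored bytes with a generic charset does not give
        // back "Hello", so a plain search for it finds nothing.
        String naive = new String(stored, StandardCharsets.ISO_8859_1);
        System.out.println(naive.contains("Hello")); // prints false

        // The replacement text may need glyphs the subset font never had.
        try {
            encode("World", encoding);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The first effect corresponds to files that are left unchanged, the second to files that end up damaged or showing missing glyphs after the replacement bytes are written.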

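I spent a lot of time trying to come up with a solution and eventually got an Acrobat DC subscription so that I could create form fields as placeholders for the text to be replaced. In my case the fields hold customer information and order details, so the data is not very complex, but the document is full of pages of business terms and conditions and has a very complex layout.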

Then I simply did this, which might work for you.

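import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

private void update() throws InvalidPasswordException, IOException {
    // Field name -> value to fill in
    Map<String, String> map = new HashMap<>();
    map.put("fieldname", "value to update");
    File template = new File("template.pdf");
    // try-with-resources closes the document even if filling or saving fails
    try (PDDocument document = PDDocument.load(template)) {
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
        if (acroForm == null) {
            throw new IOException("template.pdf contains no AcroForm");
        }
        for (PDField field : acroForm.getFields()) {
            String value = map.get(field.getFullyQualifiedName());
            if (value != null) {
                field.setValue(value);
                // Lock the filled field so it cannot be edited afterwards
                field.setReadOnly(true);
            }
        }
        document.save(new File("out.pdf"));
    }
}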

YMMV


2017-11-03 05:17:38

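Using AcroForm fields is indeed how filling in PDFs should be done. But you do not need Acrobat to create the fields; you can do that with PDFBox too... (though without a nice GUI) – mkl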

Thx @mkl, I realize that fields can be created with PDFBox, but I could not figure out how to place them at exact positions in the document. –
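On exact positioning: part of the difficulty is usually the coordinate system. In default PDF user space the origin is the lower-left corner of the page and one unit is 1/72 inch, whereas layouts are normally measured from the top-left in millimetres or inches. The helper below is a sketch of that conversion in plain Java. Its name and signature are made up, and it assumes an unrotated page whose media box starts at (0, 0).

```java
// Hypothetical helper, not part of PDFBox: converts a rectangle measured
// from the top-left corner in millimetres into PDF user-space points.
class FieldPlacementDemo {

    static final double POINTS_PER_MM = 72.0 / 25.4;

    // Returns {lowerLeftX, lowerLeftY, width, height} in points.
    // Assumes an unrotated page whose lower-left corner is at (0, 0).
    static double[] toPdfRect(double leftMm, double topMm,
                              double widthMm, double heightMm,
                              double pageHeightPt) {
        double x = leftMm * POINTS_PER_MM;
        double width = widthMm * POINTS_PER_MM;
        double height = heightMm * POINTS_PER_MM;
        // Flip the vertical axis: PDF y grows upwards from the bottom edge,
        // and the rectangle is anchored at its lower-left corner.
        double y = pageHeightPt - topMm * POINTS_PER_MM - height;
        return new double[] {x, y, width, height};
    }

    public static void main(String[] args) {
        // A 2 in x 0.5 in field, 1 in from the left and top edges of a
        // US Letter page (792 points tall).
        double[] rect = toPdfRect(25.4, 25.4, 50.8, 12.7, 792.0);
        System.out.printf("x=%.1f y=%.1f w=%.1f h=%.1f%n", rect[0], rect[1], rect[2], rect[3]);
    }
}
```

For that example the rectangle comes out at roughly x = 72, y = 684, with a size of 144 by 36 points. Numbers of this kind are what a field widget's rectangle would be set to.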
