How to replace centered text in a PDF with PDFBox

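I am using the PDFTextReplacement example. It performs the replacement as expected as long as my text is left-aligned. However, if my input PDF has centered text, the replacement ends up left-aligned. OK, so I have to recalculate the correct start point.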

For this reason I have two goals, or questions:

How do I determine the alignment?
How do I calculate the correct start point?
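If you assume the original text really was centered, the second question has a simple answer: keep the center fixed, so new start = old start + (old width - new width) / 2. The minimal sketch below assumes both widths are already known and expressed in the same units as the start coordinate; the class and method names are hypothetical and not part of PDFBox.

```java
/**
 * Hypothetical helper: keeps a replacement string centered on the same
 * point as the string it replaces. All values must use the same units
 * (for example unscaled text space units).
 */
final class Recenter {
    private Recenter() {}

    /** Horizontal center of a run of text starting at startX with the given width. */
    static float centerOf(float startX, float width) {
        return startX + width / 2f;
    }

    /** Start x for the new text: oldStart + (oldWidth - newWidth) / 2. */
    static float newStartX(float oldStartX, float oldWidth, float newWidth) {
        return oldStartX + (oldWidth - newWidth) / 2f;
    }

    /** Offset to add to the old start x (negative means "move left"). */
    static float shift(float oldWidth, float newWidth) {
        return (oldWidth - newWidth) / 2f;
    }

    public static void main(String[] args) {
        // Old text starts at x=100 and is 200 units wide, so its center is x=200.
        // The new text is 100 units wide, so it has to start at x=150.
        float start = newStartX(100f, 200f, 100f);
        System.out.println(start);                 // 150.0
        System.out.println(centerOf(start, 100f)); // 200.0
    }
}
```

The hard part is obtaining the two widths and the old start point, which the comments below address.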

Here is my code:

import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.Map;
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.util.PDFOperator;

public PDDocument doIt(String inputFile, Map<String, String> text)
        throws IOException, COSVisitorException {
    // the document (PDFBox 1.8 API)
    PDDocument doc = PDDocument.load(inputFile);
    List pages = doc.getDocumentCatalog().getAllPages();
    for (int i = 0; i < pages.size(); i++) {
        PDPage page = (PDPage) pages.get(i);
        PDStream contents = page.getContents();

        PDFStreamParser parser = new PDFStreamParser(contents.getStream());
        parser.parse();
        List tokens = parser.getTokens();
        for (int j = 0; j < tokens.size(); j++) {
            Object next = tokens.get(j);
            if (!(next instanceof PDFOperator)) {
                continue;
            }
            PDFOperator op = (PDFOperator) next;

            // Tj and TJ are the two operators that display
            // strings in a PDF. The replacement is drawn from the
            // original start point, so centered text ends up off-center.
            if (op.getOperation().equals("Tj")) {
                // Tj takes one operand: the string to display
                COSString previous = (COSString) tokens.get(j - 1);
                String key = previous.getString().trim();
                if (text.containsKey(key)) {
                    previous.reset();
                    previous.append(text.get(key).getBytes("ISO-8859-1"));
                }
            } else if (op.getOperation().equals("TJ")) {
                // TJ takes an array of strings and kerning numbers;
                // concatenate the strings to get the displayed text
                COSArray previous = (COSArray) tokens.get(j - 1);
                StringBuilder pstring = new StringBuilder();
                for (int k = 0; k < previous.size(); k++) {
                    Object arrElement = previous.getObject(k);
                    if (arrElement instanceof COSString) {
                        pstring.append(((COSString) arrElement).getString());
                    }
                }
                String key = pstring.toString().trim();
                // only modify the array if its text is one of the keys,
                // otherwise non-matching text would be erased
                if (text.containsKey(key) && previous.getObject(0) instanceof COSString) {
                    COSString first = (COSString) previous.getObject(0);
                    first.reset();
                    first.append(text.get(key).getBytes("ISO-8859-1"));
                    // drop all other elements; always removing index 1
                    // avoids skipping elements while the array shrinks
                    while (previous.size() > 1) {
                        previous.remove(1);
                    }
                }
            }
        }
        // now that the tokens are updated we will replace the
        // page content stream.
        PDStream updatedStream = new PDStream(doc);
        OutputStream out = updatedStream.createOutputStream();
        ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
        tokenWriter.writeTokens(tokens);
        page.setContents(updatedStream);
    }
    return doc;
}

2013-10-11 markus0074

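*How to determine the alignment* - PDF does not know about alignment. It draws text starting at the current origin, and that's it. You can try to determine the alignment by comparing the text position of the current "line" with the page dimensions and with the positions of the text before and after that "line" ("line" in quotes because PDF does not necessarily follow a concept of text lines). But if some text looks centered, are you sure it was meant to be centered? It may simply have been indented by some distance and happen to *look centered*. – mkl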
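mkl's suggestion can be turned into a rough heuristic: compare the left and right margins of a "line" inside a reference box (the page, or a column you determined from neighbouring lines) and call it centered only if both margins are about equal. As mkl points out, this remains a guess; the enum, method name and tolerance parameter below are hypothetical.

```java
enum TextAlignment { LEFT, RIGHT, CENTERED, UNKNOWN }

final class AlignmentGuess {
    private AlignmentGuess() {}

    /**
     * Guesses the alignment of a "line" spanning [lineLeft, lineRight] inside
     * a reference box [boxLeft, boxRight] (e.g. page or column bounds).
     * tolerance is the largest difference (same units) still treated as equal.
     */
    static TextAlignment guess(float boxLeft, float boxRight,
                               float lineLeft, float lineRight, float tolerance) {
        float leftMargin = lineLeft - boxLeft;
        float rightMargin = boxRight - lineRight;
        boolean touchesLeft = Math.abs(leftMargin) <= tolerance;
        boolean touchesRight = Math.abs(rightMargin) <= tolerance;
        if (touchesLeft && touchesRight) {
            // The line fills the box (full line or justified): nothing to decide.
            return TextAlignment.UNKNOWN;
        }
        if (touchesLeft) {
            return TextAlignment.LEFT;
        }
        if (touchesRight) {
            return TextAlignment.RIGHT;
        }
        if (Math.abs(leftMargin - rightMargin) <= tolerance) {
            return TextAlignment.CENTERED;
        }
        // Indented by some arbitrary amount.
        return TextAlignment.UNKNOWN;
    }
}
```

For example, on a box from 0 to 600, a line from 200 to 400 is classified as CENTERED, while a line from 50 to 300 stays UNKNOWN.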

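@mkl Yes, that is exactly what I see in the PDDocument. So I have to refine my question: 1. How do I get the exact space used by the content (icepdf uses lineText.getBounds())? 2. How do I compute the space the new string will use (based on a BASE14 font)? – markus0074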
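For question 2, ISO 32000-1 section 9.4.4 gives the horizontal displacement of each glyph as tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, where w0 is the glyph width in thousandths of a text space unit, Tfs the font size, Tc the character spacing, Tw the word spacing (only for single-byte code 32) and Th the horizontal scaling. For a Base 14 font the w0 values come from the font's AFM metrics; in PDFBox, `PDFont.getStringWidth(String)` returns their sum in 1/1000 text space units. The sketch below ignores TJ adjustments and uses a small, hypothetical width table for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: width of a string shown with Tj, per ISO 32000-1, 9.4.4. */
final class TextWidth {
    private TextWidth() {}

    /**
     * @param widths      glyph widths w0 in thousandths of a text space unit
     * @param fontSize    Tfs (from the Tf operator)
     * @param charSpacing Tc (default 0)
     * @param wordSpacing Tw (default 0), applied to the space character only
     * @param hScalePct   horizontal scaling Tz in percent (default 100)
     * @return width in unscaled text space units
     */
    static float width(String text, Map<Character, Integer> widths,
                       float fontSize, float charSpacing,
                       float wordSpacing, float hScalePct) {
        float th = hScalePct / 100f;
        float total = 0f;
        for (char c : text.toCharArray()) {
            Integer w0 = widths.get(c);
            if (w0 == null) {
                throw new IllegalArgumentException("no width for '" + c + "'");
            }
            float tw = (c == ' ') ? wordSpacing : 0f;
            total += ((w0 / 1000f) * fontSize + charSpacing + tw) * th;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical sample widths; take real values from the AFM file.
        Map<Character, Integer> w = new HashMap<>();
        w.put('H', 722); w.put('e', 556); w.put('l', 222); w.put('o', 556);
        // Sum of w0 is 2278 thousandths, i.e. about 22.78 units at 10 pt.
        System.out.println(width("Hello", w, 10f, 0f, 0f, 100f));
    }
}
```

The result is in text space; to compare it with page coordinates it still has to go through the text matrix and the CTM.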

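Your code works at a very low level: it inspects the individual instructions of the page content stream. Therefore it does not benefit from any higher-level functionality. In particular, at that level you have to keep track of changes to the current graphics state yourself. To do so, you should first study the [PDF specification ISO 32000-1](http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf), especially chapter 8 (to understand how the graphics state changes) and chapter 9 (to understand how text is drawn). – mkl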
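As an illustration of the state tracking mkl describes: the point where a Tj starts drawing is given by the text matrix Tm, which the text-positioning operators of ISO 32000-1 section 9.4.2 (BT, Tm, Td, TD, TL, T*) update. A stripped-down tracker might look like the sketch below; it is a hypothetical class, not PDFBox API, and it deliberately ignores the CTM (cm, q/Q) and the advance that shown text adds to Tm.

```java
/**
 * Minimal tracker for the text matrix (Tm) and text line matrix (Tlm).
 * Matrices are stored as [a b c d e f].
 */
final class TextMatrixTracker {
    private float[] tm = identity();
    private float[] tlm = identity();
    private float leading = 0f; // TL

    private static float[] identity() {
        return new float[] {1, 0, 0, 1, 0, 0};
    }

    /** BT: Tm = Tlm = identity. */
    void beginText() {
        tm = identity();
        tlm = identity();
    }

    /** a b c d e f Tm: sets both matrices. */
    void setMatrix(float a, float b, float c, float d, float e, float f) {
        tlm = new float[] {a, b, c, d, e, f};
        tm = tlm.clone();
    }

    /** tx ty Td: Tm = Tlm = [1 0 0 1 tx ty] x Tlm. */
    void moveText(float tx, float ty) {
        float[] m = tlm;
        tlm = new float[] {
            m[0], m[1], m[2], m[3],
            tx * m[0] + ty * m[2] + m[4],
            tx * m[1] + ty * m[3] + m[5]
        };
        tm = tlm.clone();
    }

    /** tx ty TD: same as -ty TL followed by tx ty Td. */
    void moveTextSetLeading(float tx, float ty) {
        leading = -ty;
        moveText(tx, ty);
    }

    /** tl TL */
    void setLeading(float tl) {
        leading = tl;
    }

    /** T*: same as 0 -TL Td. */
    void nextLine() {
        moveText(0f, -leading);
    }

    /** (e, f) of Tm: start of the next Tj, in user space if the CTM is identity. */
    float[] origin() {
        return new float[] {tm[4], tm[5]};
    }
}
```

Note that Td operands are scaled by Tlm: after `2 0 0 2 100 700 Tm`, a `10 0 Td` moves the origin to x = 120, not 110, which is why the state has to be tracked rather than read off single operators.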

You can use this function:

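import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.util.PDFOperator;

public void doIt(String inputFile, String outputFile, String strToFind, String message)
        throws IOException, COSVisitorException
{
    // the document (PDFBox 1.8 API)
    PDDocument doc = null;
    try
    {
        doc = PDDocument.load(inputFile);
        List pages = doc.getDocumentCatalog().getAllPages();
        for (int i = 0; i < pages.size(); i++)
        {
            PDPage page = (PDPage) pages.get(i);
            PDStream contents = page.getContents();
            PDFStreamParser parser = new PDFStreamParser(contents.getStream());
            parser.parse();
            List tokens = parser.getTokens();
            for (int j = 0; j < tokens.size(); j++)
            {
                Object next = tokens.get(j);
                if (next instanceof PDFOperator)
                {
                    PDFOperator op = (PDFOperator) next;
                    // Tj and TJ are the two operators that display
                    // strings in a PDF
                    if (op.getOperation().equals("Tj"))
                    {
                        // Tj takes one operand and that is the string
                        // to display, so let's update that operand
                        COSString previous = (COSString) tokens.get(j - 1);
                        String string = previous.getString();
                        // match and insert literally, not as a regular expression
                        string = string.replaceFirst(Pattern.quote(strToFind),
                                Matcher.quoteReplacement(message));
                        previous.reset();
                        // single-byte encoding assumed, as in the question
                        previous.append(string.getBytes("ISO-8859-1"));
                    }
                    else if (op.getOperation().equals("TJ"))
                    {
                        // note: a strToFind split across several array
                        // elements (e.g. by kerning) is not found
                        COSArray previous = (COSArray) tokens.get(j - 1);
                        for (int k = 0; k < previous.size(); k++)
                        {
                            Object arrElement = previous.getObject(k);
                            if (arrElement instanceof COSString)
                            {
                                COSString cosString = (COSString) arrElement;
                                String string = cosString.getString();
                                string = string.replaceFirst(Pattern.quote(strToFind),
                                        Matcher.quoteReplacement(message));
                                cosString.reset();
                                cosString.append(string.getBytes("ISO-8859-1"));
                            }
                        }
                    }
                }
            }
            // now that the tokens are updated we will replace the
            // page content stream.
            PDStream updatedStream = new PDStream(doc);
            OutputStream out = updatedStream.createOutputStream();
            ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
            tokenWriter.writeTokens(tokens);
            page.setContents(updatedStream);
        }
        doc.save(outputFile);
    }
    finally
    {
        if (doc != null)
        {
            doc.close();
        }
    }
}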

2014-02-26 16:52:07 Bourkadi

Where does your code attempt to determine the alignment of the original line, let alone realign the changed line? – mkl

Don't use alignment; just put some kind of marker like _TEXTHERE in your PDF and replace it with the replace function. – Bourkadi

But the OP explicitly wants to preserve the special alignment: he wants the text to be centered after the replacement. – mkl
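One way to keep the text centered after replacement, sketched under strong assumptions: the replaced string is shown by a single Tj that directly follows a positioning operator (so Tm equals Tlm at that point), and nothing else is shown on the same line. The Tj can then be bracketed with `dx 0 Td` and `-dx 0 Td`, where dx = (oldWidth - newWidth) / 2 in unscaled text space units, the unit Td uses. The flat `List<String>` token model and the method name are hypothetical simplifications; in PDFBox the operands would be COS numbers and the operator a PDFOperator object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class RecenterTokens {
    private RecenterTokens() {}

    /** Formats a number as a PDF real (no exponent notation). */
    private static String num(float v) {
        return String.format(Locale.ROOT, "%.3f", v);
    }

    /**
     * Returns a copy of tokens in which the Tj at tjIndex (string operand at
     * tjIndex - 1) is preceded by "dx 0 Td" and followed by "-dx 0 Td".
     */
    static List<String> shiftTj(List<String> tokens, int tjIndex, float dx) {
        if (tjIndex < 1 || !"Tj".equals(tokens.get(tjIndex))) {
            throw new IllegalArgumentException("no Tj with operand at " + tjIndex);
        }
        List<String> out = new ArrayList<>(tokens.subList(0, tjIndex - 1));
        out.addAll(Arrays.asList(num(dx), "0", "Td"));
        out.add(tokens.get(tjIndex - 1)); // the string operand
        out.add("Tj");
        // Move the line matrix back so that later relative Td operators still
        // land where they did before. This also resets Tm to the line start,
        // hence the "nothing else on this line" assumption.
        out.addAll(Arrays.asList(num(-dx), "0", "Td"));
        out.addAll(tokens.subList(tjIndex + 1, tokens.size()));
        return out;
    }
}
```

For instance, shifting the Tj in `BT /F1 12 Tf 100 700 Td (Hello) Tj ET` by 10 yields `... 100 700 Td 10.000 0 Td (Hello) Tj -10.000 0 Td ET`. Content streams that position text differently (TJ arrays, several Tj per line, Tm instead of Td) need more work.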
