如何將pdf中的圖像座標轉換爲JSON文件？

我編碼創建HTML頁面包括圖像提取PDF文檔中的頁面。如何將pdf中的圖像座標轉換爲JSON文件？

我試圖從PDF中提取圖像，然後我成功地從PDF中提取圖像，並使用PDFBox lib將圖像應用於html頁面。但我沒有在HTML頁面中提取圖像座標。

因此，搜索如何提取pdf中的圖像座標，我嘗試使用PDFBox庫提取pdf中的圖像座標。

下面的代碼：

public static void main(String[] args) throws Exception 
{ 
    try 
    { 
     PDDocument document = PDDocument.load(
      "/Users/tmdtjq/Downloads/PDFTest/test.pdf"); 

     PrintImageLocations printer = new PrintImageLocations(); 
     List allPages = document.getDocumentCatalog().getAllPages(); 
     for(int i=0; i<allPages.size(); i++) 
     { 
      PDPage page = (PDPage)allPages.get(i); 
      int pageNum = i+1; 
      System.out.println("Processing page: " + pageNum); 
      printer.processStream(page, page.findResources(), 
       page.getContents().getStream()); 
     } 
    } 
    finally 
    { 
    } 
} 

protected void processOperator(PDFOperator operator, List arguments) throws IOException 
{ 
    String operation = operator.getOperation(); 
    if(operation.equals("Do")) 
    { 
     COSName objectName = (COSName)arguments.get(0); 
     Map xobjects = getResources().getXObjects(); 
     PDXObject xobject = xobjects.get(objectName.getName()); 
     if(xobject instanceof PDXObjectImage) 
     { 
      try 
      { 
       PDXObjectImage image = (PDXObjectImage)xobject; 
       PDPage page = getCurrentPage(); 
       Matrix ctm = getGraphicsState().getCurrentTransformationMatrix(); 
       double rotationInRadians =(page.findRotation() * Math.PI)/180; 

       AffineTransform rotation = new AffineTransform(); 
       rotation.setToRotation(rotationInRadians); 
       AffineTransform rotationInverse = rotation.createInverse(); 
       Matrix rotationInverseMatrix = new Matrix(); 
       rotationInverseMatrix.setFromAffineTransform(rotationInverse); 
       Matrix rotationMatrix = new Matrix(); 
       rotationMatrix.setFromAffineTransform(rotation); 

       Matrix unrotatedCTM = ctm.multiply(rotationInverseMatrix); 
       float xScale = unrotatedCTM.getXScale(); 
       float yScale = unrotatedCTM.getYScale(); 
       float xPosition = unrotatedCTM.getXPosition(); 
       float yPosition = unrotatedCTM.getYPosition(); 

       System.out.println("Found image[" + objectName.getName() + "] " + 
        "at " + xPosition + "," + yPosition + 
        " size=" + (xScale/100f*image.getWidth()) + "," + (yScale/100f*image.getHeight())); 
      } 
      catch(NoninvertibleTransformException e) 
      { 
       throw new WrappedIOException(e); 
      } 
     } 
    } 
}

輸出打印X，在圖像y位置是所有0.0，0.0。

我想因爲getGraphicsState（）是返回graphicsState的方法。

但我想獲得具體的圖像座標應用於PDF頁面的高度，寬度以創建html頁面。

我想也許這是從PDF圖像座標中提取JSON的解決方案。

請將PDF圖像座標引入JSON工具或建議PDF庫。

（我已經在FlexPaper使用pdf2json工具。這個工具提取JSONfile從PDF頁面，包括沒有圖像數據只是文本數據（內容，座標，字體..）。）

來源

2014-08-28 TPS1

如果所有職位都報告爲（0,0），那是因爲* origin *已被轉換。試試[getCurrentTransformationmatrix（）]（https://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/graphics/PDGraphicsState.html#getCurrentTransformationMatrix%28%29）。 – usr2564301 2014-08-28 10:10:20

我能找到與圖像搜索cm運營商。我重寫了PDFTextStripper以下方式：注意：它沒有考慮到旋轉和鏡像！

public static class TextFinder extends PDFTextStripper { 

    public TextFinder() throws IOException { 
     super(); 
    } 

    @Override 
    protected void startPage(PDPage page) throws IOException { 
     // process start of the page 
     super.startPage(page); 
    } 

    @Override 
    public void process(PDFOperator operator, List<COSBase> arguments) 
      throws IOException { 

     if ("cm".equals(operator.getOperation())) { 
      float width = ((COSNumber)arguments.get(0)).floatValue(); 
      float height = ((COSNumber)arguments.get(3)).floatValue(); 
      float x = ((COSNumber)arguments.get(4)).floatValue(); 
      float y = ((COSNumber)arguments.get(5)).floatValue(); 
      // process image coordinates 
     } 
     super.processOperator(operator, arguments); 
    } 

    @Override 
    protected void writeString(String text, 
      List<TextPosition> textPositions) throws IOException { 
     for (TextPosition position : textPositions) { 
      // process text coordinates 
     } 
     super.writeString(text, textPositions); 
    } 
}

當然，人們可以使用PDFStreamEngine，而不是PDFTextStripper，如果一個人沒有興趣與圖像一起查找文本。

來源

2015-01-10 13:58:29 divanov

如何將pdf中的圖像座標轉換爲JSON文件？

回答

相關問題