2011-07-23 153 views

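PDFBox: getting the coordinates of begin-text (BT ... ET) sections

Can someone help me get the real pixel coordinates of a PDF begin-text section? I use PDFBox to retrieve text from PDF files, but now I need the rectangles surrounding each text section/paragraph.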

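// Assumes $page is a PDFBox PDPage object.
$contents = $page->getContents();              // content stream wrapper for the page
$contentsStream = $contents->getStream();      // underlying COSStream
$resources = $page->getResources();
$fonts = $resources->getFonts();               // fonts keyed by resource name, e.g. F30
$xobjects = $resources->getImages();           // images keyed by resource name, e.g. IM1
$tokens = $contentsStream->getStreamTokens();  // parsed operands and operators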
  • [PDFOperator {q},COSFloat {690.48},COSInt {0},COSInt {0},COSFloat {633.6},COSInt {0},COSInt {0},PDFOperator {cm},COSName {IM1},PDFOperator {Do},PDFOperator {Q},

  • PDFOperator {BT},COSInt {1},COSInt {0},COSInt {0},COSInt {1},COSFloat {25.92},COSFloat {588.48},PDFOperator {Tm},COSInt {99},PDFOperator {Tz},COSName {F30},COSInt {56},PDFOperator {Tf},COSInt {3},PDFOperator {Tr},COSFloat {0.334},PDFOperator {Tc},COSString {Pospremanj},PDFOperator {Tj},COSInt {0},PDFOperator {Tc},COSString {e},PDFOperator {Tj},COSFloat {9.533},PDFOperator {Tw},COSString {i},PDFOperator {Tj},COSFloat {6.062},PDFOperator {Tw},COSFloat {0.95},PDFOperator {Tc},COSString {ciscenj},PDFOperator {Tj},COSInt {0},PDFOperator {Tc},COSString {e},PDFOperator {Tj},COSInt {1},COSInt {0},COSInt {0},COSInt {1},COSFloat {55.68},COSFloat {539.76},PDFOperator {Tm},COSInt {0},PDFOperator {Tw},COSFloat {0.262},PDFOperator {Tc},COSString {uoè},PDFOperator {Tj},COSInt {0},PDFOperator {Tc},COSString {i},PDFOperator {Tj},COSFloat {5.443},PDFOperator {Tw},COSFloat {-2.145},PDFOperator {Tc},COSString {zimslco},PDFOperator {Tj},COSInt {0},PDFOperator {Tc},COSString {g},PDFOperator {Tj},COSFloat {7.202},PDFOperator {Tw},COSFloat {-0.148},PDFOperator {Tc},COSString {odmor},PDFOperator {Tj},COSInt {0},PDFOperator {Tc},COSString {a},PDFOperator {Tj},PDFOperator {ET},

  • PDFOperator {BT},COSInt {0},COSInt {0},COSInt {1},COSFloat {6.72},COSFloat {513.12},PDFOperator {Tm},COSInt {14},PDFOperator {Tf},COSString {},PDFOperator {Tj},COSFloat {2.751},PDFOperator {Tc}, ...
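
For illustration, the left and bottom values can be read off a token list like the ones above by collecting operands until an operator arrives and recording the last two operands of each `Tm` inside a BT ... ET block. The following is only a sketch in Python: the `(kind, value)` tuple representation and the `bt_origins` name are hypothetical (not a PDFBox API), and it ignores any `cm` transformation, so the values are only page coordinates if the current transformation matrix is the identity.

```python
def bt_origins(tokens):
    """Return, for each BT...ET block, the list of (x, y) origins set by Tm."""
    blocks = []      # one list of origins per finished BT...ET block
    current = None   # origins of the block being read, or None outside BT...ET
    operands = []    # operands seen since the previous operator
    for kind, value in tokens:
        if kind != "op":
            operands.append(value)
            continue
        if value == "BT":
            current = []
        elif value == "Tm" and current is not None and len(operands) >= 6:
            # Tm takes six operands a b c d e f; e and f are the translation.
            e, f = operands[-2], operands[-1]
            current.append((e, f))
        elif value == "ET" and current is not None:
            blocks.append(current)
            current = None
        operands = []  # every operator consumes the pending operands
    return blocks


# Tokens abridged from the first BT block in the question.
tokens = [
    ("op", "BT"),
    ("int", 1), ("int", 0), ("int", 0), ("int", 1),
    ("float", 25.92), ("float", 588.48), ("op", "Tm"),
    ("str", "Pospremanj"), ("op", "Tj"),
    ("int", 1), ("int", 0), ("int", 0), ("int", 1),
    ("float", 55.68), ("float", 539.76), ("op", "Tm"),
    ("str", "odmor"), ("op", "Tj"),
    ("op", "ET"),
]
print(bt_origins(tokens))
```

Each inner list holds one origin per `Tm`, so a block with two lines of text yields two (left, bottom) pairs.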

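I would like to get output like what the PrintTextLocations function produces for each word/character. I can get the bottom and left coordinates, but how do I get the width and the top coordinate?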

PrintTextLocations:

  • String[25.92,45.119995 fs=56.0 xscale=55.440002 height=40.208004 space=15.412322 width=36.978485]P
    String[63.22914,45.119995 fs=56.0 xscale=55.440002 height=...
    String[97.43364,45.119995 fs=56.0 xscale=55.440002 height=40.208004 space=15.412322 width=30.824646]s
    String[128.58894,45.119995 fs=56.0 xscale=55.440002 height=42.168 space=15.412322 width=33.87384]p
    String[162.79344,45.119995 fs=56.0 xscale=55.440002 height=42.168 space=15.412322 width=21.566162]r
    String[184.69026,45.119995 fs=56.0 xscale=55.440002 height=42.168 space=15.412322 width=30.824646]e
    String[215.84557,45.119995 fs=56.0 xscale=55.440002 height=42.168 space=15.412322 width=49.286148]m ...
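
If that output is available as text, each line already carries what a per-character rectangle needs. Below is a rough Python sketch that parses one line and derives a rectangle. It assumes the unspaced `String[x,y fs=... xscale=... height=... space=... width=...]c` line format, and it assumes that y is the baseline measured downward from the top of the page (the question's 45.12 against a `Tm` value of 588.48 suggests this, but it is not confirmed here). The names `parse_line` and `glyph_rect` are made up for the example.

```python
import re

NUM = r"[-\d.]+"
LINE = re.compile(
    rf"String\[(?P<x>{NUM}),(?P<y>{NUM}) fs=(?P<fs>{NUM}) xscale=(?P<xscale>{NUM}) "
    rf"height=(?P<height>{NUM}) space=(?P<space>{NUM}) width=(?P<width>{NUM})\](?P<char>.*)"
)


def parse_line(line):
    """Parse one output line into a dict, or return None if it is incomplete."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    glyph = {k: float(v) for k, v in m.groupdict().items() if k != "char"}
    glyph["char"] = m.group("char")
    return glyph


def glyph_rect(glyph):
    """Return (left, top, right, bottom), with y growing downward (assumed)."""
    left = glyph["x"]
    bottom = glyph["y"]
    top = bottom - glyph["height"]   # assumption: y is the baseline from the page top
    right = left + glyph["width"]
    return (left, top, right, bottom)


sample = ("String[25.92,45.119995 fs=56.0 xscale=55.440002 "
          "height=40.208004 space=15.412322 width=36.978485]P")
print(glyph_rect(parse_line(sample)))
```

Lines that were cut off, like the second entry above, simply come back as `None` and can be skipped.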

Answer

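...since the BT section gives you the bottom-left coordinates, you need to parse through all the words/letters contained in the current BT block to get all the other coordinates. First word height + BT bottom = top, max(left coordinate + width) = right, and the last word's bottom = the bottom coordinate.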
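
A rough Python sketch of that rule, for illustration only: it assumes the words are given in reading order as (left, bottom, width, height) tuples in PDF user space (origin at the bottom left, y growing upward). The name `block_rect` and the widths in the example are hypothetical. Taking the left edge as the minimum over all words is an added assumption that coincides with the BT origin when the first line starts furthest left.

```python
def block_rect(words):
    """Return (left, bottom, right, top) for one BT block.

    words: list of (left, bottom, width, height) tuples in reading order,
    in PDF user space (y grows upward).
    """
    if not words:
        raise ValueError("a block needs at least one word")
    first, last = words[0], words[-1]
    left = min(w[0] for w in words)          # assumption: leftmost word edge
    right = max(w[0] + w[2] for w in words)  # max(left coordinate + width)
    top = first[1] + first[3]                # first word height + BT bottom
    bottom = last[1]                         # last word's bottom
    return (left, bottom, right, top)


# Two lines starting at the Tm origins from the question;
# the widths 400.0 and 350.0 are made-up illustration values.
words = [
    (25.92, 588.48, 400.0, 40.208004),
    (55.68, 539.76, 350.0, 42.168),
]
print(block_rect(words))
```

Note that the last word's bottom is its baseline, so descenders on the final line may extend slightly below the returned rectangle.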

I hope this helps someone...

-Matijevi Kancijan