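Extracting text from highlighted annotations with poppler-qt4/python-poppler-qt4
Since yesterday I have been trying to use python-poppler-qt4 to extract the text under highlight annotations in some PDFs.
According to this documentation, it looks like I have to get the text with the Page.text() method, passing it the rectangle returned by Annotation.boundary() for the highlight annotation. But I only get blank text. Can someone help me? My code is below, followed by a link to the PDF I am using. Thanks for your help!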
import popplerqt4
import sys
import PyQt4


def main():
    # Load the PDF given as the first command-line argument.
    doc = popplerqt4.Poppler.Document.load(sys.argv[1])
    total_annotations = 0
    for i in range(doc.numPages()):
        page = doc.page(i)
        annotations = page.annotations()
        if len(annotations) > 0:
            for annotation in annotations:
                if isinstance(annotation, popplerqt4.Poppler.Annotation):
                    total_annotations += 1
                    if isinstance(annotation, popplerqt4.Poppler.HighlightAnnotation):
                        # This is the call that only returns blank text.
                        print(str(page.text(annotation.boundary())))
    if total_annotations > 0:
        print(str(total_annotations) + " annotation(s) found")
    else:
        print("no annotations found")


if __name__ == "__main__":
    main()
Test PDF: https://www.dropbox.com/s/10plnj67k9xd1ot/test.pdf
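One likely cause of the blank text, offered as a sketch rather than a confirmed fix: in Poppler's Qt bindings, Annotation.boundary() is expressed in normalized coordinates (0 to 1, relative to the page), whereas Page.text() expects a rectangle in page points. Passing the boundary straight through would then select a tiny area near the page's top-left corner. The helper below, to_page_points, is a hypothetical name introduced only for illustration; it is plain Python so it can be checked without Poppler installed, and the popplerqt4 usage in the trailing comments is untested.

```python
def to_page_points(left, top, right, bottom, page_width, page_height):
    """Scale a normalized (0..1) rectangle to page coordinates in points.

    Returns (x1, y1, x2, y2), the order QRectF.setCoords() accepts.
    """
    return (left * page_width,
            top * page_height,
            right * page_width,
            bottom * page_height)


# Example with a made-up 600 x 800 point page:
coords = to_page_points(0.25, 0.5, 0.75, 0.625, 600, 800)
print(coords)

# Assumed use inside the loop from the question (not run here):
#   b = annotation.boundary()
#   size = page.pageSize()
#   rect = PyQt4.QtCore.QRectF()
#   rect.setCoords(*to_page_points(b.left(), b.top(), b.right(), b.bottom(),
#                                  size.width(), size.height()))
#   print(page.text(rect))
```

If a highlight spans several lines, its bounding box may also cover unhighlighted words, so the text returned for the scaled rectangle can include more than what was actually marked.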
Thanks. I struggled to get popplerqt4 installed, but this works like a charm! – magicrebirth