python-pptx: Extract text from slide titles

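I have built a document retrieval engine in Python that returns documents ranked by relevance to a user-submitted query. My document collection includes PowerPoint files. For PPTs, I want the results page to show the user the first few slide titles, to give him/her a clearer picture of the document (somewhat like what we see in Google search results).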

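So basically, I want to extract the text from the slide titles of a PPT file using Python. I am using the python-pptx package. Currently my implementation looks like this: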

from pptx import Presentation

# Characters stripped from both ends of a title
STRIP_CHARS = """ !@#$%^&*)(_-+=}{][:;<,>.?"'/<,"""

prs = Presentation(filepath)  # load the PPT; filepath is defined elsewhere
slide_titles = []  # container for slide titles
for slide in prs.slides:  # iterate over each slide
    title_shape = slide.shapes[0]  # assume the zeroth-indexed shape is the title
    if title_shape.has_text_frame:  # only shapes with a text frame carry text
        title = title_shape.text.strip(STRIP_CHARS) + '. '
        # add the title only if it is not already in the container
        if title not in slide_titles:
            slide_titles.append(title)

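But as you can see, I am assuming that the zeroth-indexed shape on each slide is the slide title, which is obviously not always the case. Any ideas on how to accomplish this?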

Thanks in advance.

Source

2017-04-12 Clock Slave

Slide.shapes (a SlideShapes object) has a .title property that returns the title shape when there is one (which is usually the case), or None if the slide has no title.
http://python-pptx.readthedocs.io/en/latest/api/shapes.html#slideshapes-objects

This is the preferred way to access the title shape.

Note that not all slides have a title shape, so you must test for a None result to avoid an error in that case.

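Also note that users sometimes use a different shape for the title, for example a separate text box they have added. So you are not guaranteed to get the text that "appears" to be the title on the slide. You will, however, get the text that PowerPoint itself considers the title, for example the text shown as that slide's title in Outline view.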
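
As a rough illustration of the approach described above, the sketch below reads slide.shapes.title on each slide and skips slides where it is None. The function names extract_slide_titles and titles_from_file and the limit parameter are made up for this example and are not part of python-pptx; skipping empty and duplicate titles is an assumption carried over from the intent of the question's code, not something the library does.

```python
def extract_slide_titles(prs, limit=None):
    """Return the distinct, non-empty slide titles of a presentation, in slide order.

    `prs` is a python-pptx Presentation object. `limit` (hypothetical
    parameter, expected to be a positive int or None) caps how many
    titles are collected.
    """
    titles = []
    for slide in prs.slides:
        title_shape = slide.shapes.title  # None when the slide has no title shape
        if title_shape is None:
            continue
        text = title_shape.text_frame.text.strip()
        # Skip empty titles and titles already collected
        if text and text not in titles:
            titles.append(text)
        if limit is not None and len(titles) >= limit:
            break
    return titles


def titles_from_file(path, limit=None):
    """Load a .pptx file from `path` and return its slide titles."""
    # Imported here so the helper above depends only on the object it is given;
    # requires the third-party python-pptx package.
    from pptx import Presentation

    return extract_slide_titles(Presentation(path), limit)
```

For the search results page, something like titles_from_file(filepath, limit=3) would give the first three titles. Slides whose visible "title" is really an ordinary text box are not picked up, for the reason given above.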

Source

2017-04-12 19:11:08 scanny
