Strange behaviour while using Word

I am doing this:

import win32com.client as win32

infile = r"D:\path\to\file.docx"
# def word_table(infile):
word = win32.gencache.EnsureDispatch('Word.Application')
doc = word.Documents.Open(infile)
word.Visible = False
rng = doc.Range()
for tbl in rng.Tables:
    # Word collections are 1-based, so rows run from 1 to Rows.Count
    for i in range(1, tbl.Rows.Count + 1):
        page_name = tbl.Cell(i, 1).Range.Paragraphs(1).Range.Text
        hyper_link = tbl.Cell(i, 2).Range.Paragraphs(1).Range.Hyperlinks(1).Address
        print(page_name, hyper_link)

This prints only hyper_link and not page_name (even if I change the order). But if I do this:

print(page_name) 
print(hyper_link)

it works fine. I cannot work out the reason for this unexpected behaviour.

I posted this as an answer to this question: How to extract hyperlinks from MS Word table with Python?

Source

2017-07-19 Rahul

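Are you running this on Python 2.x or 3.x? In 3.x I don't see any difference (other than there being no newline between the items), but in 2.x the parentheses in a single 'print' statement mean you are actually printing a tuple, so you get the 'repr()' of the items rather than the 'str()'. Perhaps 'page_name' is an object with a blank 'repr()'? – jasonharper

IPython 3.5, to be exact. – Rahul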

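The behaviour is caused by the end-of-cell character that Microsoft Word puts at the end of every table cell.

So page_name = tbl.Cell(i, 1).Range.Paragraphs(1).Range.Text grabs whatever text is in the cell plus a CR ('\r') and a BEL ('\x07'). On a console, the carriage return can send the cursor back to the start of the line, so the text does not print correctly.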
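One way to see these hidden characters is to print the repr() of the value instead of the value itself. The snippet below is only a sketch: cell_text is an assumed sample string standing in for what Word returns, not output from a real document.

```python
# Simulated cell text: visible text followed by CR and BEL,
# standing in for tbl.Cell(...).Range.Text (assumed sample value).
cell_text = "Home\r\x07"

# repr() shows the control characters as escape sequences
# instead of sending them to the console.
visible = repr(cell_text)
print(visible)

# The last two characters form the end-of-cell marker.
marker = cell_text[-2:]
print([hex(ord(ch)) for ch in marker])
```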

print(page_name.split('\r')[0], hyper_link) works fine in this case.
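The same clean-up could also be wrapped in a small helper. This is a minimal sketch, not part of the original answer: clean_cell_text is a hypothetical name, the sample strings and URL are made up, and it assumes the cell text ends with the '\r\x07' marker described above.

```python
def clean_cell_text(text):
    """Strip trailing CR/BEL end-of-cell characters (hypothetical helper)."""
    return text.rstrip("\r\x07")


# Assumed sample value standing in for a real Word cell.
sample = "Home\r\x07"
cleaned = clean_cell_text(sample)
print(cleaned, "http://example.com")

# A cell with two paragraphs keeps its inner paragraph break.
multi = clean_cell_text("first\rsecond\r\x07")
print(repr(multi))
```

Unlike split('\r')[0], which keeps only the first paragraph of the cell, rstrip removes just the trailing characters, so any paragraph breaks inside the cell are preserved.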

Source

2017-07-20 05:22:53 Rahul
