Parsing a table with BeautifulSoup

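I want to use BeautifulSoup to extract data from an HTML table and convert it into a pandas DataFrame with the columns: Date, Transaction, Invoice Number, Ship Date, Payment Type, Amount, and Prepaid Balance.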

Here is a snippet of my code so far:

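from bs4 import BeautifulSoup

def find_account_status(htmls):
    soup = BeautifulSoup(htmls, "html.parser")
    table = soup.find('table', border="0", cellpadding="2")
    # find_all() returns a list of tags, so read the text of each cell separately
    cells = table.find_all("td", {"class": "bodytext"})
    print([cell.get_text(strip=True) for cell in cells])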
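A minimal sketch of going the rest of the way with BeautifulSoup alone: walk each `<tr>`, collect its `bodytext` cells, and hand the rows to `pandas.DataFrame` with the column names from the question. It assumes each data row holds exactly one cell per column; `SAMPLE_HTML` and `table_to_dataframe` are hypothetical stand-ins, since the real HTML snippet was not preserved. Using Python's built-in `"html.parser"` means lxml is not needed.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Column names from the question.
COLUMNS = ["Date", "Transaction", "Invoice Number", "Ship Date",
           "Payment Type", "Amount", "Prepaid Balance"]

# Hypothetical sample table; assumes one <tr> per record, one <td class="bodytext"> per column.
SAMPLE_HTML = """
<table border="0" cellpadding="2">
  <tr><td class="bodytext">03/01/2016</td><td class="bodytext">Order</td>
      <td class="bodytext">1001</td><td class="bodytext">03/02/2016</td>
      <td class="bodytext">Prepaid</td><td class="bodytext">25.00</td>
      <td class="bodytext">75.00</td></tr>
  <tr><td class="bodytext">03/05/2016</td><td class="bodytext">Payment</td>
      <td class="bodytext">1002</td><td class="bodytext">03/06/2016</td>
      <td class="bodytext">Credit Card</td><td class="bodytext">100.00</td>
      <td class="bodytext">175.00</td></tr>
</table>
"""


def table_to_dataframe(html):
    # "html.parser" ships with Python, so no lxml install is required.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", border="0", cellpadding="2")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td", class_="bodytext")]
        # Keep only rows with one value per expected column (skips headers/spacers).
        if len(cells) == len(COLUMNS):
            rows.append(cells)
    return pd.DataFrame(rows, columns=COLUMNS)


df = table_to_dataframe(SAMPLE_HTML)
print(df)
```

All values come back as strings; convert columns such as Amount with `pd.to_numeric` if needed.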

Below is a snippet of the HTML I am trying to extract from:

2016-03-17 Riley Hun

You can use pandas.read_html():

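from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(htmls, "html.parser")
table = soup.find('table', border="0", cellpadding="2")
# Wrapping the HTML in StringIO avoids the deprecation warning newer pandas gives for literal HTML
df = pd.read_html(StringIO(str(table)))[0]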

2016-03-17 20:16:12 alecxe

I tried installing lxml from PyCharm but got the following error: Error: b"'xslt-config' is not recognized as an internal or external command,\r\noperable program or batch file.\r\n" –

I tried that because when I run your code I get the error "lxml not found, please install it" –

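@RileyHun, there is a lot of information about lxml installation problems and their solutions; just google them. Alternatively, you can switch the parser by passing the `flavor` argument ([docs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_html.html)). – alecxe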
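A short sketch of the `flavor` workaround: with `flavor="bs4"`, `pandas.read_html` parses through BeautifulSoup backed by html5lib instead of lxml, so it assumes `beautifulsoup4` and `html5lib` are installed (lxml is not needed). The table below is a hypothetical stand-in for the question's HTML.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample table; the question's real HTML was not preserved.
html = """
<table border="0" cellpadding="2">
  <tr><th>Date</th><th>Amount</th></tr>
  <tr><td>03/01/2016</td><td>25.00</td></tr>
  <tr><td>03/05/2016</td><td>100.00</td></tr>
</table>
"""

# flavor="bs4" selects the BeautifulSoup + html5lib parser instead of lxml.
# read_html returns a list of DataFrames, one per <table> found.
df = pd.read_html(StringIO(html), flavor="bs4")[0]
print(df)
```

A first row made up entirely of `<th>` cells is used as the header, and numeric-looking columns such as Amount are converted to floats.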
