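Python - Appending a str value to certain rows in a DataFrame
I have a value stored in a string. I want to append that value to the rows that meet a specific condition, and to no other rows.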
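The image below shows the tables I need to parse. I can easily parse the file with BeautifulSoup and turn it into a Pandas DataFrame, but for the following two tables I am struggling to capture the Package Price and append it to the whole DataFrame. Ideally, the price value would sit alongside each fish/weight pair, so a single column would hold the same price in every row.
Here is the code I use to parse the tables: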
import email

import pandas as pd
from bs4 import BeautifulSoup

with open(file_path) as in_f:
    msg = email.message_from_file(in_f)  # type: <class 'email.message.Message'>
html_msg = msg.get_payload(1)  # type: <class 'email.message.Message'>
body = html_msg.get_payload(decode=True)  # type: <class 'bytes'> or type: 'int'
html = body.decode()  # type: <class 'str'>
# Name the parser explicitly so BeautifulSoup does not have to guess one
tablez = BeautifulSoup(html, 'html.parser').find_all("table")  # type: <class 'bs4.element.ResultSet'>
data = []
for table in tablez:
    for row in table.find_all("tr"):
        # One list of stripped cell texts per table row
        data.append([cell.text.strip() for cell in row.find_all("td")])
fish_frame = pd.DataFrame(data)
This is what data contains:
data: [['Species', 'Price', 'Weight'], ['GBW Cod', '.55', '8,059'], ['GBE Haddock', '.03', '14,628'], ['GBW Haddock', '.02', '87,451'], ['GB YT', '1.50', '1,818'], ['Witch', '1.25', '1,414'], ['GB Winter', '.40', '23,757'], ['Redfish', '.02', '123'], ['White Hake', '.40', '934'], ['Pollock', '.02', '7,900'], ['Package Price:', '', '$21,151.67'], ['Species', 'Weight'], ['GBE Cod', '820'], ['GBW Cod', '15,279'], ['GBE Haddock', '32,250'], ['GBW Haddock', '192,793'], ['GB YT', '6,239'], ['SNE YT', '2,018'], ['GOM YT', '1,511'], ['Plaice', '2,944'], ['Witch', '1,100'], ['GB Winter', '158,608'], ['White Hake', '31'], ['Pollock', '1,983'], ['SNE Winter', '7,257'], ['Price', '$58,500.00'], ['Species', 'Weight'], ['GBE Cod', '792'], ['GBW Cod', '14,767'], ['GBE Haddock', '29,199'], ['GBW Haddock', '174,556'], ['GB YT', '5,268'], ['SNE YT', '544'], ['GOM YT', '1,957'], ['Plaice', '2,452'], ['Witch', '896'], ['GB Winter', '163,980'], ['White Hake', '8'], ['Pollock', '1,743'], ['SNE Winter', '3,709'], ['Price', '$57,750.00']]
I then capture the Package price with this code:
stew = BeautifulSoup(html, 'html.parser')
chunks = stew.find_all('p', {'class': "MsoNormal"})
for line in chunks:
    if 'Package' in line.text:
        package_price = line.text
        print("package_price:", package_price)
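If the captured text is needed as a number rather than a raw string, a regular expression can pull out the dollar amount. This is only a minimal sketch: it assumes the paragraph text looks something like 'Package Price: $21,151.67' (the exact wording in the email is a guess), and `parse_price` is a hypothetical helper name, not something from the code above.

```python
import re


def parse_price(text):
    """Return the first dollar amount found in text as a float, or None."""
    # Match a '$', optional whitespace, then digits with optional
    # thousands separators and an optional decimal part.
    match = re.search(r"\$\s*([\d,]+(?:\.\d+)?)", text)
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return float(match.group(1).replace(",", ""))


# Assumed sample text; the real paragraph may be worded differently.
print(parse_price("Package Price: $21,151.67"))
```

Keeping the price as a float makes later arithmetic (for example, price per pound) possible, while the original string can still be stored if the formatting matters.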
But I am now struggling to add the price value to its own column in the DataFrame. Running a command such as fish_frame = pd.DataFrame(package_price) results in:
Traceback (most recent call last):
  File "Z:/Code/NEFS_stock_then_weight_attempt3.py", line 236, in <module>
    fish_frame = pd.DataFrame(package_price)
  File "C:\Users\stephen.mahala\AppData\Local\Programs\Python\Python35-32\lib\site-packages\pandas\core\frame.py", line 345, in __init__
    raise PandasError('DataFrame constructor not properly called!')
pandas.core.common.PandasError: DataFrame constructor not properly called!
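for reasons unknown to me. However, converting it to a list breaks the string apart: each character becomes its own list element, so each character ends up in its own cell of the DataFrame.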
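The two behaviours described here can be reproduced in isolation. The sketch below uses an assumed sample value for package_price; the column name "price" is also just an illustration. The likely cause is that the DataFrame constructor expects a collection (list, dict, array) rather than a bare string, and that list() iterates a string character by character.

```python
import pandas as pd

# Assumed sample value; the real string comes from the parsed email.
package_price = "Package Price: $21,151.67"

# list(package_price) iterates the string, so every character becomes
# its own element and therefore its own row in the DataFrame.
per_char = pd.DataFrame(list(package_price))

# Wrapping the string in a one-element list keeps it as a single cell.
single = pd.DataFrame([package_price], columns=["price"])

print(per_char.shape)
print(single.shape)
```

The difference is between list(package_price), which splits the string, and [package_price], which wraps it.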
Is there a method in Pandas or BeautifulSoup that I am not aware of that would simplify adding this single value to my DataFrame?
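One possible approach, sketched here under assumptions rather than offered as a definitive answer: build one DataFrame per table and assign the price as a scalar, which pandas broadcasts to every row of that table. The sketch assumes every table in data starts with a 'Species' header row and ends with a row whose first cell is 'Package Price:' or 'Price', as in the data shown above. The list below is a shortened subset of that data, and the column name 'Package Price' is chosen for illustration.

```python
import pandas as pd

# A shortened subset of the `data` list shown above.
data = [
    ['Species', 'Price', 'Weight'],
    ['GBW Cod', '.55', '8,059'],
    ['GBE Haddock', '.03', '14,628'],
    ['Package Price:', '', '$21,151.67'],
    ['Species', 'Weight'],
    ['GBE Cod', '820'],
    ['GBW Cod', '15,279'],
    ['Price', '$58,500.00'],
]

frames = []
header, rows = None, []
for row in data:
    if not row:
        # Skip rows that produced no <td> cells.
        continue
    if row[0] == 'Species':
        # A header row starts a new table.
        header, rows = row, []
    elif row[0] in ('Package Price:', 'Price'):
        # A price row closes the table; its last cell holds the amount.
        table = pd.DataFrame(rows, columns=header)
        table['Package Price'] = row[-1]  # scalar is broadcast to every row
        frames.append(table)
    else:
        rows.append(row)

# Columns missing from a table (such as 'Price' in the second one)
# are filled with NaN when the tables are stacked.
fish_frame = pd.concat(frames, ignore_index=True)
print(fish_frame)
```

Because the price is assigned per table before concatenation, only the rows belonging to that table receive it, which matches the goal of appending the value to some rows and not others.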
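You should edit your question to show the full, specific traceback of the error you are getting. –
I create 'fish_frame' right after parsing the tables, in my first block of code – theprowler
Yes, I see how you create/initialize it. Can you show the *full* traceback? –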