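Parsing and extracting data into a pandas DataFrame: BeautifulSoup and XML

I hope you can help me. I need to write a function that parses XML text and extracts the data into a pandas DataFrame. This is the specification: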
```python
"""
Function
--------
rcp_poll_data

Extract poll information from an XML string, and convert to a DataFrame

Parameters
----------
xml : str
    A string, containing the XML data from a page like
    get_poll_xml(1044)

Returns
-------
A pandas DataFrame with the following columns:
    date: The date for each entry
    title_n: The data value for the gid=n graph (take the column name from the `title` attribute)

This DataFrame should be sorted by date

Example
-------
Consider the following simple XML page:

<chart>
  <series>
    <value xid="0">1/27/2009</value>
    <value xid="1">1/28/2009</value>
  </series>
  <graphs>
    <graph gid="1" color="#000000" balloon_color="#000000" title="Approve">
      <value xid="0">63.3</value>
      <value xid="1">63.3</value>
    </graph>
    <graph gid="2" color="#FF0000" balloon_color="#FF0000" title="Disapprove">
      <value xid="0">20.0</value>
      <value xid="1">20.0</value>
    </graph>
  </graphs>
</chart>

Given this string, rcp_poll_data should return

result = pd.DataFrame({'date': pd.to_datetime(['1/27/2009', '1/28/2009']),
                       'Approve': [63.3, 63.3], 'Disapprove': [20.0, 20.0]})
"""
```
My code:
```python
import pandas as pd
from bs4 import BeautifulSoup


def rcp_poll_data(xml):
    soup = BeautifulSoup(xml, 'xml')
    dates = soup.find("series")
    # all text nodes in the document, minus the last seven
    datesval = soup.findChildren(string=True)
    del datesval[-7:]
    # title and color of the gid=1 and gid=2 graphs, stored as dicts
    obama = soup.find("graph", gid="1")
    obamaval = {"title": obama["title"], "color": obama["color"]}
    romney = soup.find("graph", gid="2")
    romneyval = {"title": romney["title"], "color": romney["color"]}
    result = pd.DataFrame({'date': pd.to_datetime(datesval, errors="ignore"),
                           'GID1': obamaval, 'GID2': romneyval})
    return result
```
But when I run the program, I always get this error: `Mixing dicts with non-Series may lead to ambiguous ordering.`
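The message comes from pandas itself. In the `pd.DataFrame(...)` call above, `'date'` is a plain array-like (the result of `pd.to_datetime`), while `'GID1'` and `'GID2'` are dicts whose keys (`title`, `color`) would have to become row labels. pandas has no way to line those keys up with the positions of the array, so it refuses. A minimal reproduction, using the values from the docstring example rather than real RCP data:

```python
import pandas as pd

# Values from the docstring example; the GID1 dict mimics the
# {"title": ..., "color": ...} dict the question's code builds per graph.
dates = pd.to_datetime(["1/27/2009", "1/28/2009"], format="%m/%d/%Y")

message = None
try:
    # One array-like column mixed with a plain dict column: pandas cannot
    # tell how the dict keys should line up with the array positions.
    pd.DataFrame({"date": dates, "GID1": {"title": "Approve", "color": "#000000"}})
except ValueError as exc:
    message = str(exc)

print(message)  # Mixing dicts with non-Series may lead to ambiguous ordering.

# Equal-length lists for every column give pandas an unambiguous layout.
fixed = pd.DataFrame({"date": dates, "Approve": [63.3, 63.3], "Disapprove": [20.0, 20.0]})
```

So every column needs the same shape: either equal-length lists, or dicts (or Series) keyed the same way, which pandas then aligns by key.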
Please help! PS: the `get_poll_xml` function looks like this:
```python
import requests


def get_poll_xml(poll_id):
    url = "http://charts.realclearpolitics.com/charts/" + str(poll_id) + ".xml"
    return requests.get(url).content
```
where `poll_id` is, for example, 1044.
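A sketch of a version that follows the docstring while still using BeautifulSoup. It assumes that every `<value>` carries an `xid` attribute linking a graph value to the date with the same `xid` in `<series>` (as in the example XML), and that dates use the `m/d/Y` format; the `'xml'` parser requires `lxml` to be installed. Building each column as an `{xid: value}` dict means all columns share the same keys, so pandas aligns them instead of raising the error above.

```python
import pandas as pd
from bs4 import BeautifulSoup


def rcp_poll_data(xml):
    """Return a DataFrame with a 'date' column plus one column per <graph>,
    named after the graph's `title` attribute, sorted by date."""
    soup = BeautifulSoup(xml, "xml")  # the "xml" parser needs lxml

    # Assumption: each <value> in <series> maps an xid to a date string.
    dates = {v["xid"]: v.get_text() for v in soup.find("series").find_all("value")}

    # One {xid: float} dict per graph, keyed by the graph's title.
    columns = {}
    for graph in soup.find("graphs").find_all("graph"):
        columns[graph["title"]] = {
            v["xid"]: float(v.get_text())
            for v in graph.find_all("value")
            if v.get_text().strip()  # skip empty values, if any
        }

    # All columns are dicts keyed by xid, so pandas aligns them by key;
    # an xid missing from a graph becomes NaN.
    df = pd.DataFrame({"date": dates, **columns})
    df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
    return df.sort_values("date").reset_index(drop=True)
```

With the helper from the question, this would be called as `rcp_poll_data(get_poll_xml(1044))`; BeautifulSoup accepts the `bytes` returned by `requests.get(url).content` directly.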
Wow, thank you so much! I didn't know about xml.etree.ElementTree either, thanks for pointing me to it!
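For comparison, here is the same idea written with the standard library's `xml.etree.ElementTree` instead of BeautifulSoup, which drops the `lxml` dependency. This is only a sketch under the same assumptions (an `xid` attribute on every `<value>`, `m/d/Y` dates), not necessarily the approach from the answer being thanked:

```python
import xml.etree.ElementTree as ET

import pandas as pd


def rcp_poll_data(xml):
    """Same output as the docstring asks for, parsed with ElementTree."""
    root = ET.fromstring(xml)  # accepts str or bytes; root is <chart>

    # Assumption: <series> holds one <value xid=...> per date.
    dates = {v.get("xid"): v.text for v in root.find("series").iter("value")}

    # One {xid: float} dict per <graph>, keyed by its title attribute.
    columns = {}
    for graph in root.find("graphs").iter("graph"):
        columns[graph.get("title")] = {
            v.get("xid"): float(v.text)
            for v in graph.iter("value")
            if v.text and v.text.strip()  # skip empty values, if any
        }

    df = pd.DataFrame({"date": dates, **columns})
    df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
    return df.sort_values("date").reset_index(drop=True)
```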