2017-08-02 28 views
2

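Error reading HTML and listing td text based on a data attribute with BeautifulSoup

I am trying to read and list the text of td elements based on their data-property attribute: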

tr = BeautifulSoup(str(input), 'lxml')
tags = tr.findAll('td')
for t in tags:
    if t.attrs['data-property'] == 'OSVersion':
        ver = t.text

This gives me an error with no further detail:

KeyError: 'data-property' 

Here is a sample tr extracted as the input:

<tr > 
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td> 
<td class=" resizable reorderable" data-property="DisplayModel">iPad Mini 4 (64 GB Space Gray)</td> 
<td class=" resizable reorderable" data-property="PhoneNumber"></td> 
<td class="grid_customvariable_colsize resizable reorderable" data-property="DeviceCustomAttributeDetails"></td> 
<td class=" resizable reorderable" data-property="DeviceTagDetails"></td> 
<td class=" resizable reorderable" data-property="EnrollmentStatusName"> <div class="grid_resizable_col">Enrolled</div> 
</td> 
<td class=" resizable reorderable" data-property="ComplianceStatusName"> <div class="grid_resizable_col">Compliant</div> 
</td> 

<td class=" resizable reorderable" data-property="IMEI"></td> 
<td class=" resizable reorderable" data-property="LocationGroupName">iOS</td> 
<td class=" resizable reorderable" data-property="IsCompromisedYN">No</td> 
<td class=" resizable reorderable" data-property="HomeCarrier">Not Reported </td> 
<td class=" resizable reorderable" data-property="CurrentCarrier">Not Reported </td> 
<td class=" resizable reorderable" data-property="WiFiIPAddress"></td> 

<td class=" resizable reorderable" data-property="Notes"></td> 
<td class=" resizable reorderable" data-property="WnsStatus">  <span>Disconnected</span> 
</td> 
<td class=" resizable reorderable" data-property="DmLastSeenTime"> <span class="icon arrow_down_stretched red">-</span> 
</td>      
</tr> 

However, if I take a single attrs dictionary on its own, the lookup works fine:

d={'class': ['', 'resizable', 'reorderable'], 'data-property': 'FriendlyName'} 
print(d['data-property'])

Does anyone have an idea how to solve this?

Thanks

+2

Rename the variable 'str' in the 'BeautifulSoup' call - it is a reserved word. –

+0

Tried that, and it is not the cause; if it were, execution would not get past that line to the next one – ikel

Answers

0

Yes, that is right, the code raises an error.

Make the following change in your code, since you are getting a KeyError:

if 'data-property' in t.attrs and t.attrs['data-property']== 'OSVersion':

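Explanation of my demo code:

In BeautifulSoup 3 (the package imported in the demo), t.attrs returns a list of tuples, e.g. [(u'class', u' resizable reorderable'), (u'data-property', u'OSVersion')]

We need to convert it to a dictionary with dict(), e.g. attributes = dict(t.attrs)

Then, in the condition, check that the key exists before comparing, e.g. if 'data-property' in attributes and attributes['data-property']== 'OSVersion':

Demo: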

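import BeautifulSoup  # BeautifulSoup 3 package

# data is assumed to hold the HTML string from the question
tr = BeautifulSoup.BeautifulSoup(data)
tags = tr.findAll('td')
for t in tags:
    attributes = dict(t.attrs)  # list of tuples -> dict
    if 'data-property' in attributes and attributes['data-property'] == 'OSVersion':
        ver = t.text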
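With bs4, where attrs is already a dict, the same existence check could be sketched with Tag.get, which returns None for a missing attribute instead of raising KeyError. The sample HTML below is made up for illustration, and html.parser is an assumed parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second td deliberately lacks data-property.
html = """<tr>
<td class="resizable" data-property="OSVersion">10.2.1</td>
<td class="resizable">no attribute here</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")
ver = None
for t in soup.find_all("td"):
    # Tag.get returns None when the attribute is absent, so no KeyError.
    if t.get("data-property") == "OSVersion":
        ver = t.text

print(ver)
```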

Let me know if you still have any issues. Feel free to ping me.

+1

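This actually works. I checked all the keys, and it turns out one td was missing the data-property attribute for some unknown reason. Once I added a line to test for the attribute's existence, everything worked. Thanks a lot for the help! – ikel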
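One way to track down such a cell is to list every td that lacks the attribute. This is only a sketch assuming bs4; the sample markup and the "spacer" class are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical row: the middle td has no data-property attribute.
html = """<tr>
<td data-property="OSVersion">10.2.1</td>
<td class="spacer"></td>
<td data-property="IMEI"></td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

# Tag.has_attr reports whether the attribute is present on the tag.
missing = [td for td in soup.find_all("td") if not td.has_attr("data-property")]

for td in missing:
    print(td)  # shows the offending cell's markup
```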

2

There is no need to mess with attrs:

from bs4 import BeautifulSoup as BS 

html = """<tr > 
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td> 
<td class=" resizable reorderable" data-property="DisplayModel">iPad Mini 4 (64 GB Space Gray)</td> 
<td class=" resizable reorderable" data-property="PhoneNumber"></td> 
<td class="grid_customvariable_colsize resizable reorderable" data-property="DeviceCustomAttributeDetails"></td> 
<td class=" resizable reorderable" data-property="DeviceTagDetails"></td> 
<td class=" resizable reorderable" data-property="EnrollmentStatusName"> <div class="grid_resizable_col">Enrolled</div> 
</td> 
<td class=" resizable reorderable" data-property="ComplianceStatusName"> <div class="grid_resizable_col">Compliant</div> 
</td> 

<td class=" resizable reorderable" data-property="IMEI"></td> 
<td class=" resizable reorderable" data-property="LocationGroupName">iOS</td> 
<td class=" resizable reorderable" data-property="IsCompromisedYN">No</td> 
<td class=" resizable reorderable" data-property="HomeCarrier">Not Reported </td> 
<td class=" resizable reorderable" data-property="CurrentCarrier">Not Reported </td> 
<td class=" resizable reorderable" data-property="WiFiIPAddress"></td> 

<td class=" resizable reorderable" data-property="Notes"></td> 
<td class=" resizable reorderable" data-property="WnsStatus">  <span>Disconnected</span> 
</td> 
<td class=" resizable reorderable" data-property="DmLastSeenTime"> <span class="icon arrow_down_stretched red">-</span> 
</td>      
</tr>""" 

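soup = BS(html, 'html.parser')  # name the parser explicitly to avoid the bs4 warning
tags = soup.findAll('td')
for t in tags:
    # t[...] raises KeyError if a td lacks the attribute; every td in this sample has it
    if t['data-property'] == 'OSVersion':
        ver = t.text
        print(ver)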

Output:

10.2.1 
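As a possible alternative to looping, bs4 can filter on the attribute value directly, so cells without data-property are simply skipped. The sketch below uses a shortened copy of the sample row plus one invented cell without the attribute.

```python
from bs4 import BeautifulSoup

# Shortened sample; the last td is a made-up cell with no data-property.
html = """<tr>
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td>
<td class=" resizable reorderable" data-property="DisplayModel">iPad Mini 4 (64 GB Space Gray)</td>
<td class=" resizable reorderable">no data-property</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

# find returns the first matching tag, or None if nothing matches.
cell = soup.find("td", attrs={"data-property": "OSVersion"})
ver = cell.get_text(strip=True) if cell is not None else None
print(ver)
```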
0

Try this. Code:

from bs4 import BeautifulSoup

with open("xmlfile.xml", "r") as f:  # open the file holding the markup
    content = f.read()  # markup stored in this variable
soup = BeautifulSoup(content, "lxml")
for values in soup.findAll("td"):
    if values["data-property"] == "OSVersion":
        print(values.text)

Output:

10.2.1
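Since the question is about listing the text of each cell by its data attribute, a rough sketch of collecting all of them into a dict follows. It assumes bs4 and uses a shortened copy of the sample row with one invented cell that has no data-property; the name props is hypothetical.

```python
from bs4 import BeautifulSoup

html = """<tr>
<td class=" resizable reorderable" data-property="OSVersion">10.2.1</td>
<td class=" resizable reorderable" data-property="PhoneNumber"></td>
<td class=" resizable reorderable" data-property="EnrollmentStatusName"> <div class="grid_resizable_col">Enrolled</div>
</td>
<td class=" resizable reorderable">no data-property</td>
</tr>"""

soup = BeautifulSoup(html, "html.parser")

# attrs={"data-property": True} matches only tds that carry the attribute,
# so the indexing below cannot raise KeyError.
props = {
    td["data-property"]: td.get_text(strip=True)
    for td in soup.find_all("td", attrs={"data-property": True})
}

for name, text in props.items():
    print(name, repr(text))
```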