2015-06-09 32 views
1

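Below is a short snippet that reads an attribute value from a div. All of the divs have the same class name and the same attribute name. I want to get each div's attribute value and the div's text body.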

import urllib2
from bs4 import BeautifulSoup

redditFile = urllib2.urlopen("http://www.bing.com/videos?q=owl")
redditHtml = redditFile.read()
redditFile.close()
soup = BeautifulSoup(redditHtml)

productDivs = soup.findAll('div', attrs={'class' : 'dg_u'})
for div in productDivs:
    print div.find('div', {"class":"vthumb"})['smturl']
    #print div.find("div", {"class":"tl text-body"})  # this prints None rather than the div text

At first it prints a few URLs (sometimes 4, 6, 8, etc.), and then:

KeyError         Traceback (most recent call last) 
<ipython-input-34-cc950a8a84f7> in <module>() 
    26 productDivs = soup.findAll('div', attrs={'class' : 'dg_u'}) 
    27 for div in productDivs: 
---> 28  print div.find('div', {"class":"vthumb"})['smturl'] 
    29  print div.find("div", {"class":"tl text-body"}) 

/usr/local/lib/python2.7/dist-packages/bs4/element.pyc in __getitem__(self, key) 
    903   """tag[key] returns the value of the 'key' attribute for the tag, 
    904   and throws an exception if it's not there.""" 
--> 905   return self.attrs[key] 
    906 
    907  def __iter__(self): 

KeyError: 'smturl' 

All of the divs have the same name and the same smturl attribute name, so why does it raise a KeyError? Any help?

+3

Not all of the `div`s have the `smturl` attribute. Here is one way to find the ones that don't: `for div in productDivs: if 'smturl' not in div.find('div', {"class": "vthumb"}).attrs: print(div)` – styvane

Answer

2

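Not all of the divs have the smturl attribute, so you need to add that attribute to the find call. Also, not every element in productDivs contains the div you are looking for, so I have added a test for whether find returns None.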

In [27]: for div in productDivs:
   ....:     if div.find('div', {"class":"vthumb", 'smturl': True}) is not None:
   ....:         print div.find('div', {"class":"vthumb", 'smturl': True})['smturl']
   ....:
http://ts2.mm.bing.net/th?id=OMB.9hfZ6cCDfUWbpw&pid=2.1 
http://ts4.mm.bing.net/th?id=OMB1.n%2b12M8SoyFcsag&pid=2.1 
http://ts4.mm.bing.net/th?id=OMB.ev1wnIiszGjhUg&pid=2.1 
http://ts4.mm.bing.net/th?id=OMB.hDLa5PO07Chclw&pid=2.1 
http://ts2.mm.bing.net/th?id=OMB.xDT9H25QFJ2jBw&pid=2.1 
http://ts3.mm.bing.net/th?id=OMB.BULQolkxkaZ0uw&pid=2.1 
http://ts3.mm.bing.net/th?id=OMB.xp3c0DyKrfmB7Q&pid=2.1 
http://ts4.mm.bing.net/th?id=OMB.MxP9fUyaJCRyhw&pid=2.1 
http://ts4.mm.bing.net/th?id=OMB2.CWjPPKiJQc4z6w&pid=2.1 
http://ts1.mm.bing.net/th?id=OMB1.ZVKhvML3%2bPzM1w&pid=2.1 
http://ts1.mm.bing.net/th?id=OMB.SLn%2b0NwKeUdZXw&pid=2.1 
http://ts2.mm.bing.net/th?id=OMB.4HJqrT9pBevGlg&pid=2.1 
http://ts2.mm.bing.net/th?id=OMB2.HgWYR9sjPw6JlQ&pid=2.1 
http://ts1.mm.bing.net/th?id=OMB.RyBXWQ9sH9wThw&pid=2.1 
http://ts2.mm.bing.net/th?id=OMB2.Vf21EgXRXMcdfg&pid=2.1 
http://ts3.mm.bing.net/th?id=OMB2.BIb6qwbHniC1vw&pid=2.1 
http://ts3.mm.bing.net/th?id=OMB1.H9bwRYncKU380A&pid=2.1 
http://ts2.mm.bing.net/th?id=OM1.mBXeu55OD4VimQ&pid=2.1 
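
The same check can be written with a single find call per result div. This is only a minimal sketch: `SAMPLE_HTML` and `thumb_urls` are hypothetical names, and the inline markup is an assumed stand-in for the Bing page rather than its real HTML. It uses `print(...)` with one argument so it runs on both Python 2 and Python 3.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the Bing results page.
SAMPLE_HTML = """
<div class="dg_u"><div class="vthumb" smturl="http://example.com/a.jpg"></div></div>
<div class="dg_u"><div class="vthumb"></div></div>
<div class="dg_u"><span>no thumbnail here</span></div>
<div class="dg_u"><div class="vthumb" smturl="http://example.com/b.jpg"></div></div>
"""


def thumb_urls(html):
    """Collect smturl values, skipping result divs that lack one."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for div in soup.find_all("div", attrs={"class": "dg_u"}):
        # smturl=True matches only tags that carry the attribute at all.
        thumb = div.find("div", attrs={"class": "vthumb", "smturl": True})
        # find returns None when no matching child exists, so test before indexing.
        if thumb is not None:
            urls.append(thumb["smturl"])
    return urls


print(thumb_urls(SAMPLE_HTML))
```

Storing the result of find in a variable avoids searching each div twice, and the None test covers both failure cases: a vthumb div without smturl and a result div with no vthumb child.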
+0

Thanks a lot. Can you tell me why `print div.find("div", {"class": "tl text-body"})` does not print the div's content? – nlper

+1

@nlper, because there is no such div on the page. I looked for them with Chrome developer tools and nothing matched: `$x("//div[contains(@class,'text-body')]")` returns `[]` – Alik

+0

`Husband Fools Wife - Hindi Jokes 9` This is the div. I am trying to get "Husband Fools Wife - Hindi" using `text-body`. – nlper
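
For the text-body question, here is a rough sketch of how the two cases behave. The markup in nlper's comment did not survive, so the `present` string below is an assumption about what that div looks like, and `title_text` is a hypothetical helper. If the HTML that urllib2 actually receives lacks the div, as Alik reports, find returns None, which matches what the question describes.

```python
from bs4 import BeautifulSoup


def title_text(html):
    """Return the text of the first div with class text-body, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any div whose class list includes "text-body".
    node = soup.find("div", class_="text-body")
    if node is None:
        return None
    return node.get_text(strip=True)


# Assumed markup: the real page structure is not shown in the thread.
present = ('<div class="dg_u"><div class="tl text-body">'
           ' Husband Fools Wife - Hindi Jokes 9 </div></div>')
absent = '<div class="dg_u"><div class="vthumb"></div></div>'

print(title_text(present))
print(title_text(absent))
```

Printing `redditHtml` and searching it for `text-body` is a quick way to confirm whether the div is in the downloaded HTML at all before blaming the find call.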