2013-08-27 23 views

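I want the output to be "Hindi" and "English". I can get "Hindi", but I am having trouble getting "English" when accessing the values of the nested elements in this HTML.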

Input:

<td class="_480u"> 
<div class="clearfix"> 
<div><a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and 
     <a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a></div></div></td> 

Code I tried:

>>> details.find('a',{'class':''}).string 
u'Hindi' 

s = details.findAll('a',{'class':''}) 
s1 = len(s) 
list2 = [] 
if s1 >= 1: 
    for j in range(0,s1): 
     lang = s[j].find('a',{'class':''}).string.strip() 
     list2.append(lang) 
Traceback (most recent call last): 
    File "<pyshell#220>", line 9, in <module> 
    lang = s[j].find('a',{'class':''}).string.strip() 
AttributeError: 'NoneType' object has no attribute 'string' 


>>> s 
[<a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a>, <a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a>] 

Answers

1

If this is the exact HTML and it won't change, you can use this:

from bs4 import BeautifulSoup 

html = '<td class="_480u">\ 
<div class="clearfix">\ 
<div><a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and \ 
     <a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a></div></div></td>' 

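soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
first = soup.find('a', {'class': ''})
print(first.string)
# The first sibling is the text node " and ", so step twice to reach the second <a>
print(first.next_sibling.next_sibling.string)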

Output:

Hindi 
English 
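
Stepping over siblings by hand depends on exactly one text node sitting between the two links. A slightly more tolerant sketch, assuming the same fragment (the `data-hovercard` attributes are dropped here for brevity, and the variable names are only illustrative), asks BeautifulSoup for the next `<a>` sibling directly:

```python
from bs4 import BeautifulSoup

# Shortened copy of the fragment from the question (assumption: only the
# tag structure matters, so the data-hovercard attributes are omitted).
html = (
    '<td class="_480u"><div class="clearfix"><div>'
    '<a href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and '
    '<a href="https://www.facebook.com/pages/English/106059522759137">English</a>'
    '</div></div></td>'
)

soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")                 # first link in the fragment
second = first.find_next_sibling("a")  # next <a> sibling; text nodes are skipped

languages = [first.get_text(strip=True)]
if second is not None:                 # guard against a missing second link
    languages.append(second.get_text(strip=True))

print(languages)
```

Because `find_next_sibling` returns `None` when there is no further `<a>`, the guard keeps the code from raising an `AttributeError` on fragments that list only one language.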

Or you can do it like this (if you are working only with the HTML posted in your question):

from bs4 import BeautifulSoup 

html = '<td class="_480u">\ 
<div class="clearfix">\ 
<div><a data-hovercard="/ajax/hovercard/page.php?id=112969428713061" href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and \ 
     <a data-hovercard="/ajax/hovercard/page.php?id=106059522759137" href="https://www.facebook.com/pages/English/106059522759137">English</a></div></div></td>' 

soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
lang = soup.find_all('a', href=True)  # every <a> that has an href attribute
for i in lang:
    print(i.string)

Output:

Hindi 
English 
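
As for the traceback in the question: each `s[j]` is already an `<a>` tag, so calling `.find('a', ...)` on it searches that tag's descendants, finds nothing, and returns `None`. A minimal sketch of the corrected loop follows, assuming `details` is the parsed fragment (here it is simply the whole soup, and the HTML is a shortened copy of the one in the question):

```python
from bs4 import BeautifulSoup

# Shortened copy of the fragment from the question (assumption: the
# data-hovercard attributes are not needed for this example).
html = (
    '<td class="_480u"><div class="clearfix"><div>'
    '<a href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and '
    '<a href="https://www.facebook.com/pages/English/106059522759137">English</a>'
    '</div></div></td>'
)

# Assumption: in the question, `details` is some tag that contains the links.
details = BeautifulSoup(html, "html.parser")

# find_all already returns the <a> tags themselves, so read each one
# directly instead of calling .find('a') on it again.
links = details.find_all("a", href=True)
list2 = [link.get_text(strip=True) for link in links]

print(list2)
```

The list comprehension replaces the `range(0, s1)` loop and the length check, since iterating over an empty result simply yields an empty list.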

I get "RuntimeError: maximum recursion depth exceeded" when using the second approach – user1946217
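
If BeautifulSoup keeps failing in a given environment, one option is to skip it for this small task. The sketch below is a swapped-in technique, not the approach from the answer: it uses only the standard library's `html.parser`, and the class name `LinkTextParser` is hypothetical. It has not been checked against the commenter's setup.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collects the text found inside each <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_link = False  # True while we are between <a> and </a>
        self.texts = []       # one entry per link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.texts[-1] += data


# Shortened copy of the fragment from the question.
html = (
    '<td class="_480u"><div class="clearfix"><div>'
    '<a href="https://www.facebook.com/pages/Hindi/112969428713061">Hindi</a> and '
    '<a href="https://www.facebook.com/pages/English/106059522759137">English</a>'
    '</div></div></td>'
)

parser = LinkTextParser()
parser.feed(html)
parser.close()

languages = [text.strip() for text in parser.texts]
print(languages)
```

The parser handles each tag as it is read and never builds a tree, which is enough here because only the link text is needed.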
