2014-10-27

Getting structured data from HTML using Python and Beautiful Soup

I am new to Python. I want my code to produce the following result:

Score  Positive     Negative
5      good         bad
7      interesting
3                   horrible

But my code outputs nothing. Where is the problem?

from bs4 import BeautifulSoup
text = """
... <body>
    <div class="review">
        <p class="pos">good</p>
        <p class="neg">bad</p>
    </div>
    <div class="review">
        <p class="pos">interesting</p>
    </div>
    <div class="review">
        <p class="neg">horrible</p>
    </div>
... </body>"""
soup = BeautifulSoup(text)
for parent in soup.find_all('div', attrs={'class': 'review'}):
    if parent.findNextSiblings('p', attrs={'class': 'pos'}):
        postive.append(parent.get_text())
    else:
        postive.append("")
    if parent.findNextSiblings('p', attrs={'class': 'neg'}):
        negtive.append(parent.get_text())
    else:
        negtive.append("")

Answer


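The p tags are not siblings of the review div tags; they are their children. findNextSiblings therefore finds nothing, and you need to search inside each div instead: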

positive = [] 
negative = [] 
for div in soup.find_all('div', attrs={'class': 'review'}): 
    pos = div.find('p', {'class': 'pos'}) 
    positive.append(pos.get_text() if pos else '') 

    neg = div.find('p', {'class': 'neg'}) 
    negative.append(neg.get_text() if neg else '') 

print(positive)
print(negative)

This prints (under Python 2):

[u'good', u'interesting', ''] 
[u'bad', '', u'horrible']
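
For reference, here is a minimal self-contained sketch of the same idea that runs on Python 3 and pairs the two values per review, which is closer to the table asked for in the question. The helper name child_text and the explicit "html.parser" argument are my own choices, not part of the original code. The Score column is left out because the scores do not appear anywhere in the sample HTML.

```python
from bs4 import BeautifulSoup

html = """
<body>
  <div class="review">
    <p class="pos">good</p>
    <p class="neg">bad</p>
  </div>
  <div class="review">
    <p class="pos">interesting</p>
  </div>
  <div class="review">
    <p class="neg">horrible</p>
  </div>
</body>
"""

# Naming the parser explicitly avoids bs4's "no parser specified" warning.
soup = BeautifulSoup(html, "html.parser")

# Why the original loop found nothing: the siblings of a review <div> are the
# other <div> tags, so a sibling search for <p> comes back empty.
first_review = soup.find("div", class_="review")
sibling_paragraphs = first_review.find_next_siblings("p")


def child_text(div, css_class):
    """Return the text of the first <p class=css_class> inside div, or ''.

    Hypothetical helper written for this example only.
    """
    tag = div.find("p", class_=css_class)
    return tag.get_text(strip=True) if tag else ""


# One (positive, negative) pair per review, in document order.
rows = [
    (child_text(div, "pos"), child_text(div, "neg"))
    for div in soup.find_all("div", class_="review")
]

print("{:<12} {}".format("Positive", "Negative"))
for pos, neg in rows:
    print("{:<12} {}".format(pos, neg))
```

Keeping each review as one tuple means a missing p tag cannot shift the two columns out of step, which can happen with two separate lists if one of them is appended to conditionally.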