2014-09-29

Using Beautiful Soup to pull from multiple <tr>s

Goal: output a dictionary of course names and their grades from this:

<tr> 
<td class="course"><a href="/courses/1292/grades/5610">Modern Europe &amp; the World - Dewey</a></td> 
<td class="percent"> 
    92% 
</td> 
<td style="display: none;"><a href="#" title="Send a Message to the Teacher" class="no-hover"><img alt="Email" src="/images/email.png?1395938788" /></a></td> 
</tr> 

like this:

{Modern Europe & the World - Dewey: 92%, the next course name: grade...etc}

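I know how to find just the percent tags or just an href tag, but I'm not sure how to get the text and compile it into a dictionary so that it's more practical. Thanks!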

Answers


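Since each tr contains a series of td elements holding the information you want, you can use find_all() to collect them into a list and then pull out what you need: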

from bs4 import BeautifulSoup 

soup = BeautifulSoup(""" 
<tr> 
<td class="course"><a href="/courses/1292/grades/5610">Modern Europe &amp; the World - Dewey</a></td> 
<td class="percent"> 
    92% 
</td> 
<td style="display: none;"><a href="#" title="Send a Message to the Teacher" class="no-hover"><img alt="Email" src="/images/email.png?1395938788" /></a></td> 
</tr> 
""") 

grades = {} 

for tr in soup.find_all("tr"): 
    td_text = [td.text.strip() for td in tr.find_all("td")] 
    grades[td_text[0]] = td_text[1] 

Result:

>>> grades 
{u'Modern Europe & the World - Dewey': u'92%'} 
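Since the question is about several <tr>s, here is a minimal sketch of the same per-row approach run over a whole table. It assumes the rows sit inside a <table> and that there is a header row made of <th> cells; the second course ("Chemistry - Smith") is invented purely for illustration. get_text(strip=True) is used here as a compact equivalent of .text.strip() for these cells.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a header row plus two course rows, wrapped in a <table>.
html = """
<table>
<tr><th>Course</th><th>Grade</th></tr>
<tr>
<td class="course"><a href="/courses/1292/grades/5610">Modern Europe &amp; the World - Dewey</a></td>
<td class="percent">
    92%
</td>
</tr>
<tr>
<td class="course"><a href="#">Chemistry - Smith</a></td>
<td class="percent"> 88% </td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

grades = {}
for tr in soup.find_all("tr"):
    # Stripped text of each <td> in this row (the header row has only <th>, so it yields []).
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # Only rows with at least a course cell and a grade cell become dictionary entries.
    if len(cells) >= 2:
        grades[cells[0]] = cells[1]

print(grades)
```

Because each row is handled independently, a row with missing cells is simply skipped instead of shifting the other course/grade pairs.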

Try this:
For each tr element, try to find the children you need (the ones with the course and percent classes).
If both exist, add an entry to the grades dictionary.

>>> from bs4 import BeautifulSoup 
>>> html = """ 
... <tr> 
... <td class="course"><a href="/courses/1292/grades/5610">Modern Europe &amp; the World - Dewey</a></td> 
... <td class="percent"> 
...  92% 
... </td> 
... <td style="display: none;"><a href="#" title="Send a Message to the Teacher" class="no-hover"><img alt="Email" src="/images/email.png?1395938788" /></a></td> 
... </tr> 
... """ 
>>> 
>>> soup = BeautifulSoup(html, "html.parser") 
>>> grades = {} 
>>> for tr in soup.find_all('tr'): 
...  td_course = tr.find("td", {"class" : "course"}) 
...  td_percent = tr.find("td", {"class" : "percent"}) 
...  if td_course and td_percent: 
...   grades[td_course.text.strip()] = td_percent.text.strip() 
... 
>>> 
>>> grades 
{u'Modern Europe & the World - Dewey': u'92%'}
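The question mentions already knowing how to find the percent cells and the course cells separately. A minimal sketch of building the dictionary straight from those two searches is to pair them by position with zip(). This assumes every course cell has exactly one matching percent cell, in the same order.

```python
from bs4 import BeautifulSoup

html = """
<tr>
<td class="course"><a href="/courses/1292/grades/5610">Modern Europe &amp; the World - Dewey</a></td>
<td class="percent">
    92%
</td>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the course cells and the percent cells separately...
courses = [td.get_text(strip=True) for td in soup.find_all("td", class_="course")]
percents = [td.get_text(strip=True) for td in soup.find_all("td", class_="percent")]

# ...then pair them up by position and turn the pairs into a dictionary.
grades = dict(zip(courses, percents))
print(grades)
```

This is shorter, but if any row is missing its percent cell, every later course gets the wrong grade. The per-row loops in the answers above do not have that problem.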