2016-07-04 35 views
-3

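Python BeautifulSoup cannot select a specific tag

My problem arises when parsing a website and then loading the data tree with BS (BeautifulSoup). How can I look up the content of an <em> tag? I tried: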

for first in soup.find_all("li", class_="li-in"): 
    print first.select("em.fl.in-date").string 

        #or 

    print first.select("em.fl.in-date").contents 

But it doesn't work. Please help.

I'm looking for cars on tutti.ch. Here is my full code:

#Crawl tutti.ch 
import urllib 
thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos" 
handle = urllib.urlopen(thisurl) 
html_gunk = handle.read() 

from bs4 import BeautifulSoup 
soup = BeautifulSoup(html_gunk, 'html.parser') 

for first in soup.find_all("li", class_="li-in"): 
    if first.a.string and "Audi" and "BMW" in first.a.string: 
     print "Geschafft: %s" % first.a.contents 
     print first.select("em.fl.in-date").string 
    else: 
     print first.a.contents 

When it finds a BMW or an Audi, it should check when the car was listed. The time is located in an <em> tag like this:

<em class="fl in-date"> Heute <br></br> 13:59 </em>

Answers

-1
first.select("em.fl.in-date")[0].text 

That is assuming your selector is correct. You didn't provide the URL you're scraping, so I can't be sure.
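
For what it's worth, here is a minimal offline sketch of why the original attempts fail. It is written for Python 3 and uses an inline HTML string instead of the live page. The `<em>` tag is copied from the question, while the surrounding `<li>` and `<a>` markup is an assumption for illustration only. It shows that `select()` returns a list-like result, that `select_one()` returns a single tag, and that `.string` is `None` when a tag has several children:

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the <em> tag quoted in the question;
# the surrounding <li> and <a> are assumptions for illustration.
html = """
<ul>
  <li class="li-in">
    <a href="#">Audi A4</a>
    <em class="fl in-date"> Heute <br></br> 13:59 </em>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
item = soup.find("li", class_="li-in")

# select() returns a list-like ResultSet, even for a single match,
# so .string or .text must be read from one of its elements.
matches = item.select("em.fl.in-date")
first_match = matches[0]

# select_one() returns the first matching Tag directly, or None if nothing matches.
em = item.select_one("em.fl.in-date")

# .string is None here because the <em> has several children (text, <br>, text);
# get_text() joins all nested text instead.
raw_string = em.string
date_text = em.get_text(" ", strip=True)

print(date_text)
```

If the selector might not match on the real page, checking the result of `select_one()` for `None` before calling `get_text()` avoids an `AttributeError`.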

>>> url = "http://stackoverflow.com/questions/38187213/python-beautifulsoup" 
>>> from bs4 import BeautifulSoup 
>>> import urllib2 
>>> html = urllib2.urlopen(url).read() 
>>> soup = BeautifulSoup(html) 
>>> soup.find_all("p")[0].text 
u'My problem is when parsing a website and then loading the data tree with BS. How can I look for the content of an <em> Tag? I tried ' 

After seeing your code, I made the following change. Take a look:

#Crawl tutti.ch 
import urllib 
thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos" 
handle = urllib.urlopen(thisurl) 
html_gunk = handle.read() 

from bs4 import BeautifulSoup 
soup = BeautifulSoup(html_gunk, 'html.parser') 

for first in soup.find_all("li", class_="li-in"): 
    if first.a.string and "Audi" and "BMW" in first.a.string: 
     print "Geschafft: %s" % first.a.contents 
     print first.select("em.fl.in-date")[0].text 
    else: 
     print first.a.contents 
+0

Thank you very much, Adam Barnes. Your code works perfectly!

+0

`and "Audi"` will always be true.
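
To illustrate the point about the condition, here is a small sketch in Python 3. The helper name `mentions_brand` and the sample title are hypothetical and not from the original code. Because `in` binds more tightly than `and`, the original test only really checks for "BMW". One possible way to test for either brand is `any()`:

```python
BRANDS = ("Audi", "BMW")  # brand names taken from the question


def mentions_brand(title, brands=BRANDS):
    """Return True if title is a non-empty string containing any of the brands."""
    return bool(title) and any(brand in title for brand in brands)


# The original condition parses as: title and ("Audi") and ("BMW" in title).
# A non-empty literal such as "Audi" is always truthy, so it adds nothing.
title = "Audi A4 Avant"  # hypothetical listing title
original_check = bool(title and "Audi" and "BMW" in title)

print(original_check)          # False: only "BMW" is actually tested
print(mentions_brand(title))   # True
print(mentions_brand(None))    # False: first.a.string can be None
```

In the loop from the answer, the check would then read `if mentions_brand(first.a.string):`, assuming the helper above is defined first.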