Python BeautifulSoup cannot select a specific tag

-3

My problem comes up when parsing a website and loading the data tree with BS. How can I look up the content of an <em> tag? I tried:

for first in soup.find_all("li", class_="li-in"): 
    print first.select("em.fl.in-date").string 

        #or 

    print first.select("em.fl.in-date").contents

But it doesn't work. Please help.
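For context, a minimal sketch of why the attempt above fails, written for Python 3 rather than the Python 2 used in this thread. `select()` returns a list of matches, not a single tag, so `.string` has to be read from an element of that list. The HTML snippet is an invented stand-in for the real page, which is not shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumption modeled on the selectors in the question,
# not the actual tutti.ch page source.
html = """
<ul>
  <li class="li-in"><a href="#">BMW 320i</a><em class="fl in-date"> Heute 13:59 </em></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
first = soup.find("li", class_="li-in")

matches = first.select("em.fl.in-date")
# select() returns a list-like ResultSet, even when there is only one match.
is_list = isinstance(matches, list)

# Index into the result to reach the tag, then read its text.
date_text = matches[0].string.strip()
print(is_list, date_text)
```

Calling `.string` or `.contents` directly on `matches` raises an `AttributeError`, because those attributes belong to a tag and not to the list of results.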

I am looking for cars on tutti.ch. Here is my entire code:

#Crawl tutti.ch 
import urllib 
thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos" 
handle = urllib.urlopen(thisurl) 
html_gunk = handle.read() 

from bs4 import BeautifulSoup 
soup = BeautifulSoup(html_gunk, 'html.parser') 

for first in soup.find_all("li", class_="li-in"): 
    if first.a.string and "Audi" and "BMW" in first.a.string: 
     print "Geschafft: %s" % first.a.contents 
     print first.select("em.fl.in-date").string 
    else: 
     print first.a.contents

When it finds a BMW or an Audi, it should check when the car was listed. The time is in an <em> tag like this:

<em class="fl in-date"> Heute 13:59 </em>

Source

2016-07-04 Voran Gensili

-1

first.select("em.fl.in-date")[0].text

That assumes your selector is correct. You didn't provide the URL you are scraping, so I can't be sure.
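If only the first match is needed, `select_one()` is an alternative to indexing the list. It returns `None` when nothing matches, which makes a missing tag easy to guard against. The following is a sketch in Python 3 under stated assumptions: the markup is invented for illustration, and `listing_date` is a hypothetical helper name, not something from the original code.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration; the second item has no date tag.
html = """
<ul>
  <li class="li-in"><a href="#">Audi A4</a><em class="fl in-date"> Heute 13:59 </em></li>
  <li class="li-in"><a href="#">VW Golf</a></li>
</ul>
"""

def listing_date(item):
    """Return the stripped date text of one <li>, or None if it has no date tag."""
    tag = item.select_one("em.fl.in-date")
    return tag.get_text(strip=True) if tag is not None else None

soup = BeautifulSoup(html, "html.parser")
dates = [listing_date(li) for li in soup.find_all("li", class_="li-in")]
print(dates)
```

With `select(...)[0]`, the second item would raise an `IndexError` instead of yielding `None`.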

>>> url = "http://stackoverflow.com/questions/38187213/python-beautifulsoup" 
>>> from bs4 import BeautifulSoup 
>>> import urllib2 
>>> html = urllib2.urlopen(url).read() 
>>> soup = BeautifulSoup(html) 
>>> soup.find_all("p")[0].text 
u'My problem is when parsing a website and then loading the data tree with BS. How can I look for the content of an <em> Tag? I tried '

After seeing your code, I made the following changes; have a look:

#Crawl tutti.ch 
import urllib 
thisurl = "http://www.tutti.ch/stgallen/fahrzeuge/autos" 
handle = urllib.urlopen(thisurl) 
html_gunk = handle.read() 

from bs4 import BeautifulSoup 
soup = BeautifulSoup(html_gunk, 'html.parser') 

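for first in soup.find_all("li", class_="li-in"):
    title = first.a.string
    # Match listings whose title names either brand.
    if title and ("Audi" in title or "BMW" in title):
        print "Geschafft: %s" % first.a.contents
        # select() returns a list, so take the first match before reading .text
        print first.select("em.fl.in-date")[0].text
    else:
        print first.a.contents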
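The script in this thread is Python 2. Below is a rough Python 3 port, offered as a sketch rather than a drop-in replacement. It separates fetching from parsing so the parsing step can be tried on a saved string. The function names `fetch_html` and `brand_listings` and the sample markup are assumptions introduced here.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

def fetch_html(url):
    # Network call; in Python 3, urlopen lives in urllib.request.
    with urlopen(url) as handle:
        return handle.read()

def brand_listings(html, brands=("Audi", "BMW")):
    """Yield (title, date) for listings whose link text names one of the brands."""
    soup = BeautifulSoup(html, "html.parser")
    for item in soup.find_all("li", class_="li-in"):
        if item.a is None:
            continue  # skip items without a link
        title = item.a.get_text(strip=True)
        if any(brand in title for brand in brands):
            date_tag = item.select_one("em.fl.in-date")
            date = date_tag.get_text(strip=True) if date_tag is not None else None
            yield title, date

# Invented sample markup, so the parser can be exercised without network access.
sample = (
    '<li class="li-in"><a>BMW 320i</a><em class="fl in-date"> Heute 13:59 </em></li>'
    '<li class="li-in"><a>VW Golf</a></li>'
)
results = list(brand_listings(sample))
print(results)
```

Against the live site this would be called as `brand_listings(fetch_html(thisurl))`; whether the page still uses the `li-in` and `in-date` classes is not something this sketch can confirm.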

Source

2016-07-04 14:41:11

Thank you so much, Adam Barnes. Your code works perfectly!

`and "Audi"` will always be true.
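The comment above concerns how Python evaluates the condition used in the question: `"Audi" and "BMW" in title` parses as `"Audi" and ("BMW" in title)`, and a non-empty string literal is always truthy, so only the BMW test has any effect. Here is a small sketch in Python 3; the function names and sample titles are hypothetical.

```python
def buggy(title):
    # Parsed as: title and "Audi" and ("BMW" in title)
    return bool(title and "Audi" and "BMW" in title)

def mentions_brand(title, brands=("Audi", "BMW")):
    # Test each brand against the title explicitly.
    return bool(title) and any(brand in title for brand in brands)

titles = ["Audi A4 Avant", "BMW 320i", "VW Golf"]
buggy_results = [buggy(t) for t in titles]
fixed_results = [mentions_brand(t) for t in titles]
print(buggy_results)
print(fixed_results)
```

The buggy version misses the Audi listing, while the explicit membership tests catch both brands.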
