2015-08-25

Hi, I am filtering some announcements posted on a website. I use the following script, based on soup.find_all, to pull out the list items:

gdata_even=soup.find_all("li", {"class":"list2Col even "}) 
gdata_odd=soup.find_all("li", {"class":"list2Col odd "}) 
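A possible alternative, sketched under the assumption that bs4 4.7 or later is installed (those versions ship CSS selector support through soupsieve): CSS class selectors match individual class names, so they do not depend on the exact class attribute string, including its trailing space. A single `li.list2Col` selector also returns even and odd items together in page order. The sample HTML is a shortened, hypothetical stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup shaped like the question's list items.
HTML = """
<ul>
  <li class="list2Col even "><h3><a href="/news/1">Item 1</a></h3></li>
  <li class="list2Col odd "><h3><a href="/news/2">Item 2</a></h3></li>
  <li class="list2Col even "><h3><a href="/news/3">Item 3</a></h3></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Each ".name" in the selector must appear as one of the element's classes.
gdata_even = soup.select("li.list2Col.even")
gdata_odd = soup.select("li.list2Col.odd")

# One selector for all items keeps even and odd rows interleaved in page order.
all_items = soup.select("li.list2Col")

print(len(gdata_even), len(gdata_odd), len(all_items))
```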

Then I keep only some of the announcements in gdata, depending on whether an item contains certain words:

for l in range(len_data):
    if _checkDate(gdata_even[l].text):
        if not _checkwordsV2(gdata_even[l].text):
            initial_list.append(gdata_even[l].text.encode("utf-8"))

    if _checkDate(gdata_odd[l].text):
        if not _checkwordsV2(gdata_odd[l].text):
            initial_list.append(gdata_odd[l].text.encode("utf-8"))
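The loop above indexes both lists with one `range(len_data)`, which assumes `gdata_even` and `gdata_odd` have the same length (the definition of `len_data` is not shown, so this is an assumption). A minimal Python 3 sketch that keeps the even/odd page order without that assumption uses `itertools.zip_longest`. The helpers `check_date` and `has_excluded_words` are hypothetical stand-ins for the question's `_checkDate` and `_checkwordsV2`, and plain strings stand in for the `.text` of each item.

```python
from itertools import zip_longest


def check_date(text):
    # Hypothetical stand-in for _checkDate: keep items mentioning 2015.
    return "2015" in text


def has_excluded_words(text):
    # Hypothetical stand-in for _checkwordsV2: True means "skip this item".
    return "Deletion" in text


def filter_announcements(even_texts, odd_texts):
    """Return texts that pass the date check and contain no excluded words.

    zip_longest pairs the even and odd lists item by item (keeping page
    order) and pads the shorter list with None instead of raising IndexError.
    """
    initial_list = []
    for pair in zip_longest(even_texts, odd_texts):
        for text in pair:
            if text is None:  # padding from the shorter list
                continue
            if check_date(text) and not has_excluded_words(text):
                initial_list.append(text)
    return initial_list


even = ["25 Aug 2015 FRA:Deletion of Instruments", "25 Aug 2015 New listing"]
odd = ["24 Aug 2015 Dividend notice"]
print(filter_announcements(even, odd))
```

In Python 3, `.text` is already a `str`, so the sketch does not call `.encode("utf-8")`.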

The problem I am facing now is that gdata_even[l] and gdata_odd[l] produce output like the following:

<li class="list2Col even "><div class="indexCol"><span class="date">25 Aug 2015 12:00:06 AM CEST</span></div><div class="contentCol"><div class="categories">Frankfurt</div><h3><a href="/xetra-en/newsroom/xetra-newsboard/FRA-Deletion-of-Instruments-from-XETRA---25.08.2015-001/1913134">FRA:Deletion of Instruments from XETRA - 25.08.2015-001</a></h3></div></li> 

From this item I want to extract the link embedded in the href, using the code below, but it does not work:

h3Url = gdata[l].find("a").get("href")
print h3Url
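A minimal Python 3 sketch, assuming bs4 is installed, that pulls the href out of each collected item. Note that the snippet above indexes `gdata`, whereas the lists built earlier are named `gdata_even` and `gdata_odd`. The sketch also skips items that contain no `<a>`, since `find("a")` returns `None` there and calling `.get` on it raises `AttributeError`. The sample markup and the helper name `extract_links` are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one item with a link, one without.
HTML = """
<ul>
  <li class="list2Col even "><div class="contentCol"><h3>
    <a href="/xetra-en/newsroom/example/1">Example item</a></h3></div></li>
  <li class="list2Col odd "><div class="contentCol"><h3>No link</h3></div></li>
</ul>
"""


def extract_links(items):
    """Return the href of the first <a> in each item, skipping items without one."""
    links = []
    for item in items:
        anchor = item.find("a")
        if anchor is None:
            # Calling .get on None here would raise AttributeError.
            continue
        href = anchor.get("href")
        if href:
            links.append(href)
    return links


soup = BeautifulSoup(HTML, "html.parser")
# class_ matches any one of the element's class names.
gdata_even = soup.find_all("li", class_="even")
gdata_odd = soup.find_all("li", class_="odd")
print(extract_links(gdata_even + gdata_odd))
```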

Can someone please help? Thanks.


What is the error, or what output do you get?

Answer


Perhaps the problem lies in how you obtain gdata, because your code should work:

>>> from BeautifulSoup import BeautifulSoup 
>>> doc='<li class="list2Col even "><div class="indexCol"><span class="date">25 Aug 2015 12:00:06 AM CEST</span></div><div class="contentCol"><div class="categories">Frankfurt</div><h3><a href="/xetra-en/newsroom/xetra-newsboard/FRA-Deletion-of-Instruments-from-XETRA---25.08.2015-001/1913134">FRA:Deletion of Instruments from XETRA - 25.08.2015-001</a></h3></div></li>' 
>>> soup = BeautifulSoup(doc) 
>>> h3Url = soup.find('a').get('href') 
>>> print h3Url 

/xetra-en/newsroom/xetra-newsboard/FRA-Deletion-of-Instruments-from-XETRA---25.08.2015-001/1913134
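The href printed above is a relative path. To request the announcement page itself, it has to be resolved against the URL of the page it was scraped from; `urllib.parse.urljoin` from the Python 3 standard library does this. The page URL below is a hypothetical placeholder, not the real site address.

```python
from urllib.parse import urljoin

# Hypothetical page URL; replace it with the address of the page actually scraped.
PAGE_URL = "https://www.example.com/xetra-en/newsroom/xetra-newsboard"

href = "/xetra-en/newsroom/xetra-newsboard/FRA-Deletion-of-Instruments-from-XETRA---25.08.2015-001/1913134"

# An href starting with "/" replaces the whole path of the base URL,
# keeping only its scheme and host.
absolute = urljoin(PAGE_URL, href)
print(absolute)
```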