Removing duplicate tag content with BeautifulSoup

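I wrote a script that fetches every H1 tag from all 76 pages of a website. The problem is that one very specific line, "Current Affairs January 2015", gets printed over and over, because it appears on every page. Can I change the code so that it prints that line only once?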

Here is my code:

from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

for i in range(2, 77):
    url1 = "http://currentaffairs.gktoday.in/month/current-affairs-january-2015/" + "page/" + str(i)
    soup = bs(urlopen(url1), "html.parser")
    # Print the text of every <h1> on the current page
    for link in soup.find_all('h1'):
        print(link.string)

Thanks in advance.

2016-01-27 icodekamal.com

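from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

for i in range(2, 77):
    url1 = "http://currentaffairs.gktoday.in/month/current-affairs-january-2015/" + "page/" + str(i)
    soup = bs(urlopen(url1), "html.parser")
    uLinks = soup.find_all('h1')
    for index, item in enumerate(uLinks):
        # On the first page fetched (page 2), print every <h1>.
        # On later pages, skip index 0: the first <h1> is the
        # "Current Affairs January 2015" heading repeated on every page.
        if i == 2 or index != 0:
            print(item.string)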

2016-01-27 18:00:11

Thanks. It works perfectly.
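A different approach from the one in the answer: rather than skipping the first <h1> on every page after the first, keep a set of headings already printed and output each text only the first time it appears. The sketch below is plain Python so it can be tried without network access. The function name unique_in_order is hypothetical, and the sketch assumes the headings have already been collected as strings, for example from link.string in the question's loop.

```python
from typing import Iterable, Iterator, Optional


def unique_in_order(texts: Iterable[Optional[str]]) -> Iterator[str]:
    """Yield each heading text the first time it is seen, skipping repeats.

    Hypothetical helper. None values (Tag.string returns None when a tag
    has more than one child) are skipped. Surrounding whitespace is
    stripped before comparing, so " x " and "x" count as the same heading.
    """
    seen = set()
    for text in texts:
        if text is None:
            continue
        key = text.strip()
        if key in seen:
            continue  # already printed on an earlier page
        seen.add(key)
        yield key
```

To use it with the question's loop, collect link.string from every page into one iterable (a list or a generator) and iterate over unique_in_order(...) instead of printing inside the loop. Unlike the index-based answer, this does not rely on the repeated heading being the first <h1> on each page. It does, however, also suppress any other heading that happens to repeat across pages.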
