2016-05-23 50 views
1

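My Python script parses titles and links from multiple RSS feeds. I store the titles in a list, and I want to make sure I never print a duplicate. How do I do that? How do I tell Python not to print an item that is already in the list?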

#!/usr/bin/python 
from twitter import * 
from goose import Goose 
import feedparser 
import time 
from pyshorteners import Shortener 
import pause 
import newspaper 

dr = feedparser.parse("http://www.darkreading.com/rss_simple.asp") 
sm = feedparser.parse("http://www.securitymagazine.com/rss/topic/2654-cyber-tactics.rss")



dr_posts = [
    "CISO Playbook: Games of War & Cyber Defenses",
    "SWIFT Confirms Cyber Heist At Second Bank; Researchers Tie Malware Code to Sony Hack",
    "The 10 Worst Vulnerabilities of The Last 10 Years",
    "GhostShell Leaks Data From 32 Sites In 'Light Hacktivism' Campaign",
    "OPM Breach: 'Cyber Sprint' Response More Like A Marathon",
    "Survey: Customers Lose Trust In Brands After A Data Breach",
    "Domain Abuse Sinks 'Anchors Of Trust'",
    "The 10 Worst Vulnerabilities of The Last 10 Years",
]

sm_posts = ["10 Steps to Building a Better Cybersecurity Plan"] 

x = 1 

while True:
    try:
        drtitle = dr.entries[x]["title"]
        drlink = dr.entries[x]["link"]
        if drtitle in dr_posts:
            # Already printed: skip ahead exactly one entry
            x += 1
            drtitle = dr.entries[x]["title"]
            drlink = dr.entries[x]["link"]
            print drtitle + "\n" + drlink
            dr_posts.append(drtitle)
            x -= 1
            pause.seconds(10)
        else:
            print drtitle + "\n" + drlink
            dr_posts.append(drtitle)
            pause.seconds(10)

        smtitle = sm.entries[x]["title"]
        smlink = sm.entries[x]["link"]
        if smtitle in sm_posts:
            # Already printed: skip ahead exactly one entry
            x += 1
            smtitle = sm.entries[x]["title"]
            smlink = sm.entries[x]["link"]
            print smtitle + "\n" + smlink
            sm_posts.append(smtitle)
            pause.seconds(10)
        else:
            print smtitle + "\n" + smlink
            sm_posts.append(smtitle)
            x += 1
            pause.seconds(10)

    except IndexError:
        # Ran past the end of one of the feeds
        print "FAILURE"
        break

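For now I just have it skip one entry. That is going to be a problem, because if the RSS feed contains several duplicates in a row, I will still end up printing duplicates.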

Answers

2

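You can take advantage of the set data structure, since its uniqueness property does the work for you. Basically, you convert your list to a set and then convert the set back to a list, which guarantees that the list now holds strictly unique values.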

If you have a list l, you can make it unique with:

l = list(set(l)) 
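
One caveat: a set does not keep the original order, and converting afterwards removes duplicates from the list rather than stopping them from being printed in the first place. A minimal sketch of an order-preserving variant follows, assuming the feed entries are available as (title, link) pairs. The helper name `unseen_entries` and the sample data are hypothetical and do not come from the question.

```python
def unseen_entries(entries, seen):
    """Yield (title, link) pairs whose title is not yet in `seen`.

    `seen` is a set of titles that is updated in place, so later calls
    (for example with a second feed) skip everything already yielded.
    """
    for title, link in entries:
        if title in seen:
            continue  # duplicate: skip it, no matter how many there are
        seen.add(title)
        yield title, link


# Hypothetical sample data standing in for parsed feed entries.
sample = [
    ("Post A", "http://example.com/a"),
    ("Post B", "http://example.com/b"),
    ("Post A", "http://example.com/a"),
]
seen_titles = {"Post B"}  # titles that were printed on an earlier run

for title, link in unseen_entries(sample, seen_titles):
    print(title + "\n" + link)
```

Membership tests on a set take roughly constant time, so this also scales better than `if title in some_list` as the history of printed titles grows.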

Thanks! This really helped me! – Frank


No problem, man! Glad it helped. Feel free to mark my answer as accepted (the checkbox) –

0

If you don't want to print duplicate links, you can use a Counter or a defaultdict:

from collections import defaultdict

sm_posts = defaultdict(int)
sm_posts[sm_links] += 1  # sm_links is the link that was just parsed
print sm_posts.keys()  # will print all the unique links

The benefit is that you can also get the number of times a link was repeated by doing:

>>> sm_posts[sm_links]  # evaluates to the number of times this link was seen

Give it a try.
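
The counting idea above can be sketched with `collections.Counter` as well. This is only an illustration under assumptions: the `links` list is hypothetical sample data standing in for the links parsed from a feed, and `link_counts` and `unique_links` are names chosen for the example.

```python
from collections import Counter

# Hypothetical sample links standing in for the parsed feed links.
links = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/a",
]

link_counts = Counter()
for link in links:
    # Looking up a missing key in a Counter returns 0 without adding it.
    if link_counts[link] == 0:
        print(link)  # first time this link is seen, so print it
    link_counts[link] += 1

# The keys of the Counter are the unique links.
unique_links = list(link_counts)
```

Each link is printed only the first time it appears, while `link_counts[link]` still records how often it turned up in total.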