2016-10-02

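How to get a list of two different elements that share the same parent

As an exercise, I am trying to print the titles of all posts with more than 200 comments on reddit.com.

What I tried: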

import requests 
from bs4 import BeautifulSoup 


url1 = "https://www.reddit.com/" 
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'} 

res = requests.get(url1, headers=headers) 
res.raise_for_status() 
soup = BeautifulSoup(res.content, "html5lib") 

g = soup.select('ul > li.first') 
j = soup.select('#siteTable div.entry.unvoted > p.title > a ') 
list1 = [] 
for t in j: 
    list.append(t.text) 

list2=[] 
for s in g:
    for p in s.text.split(" "):
        if p.isdigit():
            p = int(p)
            if p > 100:
                list2.append(p)

for q, l in zip(list1, list2):
    if l > 200:
        print(q, l)

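Problem:

It works part of the way, until there is a hiccup somewhere and the two lists no longer match. As a result, I get posts that have fewer than 200 comments.

Output: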

What the F David Blaine!! 789 
So NYC MTA (subway) banned all dogs unless the owner carries them in a bag. I think this owner nailed it. 1075 
Bad to the bone 307 
TIL there is a "white man" café in Tokyo, where Japanese ladies ring a bell to summon tuxedo-wearing caucasians who respond with "yes, princess?" and serve them cake 2145 
Earthquake Warning Issued in California 1410 
Man impersonating officer busted for attempting to pull over unmarked cruiser 1022 
Use of body-worn cameras sees complaints against police ‘virtually vanish’, study finds 2477 
Amazing one handed interception 759 
A purrfectly executed leap 518 
"This bed has a fur pillow, I'll lay here." 792 
Back in 'Nam, 1969. Guy on the left is a good friend of mine's dad. He's in hospice now and not doing well but he'll live on in photos. 264 
Nintendo Entertainment System: NES Classic Edition - with 30 games - Available in US 11/11/16 290 
A scenic view ruined by a drunk driver (Star Wars: Battlefront) 2737 
Clouds battling a sunset over Olympic National Park, WA, USA (1334x750) [OC] 2222 
What company is totally guilty of false advertising and why? 2746 
South Korean President Park Geun-hye has called on North Koreans to abandon their country and defect, just a day after a soldier walked across the heavily fortified border into the South 410 
TIFU by underestimating the stupidity of multiple people 334 
Special Trump burger at a burger chain in South Africa 311 
This Special Ed Teacher Had All of Her Students in Her Wedding 984 

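The mess-up starts after "A scenic view ruined by...".

Can someone point out exactly what is going wrong here, or suggest another way to do this?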


Which title in the list has fewer than 200 comments? One more thing: 'list.append(t.text)' should be 'list1.append(t.text)'. – qmaruf


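"A scenic view ruined by a drunk driver (Star Wars: Battlefront)" >>> has fewer than 200 comments. You are right about list1. It is actually correct in my program (both are named list there), but I wanted to be clearer here, so I changed the name to list1 when creating the empty list and forgot to change it further down. – BitByBit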

Answer


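Instead of first saving everything to two lists and hoping that they line up (list1[0] ~ list2[0]), I looked for the closest common ancestor (parent) of both elements, selected those parents, and then called select again (BeautifulSoup) on each parent to drill down into the DOM (children) and print right away. When scraping a heavily used site such as reddit, the content can change even within seconds, and saving the data to lists and comparing them afterwards may cause hiccups.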
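Judging from the code posted in the question, another likely cause of the drift is that list1 receives every title, while list2 only receives counts above 100 (and nothing at all for an entry whose comment link contains no number). zip pairs items by position, so from the first dropped count onward each title is matched with the count of a later post. A minimal sketch of that effect follows; the titles and counts are made-up sample data, not taken from reddit:

```python
# Hypothetical sample data: each title with the comment count that belongs to it.
posts = [("post A", 789), ("post B", 42), ("post C", 2737), ("post D", 150)]

# What the question's code effectively does: keep every title ...
titles = [title for title, _ in posts]
# ... but keep only the counts above 100, which shortens the second list.
counts = [count for _, count in posts if count > 100]

# zip() pairs by position, so everything after the first dropped count is shifted.
misaligned = [(t, c) for t, c in zip(titles, counts) if c > 200]

# Filtering the (title, count) pairs themselves keeps each title with its own count.
aligned = [(t, c) for t, c in posts if c > 200]

print(misaligned)
print(aligned)
```

Selecting title and count from the same parent element, as done in the solution, avoids this because the pair never gets separated.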

Solution:

import requests 
from bs4 import BeautifulSoup 


url1 = "https://www.reddit.com/" 
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'} 

res = requests.get(url1, headers=headers) 
res.raise_for_status() 
soup = BeautifulSoup(res.content, "html5lib") 

k = soup.select('#siteTable div.entry.unvoted')  # parent: one element per post

for v in k:
    d = v.select('ul > li.first')  # comment link inside this post
    o = v.select('p.title > a')  # title link inside this post
    for z, x in zip(d, o):
        # Convert e.g. "351 comments" to the integer 351 and compare it with 200.
        for p in z.text.split(" "):
            if p.isdigit():
                p = int(p)
                if p > 200:
                    print(z.text, x.text)  # print the comment count first, then the title
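The split(" ") / isdigit() step could also be pulled into a small helper. The sketch below is only one possible way to do it: the name parse_comment_count is made up for illustration, and it assumes the link text looks like "351 comments", "1 comment", or just "comment" when a post has no comments yet.

```python
import re


def parse_comment_count(link_text):
    """Return the first run of digits in the text as an int, or 0 if there is none."""
    match = re.search(r"\d+", link_text)
    return int(match.group()) if match else 0


# Sample link texts (assumed shapes, see the note above).
for text in ["351 comments", "1 comment", "comment"]:
    print(text, "->", parse_comment_count(text))
```

With such a helper, the inner loop of the solution would reduce to a single comparison like parse_comment_count(z.text) > 200.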