Reading news with urllib

I have the following code:

import urllib 
import re 

def worldnews(): 
    count = 0 
    html = urllib.urlopen("https://www.reddit.com/r/worldnews/").readlines() 

    lines = html 
    for line in lines: 
        if "Paris" or "Putin" in line: 
            count = count + 1 
            print line  

    print "Totaal gevonden: ", count 
    print "----------------------" 

worldnews()

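How can I find all the reddit posts on that page that have Paris or Putin in the title? Is there a way to print just the titles of those posts to the console? When I run this, I get a lot of HTML code.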


2015-11-19 joey

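Take a look at [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) – Celeo

Just a note: the line `if "Paris" or "Putin" in line:` always evaluates as true, because Python reads it as `"Paris" or ("Putin" in line)` and a non-empty string is truthy. That is why you get so much HTML code. As mentioned above, use BeautifulSoup or another HTML parsing library –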
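To make the comment above concrete, here is a minimal sketch of the operator-precedence problem and two ways to write the test as intended. It is written for Python 3 (the question uses Python 2 `print` statements), and the sample `line` value is made up for illustration.

```python
# A made-up line that mentions neither keyword.
line = "<div>nothing relevant here</div>"

# `or` binds more loosely than `in`, so this parses as
# "Paris" or ("Putin" in line). A non-empty string is truthy,
# so the expression evaluates to "Paris" for every line.
buggy = "Paris" or "Putin" in line

# Test each keyword against the line separately.
fixed = "Paris" in line or "Putin" in line

# Equivalent form that scales to more keywords.
keywords = ("Paris", "Putin")
fixed_any = any(word in line for word in keywords)

print(repr(buggy), fixed, fixed_any)  # 'Paris' False False
```

Even with the condition fixed, the loop in the question would still print raw HTML lines, which is why an HTML parser is the better tool here.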

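The best way to work with HTML in Python is BeautifulSoup. You'll need to download it and read through the documentation to see how to do what you're asking. I've given you a start, though: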

import urllib 
from bs4 import BeautifulSoup 

def worldnews(): 
    count = 0 
    html = urllib.urlopen("https://www.reddit.com/r/worldnews/") 
    soup = BeautifulSoup(html,"lxml") 
    titles = soup.find_all('p',{'class':'title'}) 
    for i in titles: 
        print(i.text) 

worldnews()

When run, this prints every title on the page. The output looks like this:

Paris attacks ringleader dead - French officials (bbc.com) 
Company which raised price of AIDS drug by 5500% reports $14m quarterly losses. (pinknews.co.uk) 
Syria/IraqSyrian man kills judge at ISIS Sharia Court for beheading his brother (en.abna24.com) 
Putin Puts $50 Million Bounty on Heads of Metrojet Bombers (fortune.com)

and so on. From here you should be able to figure out how to code the rest fairly easily. :-)
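One possible way to finish it off, as a sketch rather than the answerer's own code: collect the title strings and keep only those that mention a keyword. The helper name `matching_titles` and the case-insensitive matching are assumptions made for this example, the sample list is copied from the output above, and the code targets Python 3.

```python
def matching_titles(titles, keywords=("Paris", "Putin")):
    """Return the titles containing at least one keyword (case-insensitive).

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    lowered = [k.lower() for k in keywords]
    return [t for t in titles if any(k in t.lower() for k in lowered)]


# Sample titles copied from the output shown above.
sample = [
    "Paris attacks ringleader dead - French officials (bbc.com)",
    "Company which raised price of AIDS drug by 5500% reports $14m quarterly losses. (pinknews.co.uk)",
    "Syria/IraqSyrian man kills judge at ISIS Sharia Court for beheading his brother (en.abna24.com)",
    "Putin Puts $50 Million Bounty on Heads of Metrojet Bombers (fortune.com)",
]

found = matching_titles(sample)
for title in found:
    print(title)
print("Total found:", len(found))
```

In the script above you would pass `[i.text for i in titles]` instead of `sample`. Note that this is plain substring matching, so "paris" would also match a word like "comparison"; a word-boundary regular expression would be stricter.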


2015-11-19 19:44:32 n1c9

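Thanks a lot! This will help me – joey

No problem. Let me know if you need any help! – n1c9

What is the best way to get the search results back from the script? When I search the titles, I don't find anything. I think I have to search on "href", but how? Sorry, I'm new to Python – joey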
