Reading the contents of a web page with Python


I'm trying to get the contents of a web page. For some reason, whenever I try urlopen, it says urllib has no such attribute. I can't use urllib2 either.

I just want to get the contents of a web page such as http://www.example.com

import urllib
import re

textfile = open('depth_1.txt','w')
print("Enter the URL you wish to crawl..")
print('Usage - "http://phocks.org/stumble/creepy/" <-- With the double quotes')
myurl = input("@> ")
for i in re.findall('''href=["'](.[^"']+)["']''', urllib.urlopen(myurl).read(), re.I):
    print(i)
    for ee in re.findall('''href=["'](.[^"']+)["']''', urllib.urlopen(i).read(), re.I):
        print(ee)
        textfile.write(ee+'\n')
textfile.close()

Here is the error:

Traceback (most recent call last):
  File "/Users/austinhitt/Desktop/clases_example.py", line 8, in <module>
    for i in re.findall('''href=["'](.[^"']+)["']''', urllib.urlopen(myurl).read(), re.I):
AttributeError: module 'urllib' has no attribute 'urlopen'
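The traceback comes from the Python 2 call urllib.urlopen: in Python 3, urlopen lives in the urllib.request submodule, and read() returns bytes, which must be decoded before a str regex can search it (otherwise re raises a TypeError). Below is a minimal sketch of the same depth-1 crawl ported to Python 3. It keeps the question's regex, adds urllib.parse.urljoin so relative links such as "/about" can be fetched, and assumes UTF-8 when the server declares no charset. The helper names extract_links, fetch_html and crawl_depth_1 are hypothetical, not from the original post.

```python
import re
import urllib.request
from urllib.parse import urljoin

# Same pattern as the question: captures the value of href="..." or href='...'
HREF_RE = re.compile(r'''href=["'](.[^"']+)["']''', re.I)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every href in html, resolved against base_url (hypothetical helper)."""
    # urljoin turns relative links like "/about" into absolute URLs
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


def fetch_html(url: str) -> str:
    """Download url and decode it to str (hypothetical helper)."""
    # Python 3: urlopen is in urllib.request, and read() returns bytes
    with urllib.request.urlopen(url, timeout=10) as resp:
        # Assumption: fall back to UTF-8 if the response declares no charset
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl_depth_1(start_url: str, outfile: str = "depth_1.txt") -> None:
    """Print links on start_url and write the links found on each linked page."""
    with open(outfile, "w", encoding="utf-8") as textfile:
        for link in extract_links(fetch_html(start_url), start_url):
            print(link)
            try:
                page = fetch_html(link)
            except (OSError, ValueError) as exc:
                # URLError/HTTPError are OSError subclasses (e.g. mailto:, 404s)
                print("  skipped:", exc)
                continue
            for sub in extract_links(page, link):
                print(" ", sub)
                textfile.write(sub + "\n")
```

Call it as crawl_depth_1("http://www.example.com"). In Python 3, input() returns the raw text, so if you keep the prompt, the URL no longer needs the surrounding double quotes that the usage hint asks for (those were needed because Python 2's input() evaluated what was typed).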


2016-03-05 HittmanA

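You are using Python 3, but the resource you are learning Python from is outdated and uses Python 2. 'urllib2' no longer exists in Python 3; its functionality mostly lives in 'urllib' and its submodules. –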
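To make the comment concrete, here is a small sketch of where the common Python 2 names moved in Python 3 (urllib.request, urllib.parse, urllib.error), using only the standard library. The data: URL is just a stand-in so the snippet runs without network access; substitute http://www.example.com to fetch a real page.

```python
# Where common Python 2 names live in Python 3:
#   urllib.urlopen, urllib2.urlopen      -> urllib.request.urlopen
#   urllib2.Request                      -> urllib.request.Request
#   urllib2.URLError, urllib2.HTTPError  -> urllib.error.URLError, urllib.error.HTTPError
#   urllib.quote, urllib.urlencode       -> urllib.parse.quote, urllib.parse.urlencode
#   urlparse.urljoin                     -> urllib.parse.urljoin
import urllib.error
import urllib.parse
import urllib.request

# Import the submodules explicitly: "import urllib" alone does not
# guarantee that urllib.request or urllib.parse is loaded.
query = urllib.parse.urlencode({"q": "python 3", "page": 2})  # q=python+3&page=2
quoted = urllib.parse.quote("/a path/")  # /a%20path/
joined = urllib.parse.urljoin("http://www.example.com/docs/", "../about")  # http://www.example.com/about

# Stand-in data: URL so this runs offline; use a real http(s) URL in practice.
try:
    with urllib.request.urlopen("data:,hello", timeout=10) as resp:
        body = resp.read()  # bytes, not str, in Python 3
except urllib.error.URLError as exc:
    body = b""
    print("request failed:", exc.reason)

print(query, quoted, joined, body)
```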

For learning resources I recommend [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/), which includes a chapter on web scraping with Python 3. –

@AnttiHaapala I agree with you. That's why I need to know how to open a URL in Python 3. My IDLE shell says urlopen doesn't work. – HittmanA

If you only need the content, use requests; if you want to play around with the content, you'll need scrapy. For example:

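import requests

r = requests.get('http://scrapy.org', timeout=10)
print(r.status_code)  # HTTP status code, e.g. 200
print(r.headers)      # response headers
print(r.content)      # raw body as bytes; r.text gives the decoded str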


2016-03-05 21:20:28

I'm not sure why you say scrapy is required when your example code doesn't use it. – tagoma

No, I said that if he only wants the content he can use requests, but if he needs something more he can use scrapy. My example uses requests. –
