Finding unique web links using Python

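I am writing a program to extract the unique web links from www.stevens.edu (this is an assignment), but I have run into a problem. My program extracts links from every site I try except www.stevens.edu, for which the output is "None". I am frustrated and need help. I am using this URL for testing: http://www.stevens.edu/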

import urllib
from bs4 import BeautifulSoup as bs

url = raw_input('enter - ')
html = urllib.urlopen(url).read()
soup = bs(html)
tags = soup('a')
for tag in tags:
    print tag.get('href', None)

Please guide me here and let me know why it does not work with www.stevens.edu.

Source

2016-04-20 siddpro

The site checks the User-Agent header and returns different HTML depending on its value.

You need to set the User-Agent header to get the proper HTML:

import urllib2
from bs4 import BeautifulSoup as bs

url = raw_input('enter - ')
# Send a browser-like User-Agent instead of the library default.
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})  # <--
html = urllib2.urlopen(req).read()
soup = bs(html)
tags = soup('a')  # shorthand for soup.find_all('a')
for tag in tags:
    print tag.get('href', None)

Source

2016-04-20 05:46:41 falsetru
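
Neither snippet above removes duplicates, although the question asks for unique links. The following is a minimal Python 3 sketch of one way to do that, not the method of either poster. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages. The names LinkCollector, unique_links and fetch are hypothetical. It also assumes that "unique" means unique after resolving relative links against the page URL and dropping #fragments, which the original posts do not specify.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors with a missing or empty href
                self.hrefs.append(href)


def unique_links(html, base_url):
    """Return absolute links without duplicates, in first-seen order.

    Assumption: two links count as the same if they match after being
    resolved against base_url and stripped of any #fragment.
    """
    collector = LinkCollector()
    collector.feed(html)
    seen = set()
    result = []
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


def fetch(url):
    """Download a page while sending a browser-like User-Agent header."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        return resp.read().decode("utf-8", errors="replace")
```

To use it, call unique_links(fetch(url), url) and print each item of the returned list. Keeping the parsing step separate from the download makes it possible to try unique_links on a literal HTML string without any network access.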
