Collecting articles with Python from a website that requires cookies

I'm trying to collect articles from the database at infoweb.newsbank.com for research I'm doing at my university. This is my code so far:

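from bs4 import BeautifulSoup
import requests
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
from requests import session
import http.cookiejar


mainLink = "http://infoweb.newsbank.com.proxy.lib.uiowa.edu/iw-search/we/InfoWeb?p_product=AWNB&p_theme=aggregated5&p_action=doc&p_docid=14D12E120CD13C18&p_docnum=2&p_queryname=4"


def articleCrawler(mainUrl):
    # Fetch the page (no cookies are stored or sent here).
    response = urllib.request.urlopen(mainUrl)
    # Name the parser explicitly so BeautifulSoup does not have to guess one.
    soup = BeautifulSoup(response, "html.parser")
    # Print every anchor tag found on the page.
    for link in soup.find_all('a'):
        print(link)


articleCrawler(mainLink)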

Unfortunately, I get back a response like this:

<html> 
<head> 
<title>Cookie Required</title> 
</head> 
<body> 
This is cookie.htm from the doc subdirectory. 
<p> 
<hr> 
<p> 

Licensing agreements for these databases require that access be extended 
only to authorized users. Once you have been validated by this system, 
a "cookie" is sent to your browser as an ongoing indication of your authorization to 
access these databases. It will only need to be set once during login. 
<p> 
As you access databases, they may also use cookies. Your ability to use those databases 
may depend on whether or not you allow those cookies to be set. 
<p> 
To login again, click <a href="login">here</a>. 
</p></p></p></hr></p></body> 
</html> 

<a href="login">here</a>

I tried using http.cookiejar, but I'm not familiar with the library. I'm using Python 3. Does anyone know how to accept the cookie and access the articles? Thanks.

Source

2014-04-11 Solsma Dev

I'm not familiar with Python 3, but in Python 2 the standard way to accept cookies is to include an HTTPCookieProcessor as one of the handlers in your OpenerDirector.

So, something like this:

import cookielib, urllib, urllib2 
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))

The opener is now ready to open a URL (possibly with a username and password) and to store any cookies it receives in its attached CookieJar:

params = urllib.urlencode({'username': 'someuser', 'password': 'somepass'}) 
opener.open(LOGIN_URL, params)

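If the login succeeds, the opener now holds whatever authentication token the server handed out in the form of a cookie. Then you just access the link you wanted in the first place: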

f = opener.open(mainLink)
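One way to tell whether the login actually took effect is to check whether the page that comes back is still the "Cookie Required" page shown in the question. The following is only a rough sketch in Python 3 using the standard library's html.parser; the helper names TitleParser and is_cookie_wall are made up for illustration, and the check assumes the cookie page keeps the exact title from the response above.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside <title>.
        if self._in_title:
            self.title += data


def is_cookie_wall(html_text):
    """Return True if the page title is exactly 'Cookie Required'.

    Assumption: the proxy's cookie page keeps the title seen in the question.
    """
    parser = TitleParser()
    parser.feed(html_text)
    return parser.title.strip() == "Cookie Required"
```

In Python 3 the body read from the opener is bytes, so something like is_cookie_wall(f.read().decode("utf-8", errors="replace")) would be needed; if it returns True, the cookie was probably not accepted or the login did not succeed.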

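Again, I'm not sure of the exact Python 3 translation, but as far as I can tell cookielib was renamed to http.cookiejar there, so http.cookiejar.CookieJar should be the client-side equivalent of cookielib.CookieJar. The module for creating HTTP cookie content as a server, rather than receiving it as a client, is http.cookies.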
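For reference, here is a rough Python 3 sketch of the same three steps, using http.cookiejar (the Python 3 name of cookielib) together with urllib.request and urllib.parse. It is untested against the actual proxy: the function names are made up for illustration, and the login URL and form field names are placeholders, since the real login form is not shown in the question.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener():
    """Return an opener that stores received cookies, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def encode_form(fields):
    """URL-encode form fields as bytes; Python 3 requires bytes for POST data."""
    return urllib.parse.urlencode(fields).encode("ascii")


def fetch_with_login(login_url, fields, target_url):
    """Log in, then fetch target_url through the same cookie-aware opener.

    login_url and the keys in fields are assumptions; they must match
    whatever the real login form expects.
    """
    opener, jar = build_cookie_opener()
    # POST the credentials; any Set-Cookie headers in the reply land in jar.
    opener.open(login_url, encode_form(fields)).close()
    # The same opener sends the stored cookies back automatically.
    with opener.open(target_url) as response:
        body = response.read()
    return body, [cookie.name for cookie in jar]
```

A call would look something like fetch_with_login(LOGIN_URL, {'username': 'someuser', 'password': 'somepass'}, mainLink), where LOGIN_URL is the placeholder from above. The returned list of cookie names is a quick way to see whether the server set anything at all.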

Source

2014-04-11 22:27:47 dg99

OK, I'll check this out and reply later. Thanks.
