2016-07-05 48 views

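Can't access a page behind the login with urllib2

I'm trying to access a protected page on Twitter (e.g. my own list) with urllib2 in Python, but this code always sends me back to the login page. Any ideas why?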

(I know I could use Twitter's API and so on, but I want to learn how to do this in general.)

Thanks, Roy


Code:

import urllib
import urllib2
import cookielib
from bs4 import BeautifulSoup

url = "https://twitter.com/login"
protectedUrl = "https://twitter.com/username/likes"

USER = "myTwitterUser" 
PASS = "myTwitterPassword" 

cj = cookielib.CookieJar() 
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) 
opener.addheaders = [('User-Agent', 'Mozilla/5.0'), ("Referer", "https://twitter.com")] 

hdr = {'User-Agent': 'Mozilla/5.0', "Referer":"https://twitter.com"} 
req = urllib2.Request(url, headers=hdr) 
page = urllib2.urlopen(req) 

html = page.read() 
s = BeautifulSoup(html, "lxml") 
AUTH_TOKEN = s.find(attrs={"name": "authenticity_token"})["value"] 

login_details = {"session[username_or_email]": USER, 
       "session[password]": PASS, 
       "remember_me": 1, 
       "return_to_ssl": "true", 
       "scribe_log": "", 
       "redirect_after_login": "/", 
       "authenticity_token": AUTH_TOKEN 
       } 

login_data = urllib.urlencode(login_details) 
opener.open(url, login_data) 
resp = opener.open(protectedUrl) 
print resp.read() 

Answers


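You need to POST to the correct URL, "https://twitter.com/sessions". It is also essential to use the opener when you make the initial request to get the authenticity_token: instead of page = urllib2.urlopen(req), use page = opener.open(req), so that we get the cookies we need.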
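A minimal way to see why the opener matters, without touching Twitter: the sketch below is a Python 3 translation (urllib.request and http.cookiejar are the Python 3 names for urllib2 and cookielib) run against a throwaway local server. The handler and its /start and /check paths are hypothetical stand-ins for the login page and the protected page; they only check whether the cookie set by the first request comes back on the second.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyLoginHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site: /start hands out a cookie, /check looks for it."""

    def do_GET(self):
        if self.path == "/start":
            body = b"cookie set"
            extra = [("Set-Cookie", "sid=abc123; Path=/")]
        else:
            sent = self.headers.get("Cookie") or ""
            body = b"logged in" if "sid=abc123" in sent else b"back to login"
            extra = []
        self.send_response(200)
        for name, value in extra:
            self.send_header(name, value)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch_pair(opener, base):
    """GET /start, then /check, with the same opener; return the /check body."""
    opener.open(base + "/start").read()
    return opener.open(base + "/check").read().decode()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), ToyLoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        # Like urllib.request.urlopen: no cookie processor, so the cookie is dropped.
        # ProxyHandler({}) ignores proxy env vars so we talk to localhost directly.
        plain = fetch_pair(
            urllib.request.build_opener(urllib.request.ProxyHandler({})), base
        )
        # Like the question's opener: the CookieJar stores Set-Cookie and replays it.
        jar = http.cookiejar.CookieJar()
        with_jar = fetch_pair(
            urllib.request.build_opener(
                urllib.request.ProxyHandler({}),
                urllib.request.HTTPCookieProcessor(jar),
            ),
            base,
        )
    finally:
        server.shutdown()
        server.server_close()
    return plain, with_jar


if __name__ == "__main__":
    print(run_demo())  # expected: ('back to login', 'logged in')
```

The first opener behaves like urllib2.urlopen, which has no cookie processor, so the cookie handed out by /start never reaches /check. This mirrors what the answer describes: fetching the token page with urlopen throws away the session cookie that the later login POST depends on.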


Running the code against my own Twitter account, which has no likes:

In [72]: login_details = {"session[username_or_email]": USER, 
    ....:     "session[password]": PASS, 
    ....:     "remember_me": 1, 
    ....:     "redirect_after_login": "/", 
    ....:     "authenticity_token": AUTH_TOKEN 
    ....:     } 

In [73]: # encode form data 

In [74]: login_data = urllib.urlencode(login_details) 

In [75]: r = opener.open("https://twitter.com/sessions", login_data) 

In [76]: # get the likes page now that we are logged in (likes = "https://twitter.com/{}/likes")

In [77]: resp = opener.open(likes.format(USER)) 

In [78]: soup = BeautifulSoup(resp.read(),"lxml") 

In [79]: print(soup.select_one("p.empty-text")) 
<p class="empty-text"> 
     You haven't liked any Tweets yet. 

    </p> 

As you can see, we successfully got to the page we wanted.
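The login_data passed to opener.open above is just the URL-encoded form. A small Python 3 sketch (urlencode moved from urllib to urllib.parse) with placeholder credentials and a placeholder token shows what that POST body looks like:

```python
from urllib.parse import urlencode

# Placeholder credentials and token: not real values.
login_details = {
    "session[username_or_email]": "alice",
    "session[password]": "secret",
    "remember_me": 1,
    "redirect_after_login": "/",
    "authenticity_token": "TOKEN123",
}

# This string is what gets sent as the POST body to https://twitter.com/sessions.
body = urlencode(login_details)
print(body)
# session%5Busername_or_email%5D=alice&session%5Bpassword%5D=secret&remember_me=1&redirect_after_login=%2F&authenticity_token=TOKEN123
```

In Python 3, opener.open expects the POST body as bytes, so you would pass body.encode("ascii") rather than the str itself.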

Doing the same with a requests.Session() object takes a lot less code:

import requests
from bs4 import BeautifulSoup

USER = "username"
PASS = "pass"
url = "https://twitter.com"
post = "https://twitter.com/sessions"
likes = "https://twitter.com/{}/likes"

data = {"session[username_or_email]": USER,
        "session[password]": PASS,
        "scribe_log": "",
        "redirect_after_login": "/",
        "remember_me": "1"}

with requests.Session() as s:
    # the first GET sets the session cookies and gives us the token
    r = s.get(url)
    soup = BeautifulSoup(r.content, "lxml")
    AUTH_TOKEN = soup.select_one("input[name=authenticity_token]")["value"]
    data["authenticity_token"] = AUTH_TOKEN
    # log in; the Session keeps the cookies for every later request
    r = s.post(post, data=data)
    print(s.get(likes.format(USER)).content)
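The select_one line is the only place BeautifulSoup is needed. If you would rather not depend on bs4 and lxml, the same lookup can be sketched with the standard library's html.parser. This assumes, as the code above does, that the token sits in a plain `<input name="authenticity_token" value="...">` tag in the HTML the server returns; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class TokenFinder(HTMLParser):
    """Record the value of the first <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags are routed here too by the default
        # handle_startendtag, so one handler covers both forms.
        if self.token is None and tag == "input":
            attrs = dict(attrs)
            if attrs.get("name") == "authenticity_token":
                self.token = attrs.get("value")


def find_authenticity_token(html):
    """Return the token value, or None if the page has no such input."""
    finder = TokenFinder()
    finder.feed(html)
    finder.close()
    return finder.token


if __name__ == "__main__":
    page = '<form><input type="hidden" name="authenticity_token" value="abc123"/></form>'
    print(find_authenticity_token(page))  # abc123
```

If the site builds its login form with JavaScript instead of serving it in the initial HTML, neither this nor select_one will find the token.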

In my experience with sites like this, you need to send the full set of HTTP headers a browser would send, including:

  • Accept
  • Accept-Encoding
  • Accept-Language
  • Referer
  • Upgrade-Insecure-Requests
  • ...
  • User-Agent

Leave out only the Cookie header.
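One way to apply this is to copy the request headers from your browser's developer tools and strip just the Cookie entry before reusing them. A sketch, assuming a hypothetical copied header set whose values are placeholders rather than anything Twitter is known to require:

```python
# Hypothetical header set copied from a browser's developer tools.
# The values are placeholders, not what Twitter actually requires.
browser_headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://twitter.com/",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0",
    "Cookie": "placeholder=copied-from-browser",
}


def without_cookie(headers):
    """Return a copy of `headers` minus any Cookie header (name matched case-insensitively)."""
    return {name: value for name, value in headers.items() if name.lower() != "cookie"}


if __name__ == "__main__":
    for name, value in without_cookie(browser_headers).items():
        print(name + ": " + value)
```

With requests you could then call sess.headers.update(without_cookie(browser_headers)) so every request in the session carries these headers while the Session's own cookie jar supplies the cookies. Note that requests decodes gzip and deflate responses for you; urllib2 does not, so sending this Accept-Encoding through the question's opener means decompressing the body yourself.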

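You also need to create a session and handle cookies, since Twitter presumably works like Facebook in this respect. I personally prefer requests, because it lets you create a session and work with cookies easily.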

You could do something like this:

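import requests
from time import sleep

USER = "myTwitterUser"
PASS = "myTwitterPassword"

# placeholder headers: fill in the real names/values your browser sends (minus Cookie)
hd = {'h11': 'h12', 'h21': 'h22', 'h31': 'h32'}
# placeholder field names: use the ones from the site's actual login form
# (the answer above uses session[username_or_email], session[password] and authenticity_token)
usrdata = {'user': USER, 'pass': PASS}

sess = requests.Session()
req = sess.get('http://www.twitter.com')  # first request starts the session and collects cookies
sleep(1)
req = sess.post('https://twitter.com/sessions', data=usrdata, headers=hd)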

Hope this helps.