
How do I use urllib to log in and then access a page that requires permissions in Python?

So what I am trying to do is log in to Wallbase.cc and then grab the tags of an NSFW wallpaper (you need to be logged in to see those). It looks like I can log in just fine, but when I try to access the wallpaper page it throws a 403 error. Here is the code I am using:

import urllib2 
import urllib 
import cookielib 
import re 

username = 'xxxx' 
password = 'xxxx' 

cj = cookielib.CookieJar() 
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) 
urllib2.install_opener(opener) 
payload = { 
    'csrf' : '371b3b4bd0d1990048354e2056cd36f20b1d7088', 
    'ref' : 'aHR0cDovL3dhbGxiYXNlLmNjLw==', 
    'username' : username, 
    'password' : password 
    } 
login_data = urllib.urlencode(payload) 
req = urllib2.Request('http://wallbase.cc/user/login', login_data) 

url = "http://wallbase.cc/wallpaper/2098029" 

# Opens url of each pic
usock = urllib2.urlopen(url) 
data = usock.read() 
usock.close() 

Any ideas? By the way, the wallpaper used is not actually NSFW, it was just tagged incorrectly.

Answer


You could try this library: http://wwwsearch.sourceforge.net/mechanize/

Here is an example:

import re 
import mechanize 

br = mechanize.Browser() 
br.open("http://www.example.com/") 
# follow second link with element text matching regular expression 
response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1) 
assert br.viewing_html() 
print br.title() 
print response1.geturl() 
print response1.info() # headers 
print response1.read() # body 

br.select_form(name="order") 
# Browser passes through unknown attributes (including methods) 
# to the selected HTMLForm. 
br["cheeses"] = ["mozzarella", "caerphilly"] # (the method here is __setitem__) 
# Submit current form. Browser calls .close() on the current response on 
# navigation, so this closes response1 
response2 = br.submit() 

# print currently selected form (don't call .submit() on this, use br.submit()) 
print br.form 

response3 = br.back() # back to cheese shop (same data as response1) 
# the history mechanism returns cached response objects 
# we can still use the response, even though it was .close()d 
response3.get_data() # like .seek(0) followed by .read() 
response4 = br.reload() # fetches from server 

for form in br.forms(): 
    print form 
# .links() optionally accepts the keyword args of .follow_/.find_link() 
for link in br.links(url_regex="python.org"): 
    print link 
    br.follow_link(link) # takes EITHER Link instance OR keyword args 
    br.back() 
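
Applied to the Wallbase case from the question, the same idea might look roughly like the sketch below. This is only a sketch: it assumes the login form can be reached at http://wallbase.cc/user/login, that it is the first form on that page, and that its fields are named username and password as in the question's payload; the real form name and field names would need to be checked against the page.

import mechanize 

br = mechanize.Browser() 
br.set_handle_robots(False)  # the site's robots.txt may otherwise block the fetch 

# Load the login page and fill in the (assumed) first form on it 
br.open("http://wallbase.cc/user/login") 
br.select_form(nr=0)         # assumption: the login form is the first form 
br["username"] = "xxxx"      # assumption: field names match the question's payload 
br["password"] = "xxxx" 
br.submit()                  # session cookies are kept by the Browser 

# The protected wallpaper page can now be fetched with the same session 
response = br.open("http://wallbase.cc/wallpaper/2098029") 
print response.read() 

One nice side effect of submitting the form this way is that any hidden fields the form actually contains (such as the csrf token) are sent along automatically, so nothing has to be hard-coded.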

I saw mechanize before, but figured it would be easier to just stick with urllib. If urllib doesn't work out I can dig into it... – user3288438
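
For the urllib2 route, one thing that stands out in the posted snippet is that the login Request is built but never opened, so no session cookie ever reaches the CookieJar before the wallpaper page is requested, which would explain the 403. A minimal sketch of the completed flow, assuming the hard-coded csrf and ref values are still accepted by the server (in practice they would probably need to be scraped fresh from the login page):

import urllib 
import urllib2 
import cookielib 

username = 'xxxx' 
password = 'xxxx' 

# Cookie-aware opener, installed globally so plain urlopen() reuses it 
cj = cookielib.CookieJar() 
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj)) 
urllib2.install_opener(opener) 

payload = { 
    'csrf' : '371b3b4bd0d1990048354e2056cd36f20b1d7088',  # assumption: still valid 
    'ref' : 'aHR0cDovL3dhbGxiYXNlLmNjLw==', 
    'username' : username, 
    'password' : password 
    } 
login_data = urllib.urlencode(payload) 

# Actually send the login POST so the session cookie lands in cj 
login_req = urllib2.Request('http://wallbase.cc/user/login', login_data) 
urllib2.urlopen(login_req).read() 

# Only then fetch the protected wallpaper page, reusing the same cookies 
usock = urllib2.urlopen("http://wallbase.cc/wallpaper/2098029") 
data = usock.read() 
usock.close() 

If the server still answers 403 after a real login, adding a User-Agent (and possibly Referer) header to the requests would be the next thing to try, since some sites reject urllib2's default user agent.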
