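Scraping data from a page that requires login
I'm new to Python and web scraping, and I'm trying to write a very basic script that fetches data from a web page that can only be accessed after logging in. I've looked at a bunch of different examples, but none of them solve this problem. This is what I have so far: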
from bs4 import BeautifulSoup
import urllib, urllib2, cookielib
username = 'name'
password = 'pass'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('WebpageWithLoginForm')
resp = opener.open('WebpageIWantToAccess')
soup = BeautifulSoup(resp, 'html.parser')
print soup.prettify()
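Right now, when I print the result, I just get the contents of the page as if I were not logged in. I think the problem has something to do with the way I'm setting up the cookies, but I'm really not sure, because I don't fully understand what is happening with the cookie processor and its library. Thanks!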
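For what it's worth, in the snippet above `login_data` is built but never passed to `opener.open`, so the login page is only fetched with a GET and no credentials are sent. That may be why the second request looks unauthenticated. The cookie processor itself only does bookkeeping: it stores any `Set-Cookie` headers the server returns and replays them on later requests made through the same opener. Below is a minimal sketch of that flow, written for Python 3 (`urllib.request` and `http.cookiejar` replace `urllib2` and `cookielib`). The function names and the `username`/`password` field names are assumptions for illustration; the real form may expect different fields.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener():
    """Return (opener, jar). The opener saves cookies from responses into
    the jar and sends them back on later requests made through it."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_fetch(login_url, target_url, username, password):
    """POST the credentials, then fetch target_url with the same opener."""
    opener, _jar = build_cookie_opener()

    # Assumed field names; check the login form's <input name="..."> values.
    form = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("utf-8")

    # Passing data= turns the request into a POST that carries the form body.
    with opener.open(login_url, data=form, timeout=10) as resp:
        resp.read()

    # Any session cookie set by the login response is sent automatically here.
    with opener.open(target_url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `login_and_fetch('WebpageWithLoginForm', 'WebpageIWantToAccess', username, password)` would return the HTML, which could then be handed to `BeautifulSoup(html, 'html.parser')` as before.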
Current code:
import requests
import sys
EMAIL = 'usr'
PASSWORD = 'pass'
URL = 'https://connect.lehigh.edu/app/login'
def main():
    # Start a session so we can have persistent cookies
    # (the config= keyword is only accepted by pre-1.0 versions of requests)
    session = requests.session(config={'verbose': sys.stderr})

    # This is the form data that the page sends when logging in
    login_data = {
        'username': EMAIL,
        'password': PASSWORD,
        'LOGIN': 'login',
    }

    # Authenticate
    r = session.post(URL, data=login_data)

    # Try accessing a page that requires you to be logged in
    r = session.get('https://lewisweb.cc.lehigh.edu/PROD/bwskfshd.P_CrseSchdDetl')


if __name__ == '__main__':
    main()
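The `login_data` dictionary above has to match what the browser actually submits, and login forms often include extra hidden inputs (a CSRF token, for example) alongside the visible fields. As a hedged sketch, the fields could be read from the login page's HTML with the standard library instead of being typed by hand. The class and function names here are hypothetical, and the sketch assumes the login form is the first `<form>` on the page.

```python
from html.parser import HTMLParser


class LoginFormFields(HTMLParser):
    """Collect name/value pairs from <input> tags inside the first <form>."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a value attribute default to an empty string.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            # Stop after the first form so later forms are ignored.
            self._in_form = False
            self._done = True


def build_login_payload(login_page_html, credentials):
    """Merge the form's default fields with the supplied credentials."""
    parser = LoginFormFields()
    parser.feed(login_page_html)
    payload = dict(parser.fields)
    payload.update(credentials)  # credentials override the empty defaults
    return payload
```

With the script above, the login page would first be fetched with `session.get(URL)`, and the result of `build_login_payload(r.text, {'username': EMAIL, 'password': PASSWORD})` passed as `data=` to `session.post`.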
Possible duplicate of [Login to website using python](http://stackoverflow.com/questions/8316818/login-to-website-using-python) – Harrison