2014-07-18

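Getting a file from an authenticated site (with Python urllib, urllib2)

I am trying to fetch a queried Excel file from a site. When I enter the direct link, it takes me to a login page, and once I enter my username and password the Excel file downloads automatically. I am trying to avoid installing additional modules that are not part of standard Python (the script will run on "standardized machines", and it won't work there if the module isn't installed).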

I tried the following, but the Excel file I get only contains the "login page" content:

import urllib 

url = "myLink_queriedResult/result.xls" 
urllib.urlretrieve(url,"C:\\test.xls") 
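A quick way to confirm what was actually downloaded is to look at the first bytes of the file. The sketch below (Python 3, standard library only) assumes the expected file is a legacy binary .xls, which starts with the OLE2 signature `D0 CF 11 E0 A1 B1 1A E1`; anything starting with HTML markup is most likely the login page. Treat it as a hint only, since some reporting tools serve an HTML table under an .xls name. The function names are illustrative.

```python
# Signature of an OLE2 compound document, the container used by legacy .xls files
XLS_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_xls(data: bytes) -> bool:
    """Return True if `data` starts with the OLE2 signature of a binary .xls file."""
    return data.startswith(XLS_MAGIC)


def looks_like_html(data: bytes) -> bool:
    """Rough check for an HTML page (such as a login form) instead of a spreadsheet."""
    head = data.lstrip()[:512].lower()
    return head.startswith(b"<!doctype html") or b"<html" in head
```

For example, read `C:\test.xls` in binary mode and pass its bytes to both functions; a login page will fail `looks_like_xls` and usually pass `looks_like_html`.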

SO ..然後我看着用密碼驗證使用urllib2,但然後我卡住了。

I have the following code:

import urllib2 
import urllib 

theurl = 'myLink_queriedResult/result.xls' 
username = 'myName' 
password = 'myPassword' 

passman = urllib2.HTTPPasswordMgrWithDefaultRealm() 
passman.add_password(None, theurl, username, password) 

authhandler = urllib2.HTTPBasicAuthHandler(passman) 
opener = urllib2.build_opener(authhandler) 
urllib2.install_opener(opener) 
pagehandle = urllib2.urlopen(theurl) 
pagehandle.read()  # but it still seems to contain only the login page
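One likely reason the code above returns the login page: `HTTPBasicAuthHandler` only sends credentials after the server answers with `401 Unauthorized` and a `WWW-Authenticate: Basic` challenge. A site that redirects to an HTML login form never sends that challenge, so the handler stays silent. If the server does accept Basic auth, Python 3 (where `urllib2` became `urllib.request`) can send the credentials on the first request via `HTTPPasswordMgrWithPriorAuth` (Python 3.5+). This is a sketch under that assumption; the helper name is illustrative, and it will not help with a form-based login.

```python
import urllib.request


def build_preemptive_basic_opener(url: str, username: str, password: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends Basic credentials on the first request to `url`."""
    # is_authenticated=True tells the handler to add the Authorization header
    # up front instead of waiting for a 401 challenge from the server.
    passman = urllib.request.HTTPPasswordMgrWithPriorAuth()
    passman.add_password(None, url, username, password, is_authenticated=True)
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(passman))
```

Usage would be `build_preemptive_basic_opener(theurl, username, password).open(theurl).read()`, with `theurl` being a full `http://` or `https://` URL.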

Thanks in advance for any suggestions. :)

Answers


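These days urllib is generally avoided in favor of Requests (note that Requests is a third-party package, not part of the standard library).

This should do what you want, provided the server uses HTTP Basic authentication: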

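import requests
from requests.auth import HTTPBasicAuth

theurl = 'myLink_queriedResult/result.xls'
username = 'myUsername'
password = 'myPassword'

r = requests.get(theurl, auth=HTTPBasicAuth(username, password))
r.raise_for_status()  # fail loudly on 401/403 instead of saving an error page

# save the downloaded Excel file
with open("C:\\test.xls", "wb") as f:
    f.write(r.content)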

More information on authentication with Requests is available in the Requests documentation.
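Since the question asks for standard-library-only code, here is a sketch of what `HTTPBasicAuth` adds to the request: an `Authorization: Basic <base64(username:password)>` header. The same request can be built with Python 3's `urllib.request`. The credentials are encoded as UTF-8 here, which is an assumption; servers may treat non-ASCII credentials differently. The helper names are illustrative.

```python
import base64
import urllib.request


def basic_auth_value(username: str, password: str) -> str:
    """Value of the Authorization header for HTTP Basic auth."""
    # UTF-8 is an assumption; for ASCII credentials every common encoding agrees
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def basic_auth_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a stdlib request carrying the same header HTTPBasicAuth would add."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_value(username, password))
    return req


# Usage (needs network access; the URL is a placeholder):
# with urllib.request.urlopen(basic_auth_request(url, "myUsername", "myPassword")) as resp:
#     with open("test.xls", "wb") as out:
#         out.write(resp.read())
```

As with the Requests version, this only works if the site accepts Basic auth rather than a login form.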


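You need to use cookies so that the login session persists between requests:

import http.cookiejar
import urllib.parse
import urllib.request

the_url = "myLink_login"  # hypothetical placeholder: the URL the login form posts to
username = "myUsername"
password = "myPassword"

# check the input names for the login fields by inspecting the page source
values = {'username': username, 'password': password}
data = urllib.parse.urlencode(values).encode("utf-8")
cookies = http.cookiejar.CookieJar()  # Python 2's cookielib is http.cookiejar in Python 3

# create "opener" (OpenerDirector instance) that stores and resends cookies
opener = urllib.request.build_opener(
    urllib.request.HTTPRedirectHandler(),
    urllib.request.HTTPHandler(debuglevel=0),
    urllib.request.HTTPSHandler(debuglevel=0),
    urllib.request.HTTPCookieProcessor(cookies))

# use the opener to POST the login data; the session cookie is stored in `cookies`
response = opener.open(the_url, data)

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)

# fetch the file with the logged-in session and save it
urllib.request.urlretrieve("myLink_queriedResult/result.xls", "C:\\test.xls")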
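Instead of reading the page source by hand to find the login field names, the standard library's `html.parser` can list the form's `<input>` fields, including hidden ones such as CSRF tokens that often have to be posted back with the credentials. This is a sketch; the class and function names are hypothetical, and the real field names depend on the site.

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Collect the name/value pairs of <input> elements on a login page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # Hidden inputs (e.g. CSRF tokens) carry a value that must be sent back;
            # inputs without a value attribute default to an empty string.
            self.fields[name] = attr.get("value") or ""


def login_fields(html: str) -> dict:
    """Return {field name: default value} for every named <input> in `html`."""
    parser = LoginFormParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Fill in the username and password entries of the returned dict, then pass it to `urllib.parse.urlencode(...).encode("utf-8")` and POST it with the cookie-aware opener.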

You can try it this way with Python 3:

import requests
# import the necessary authentication method (requests_ntlm is a third-party package)
from requests_ntlm import HttpNtlmAuth
import pandas as pd
from io import BytesIO

r = requests.get("http://example.website", auth=HttpNtlmAuth('acc', 'password'))
# pandas needs the xlrd package installed to parse legacy .xls files
xd = pd.read_excel(BytesIO(r.content))

References:

  1. https://medium.com/ibm-data-science-experience/excel-files-loading-from-object-storage-python-a54a2cbf4609

  2. http://www.python-requests.org/en/latest/user/authentication/#basic-authentication

  3. Pandas read_csv from url