BadZipFile when downloading from Kaggle

I am trying to download and unzip a Kaggle dataset from a Python script (Python 3.5), but I get an error.

import io
from zipfile import ZipFile
import csv
import urllib.request

url = 'https://www.kaggle.com/c/quora-question-pairs/download/test.csv.zip'
response = urllib.request.urlopen(url)
c = ZipFile(io.BytesIO(response.read()))

Running this code produces the following error:

BadZipFile: File is not a zip file

How can I get rid of this error, and what causes it?


2017-04-02 Дмитрий Карпов

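This link is not publicly accessible. Where in your program do you supply your access credentials? Is this the complete program? – kmario23

Something like this should work: http://ramhiser.com/2012/11/23/how-to-download-kaggle-data-with-python-and-requests-dot-py/ – kmario23

kmario23 is right: you need to log in to the site from your Python code before downloading the file. Otherwise, what the request to the URL returns is not a zip file.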
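To see the cause directly, it can help to check what the server actually returned before handing the bytes to ZipFile. The following is only a minimal sketch, assuming the response body has already been read into a bytes object; the helper name open_zip_or_explain is hypothetical and not part of any library.

```python
import io
import zipfile


def open_zip_or_explain(payload):
    """Return a ZipFile for `payload` (bytes), or raise ValueError showing what arrived."""
    buffer = io.BytesIO(payload)
    if not zipfile.is_zipfile(buffer):
        # A login page would typically start with HTML markup rather than zip data.
        preview = payload[:80].decode("utf-8", errors="replace")
        raise ValueError("Response is not a zip archive; it starts with: %r" % preview)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

With the code from the question, this would be called as open_zip_or_explain(response.read()). If the server sent an HTML page instead of the archive, the error message shows the beginning of that page.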

Using the requests module, with a few small fixes to the code from http://ramhiser.com/2012/11/23/how-to-download-kaggle-data-with-python-and-requests-dot-py/, the solution is:

from zipfile import ZipFile
import requests

# The direct link to the Kaggle data set
data_url = 'https://www.kaggle.com/c/quora-question-pairs/download/test.csv.zip'

# The local path where the data set is saved.
local_filename = "test.csv.zip"

# Kaggle username and password
kaggle_info = {'UserName': "my_username", 'Password': "my_password"}

# Attempts to download the CSV file. Gets rejected because we are not logged in.
r = requests.get(data_url)

# Log in to Kaggle and retrieve the data. r.url is the final URL of the
# previous request (after any redirects). stream=True defers downloading
# the body until iter_content() is consumed below.
r = requests.post(r.url, data=kaggle_info, stream=True)

# Writes the data to a local file one chunk at a time.
with open(local_filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=512 * 1024):  # Reads 512KB at a time into memory
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)

c = ZipFile(local_filename)
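
Once the archive opens without error, the CSV can be read straight from it without extracting to disk. This is a rough sketch rather than part of the original answer: the function name read_first_rows is hypothetical, and the member name and UTF-8 encoding are assumptions about the archive's contents.

```python
import csv
import io
from zipfile import ZipFile


def read_first_rows(zip_source, member_name, limit=5):
    """Return up to `limit` rows of a CSV member as lists of strings."""
    rows = []
    with ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # ZipFile.open() yields bytes; wrap it so csv.reader receives text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                rows.append(row)
                if len(rows) >= limit:
                    break
    return rows
```

For example, read_first_rows(local_filename, "test.csv") would return the header and the first few data rows, assuming the archive contains a member named test.csv. ZipFile.namelist() lists the actual member names.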


2017-04-02 16:53:16
