2014-12-26 62 views
1

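Import an Excel column into a Python list

Hi, I have an Excel sheet with only one column, and I want to import that column into a Python list. The column has 5 elements, all of them URLs like "http://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0".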

My code:

import requests 
import csv 
import xlrd 

ls = [] 
ls1 = ['01.jpg','02.jpg','03.jpg','04.jpg','05.jpg','06.jpg'] 
wb = xlrd.open_workbook('Book1.xls') 
ws = wb.sheet_by_name('Book1') 
num_rows = ws.nrows - 1 
curr_row = -1 
while (curr_row < num_rows): 
    curr_row += 1 
    row = ws.row(curr_row) 
    ls.append(row) 

for each in ls: 
    urlFetch = requests.get(each) 
    img = urlFetch.content 
    for x in ls1: 
     file = open(x,'wb') 
     file.write(img) 
     file.close() 

Right now it gives me this error:

Traceback (most recent call last): 
    File  "C:\Users\Prime\Documents\NetBeansProjects\Python_File_Retrieve\src\python_file_retrieve.py", line 18, in <module> 
urlFetch = requests.get(each) 
    File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\api.py", line 65, in get 
return request('get', url, **kwargs) 
    File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\api.py", line 49, in request 
response = session.request(method=method, url=url, **kwargs) 
    File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 461, in request 
    resp = self.send(prep, **send_kwargs) 
    File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 567, in send 
    adapter = self.get_adapter(url=request.url) 
    File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 646, in get_adapter 
    raise InvalidSchema("No connection adapters were found for '%s'" % url) 
requests.exceptions.InvalidSchema: No connection adapters were found for '[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']' 

Please help.

Answers

1

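Your problem is not in reading the Excel file, but in parsing the values you get out of it. Did you notice that the error is thrown by the requests library?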

requests.exceptions.InvalidSchema: No connection adapters were found for <url> 

From this we can tell that the URL you take from each cell of your Excel file reaches requests with a [text: prefix, as the error shows:

'[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']' 

requests cannot work with this, because it cannot tell which protocol the URL uses. If you do this instead

requests.get('https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0') 

you will get a proper result.

What you need to do is extract only the URL from the cell. If you have trouble with that, give us examples of the URLs in your Excel file.
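A minimal sketch of extracting only the URL strings, assuming the rows come back as lists of cell objects that expose the text through a `.value` attribute (xlrd's `Cell` works this way, and its repr has the `text:'...'` shape seen in the error). `FakeCell` and `cell_values` are hypothetical names, used only so the example runs without a workbook:

```python
class FakeCell:
    """Hypothetical stand-in for an xlrd cell; not part of xlrd itself."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # Mimics the "text:'...'" form that shows up in the traceback.
        return "text:%r" % self.value


def cell_values(rows):
    """Return the value of the first cell of every non-empty row."""
    return [row[0].value for row in rows if row]


# Placeholder URLs, assumed for illustration only.
rows = [
    [FakeCell("https://example.com/a.jpg")],
    [FakeCell("https://example.com/b.jpg")],
]

print(str(rows[0]))  # [text:'https://example.com/a.jpg'] <- what requests was handed
urls = cell_values(rows)
print(urls)          # plain strings that can be passed to requests.get
```

With the real sheet this would probably be called as cell_values(ws.row(i) for i in range(ws.nrows)); xlrd's ws.col_values(0) should return the same list of plain values more directly.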

+0

The URLs are: https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0 https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAAlyTk8mD1m8zAy2vtvZkfFa/NT52-178/DPS_0199.jpg?dl=0 https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAAdy87qH3QQlyxk8JyDKIRAa/NT52-179/DPS_0268.jpg?dl=0 https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AACV9ojU0aG8lbQ-GdmQuOp1a/NT52-180/DPS_0299.jpg?dl=0

+0

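The problem is the way they are stored in your Excel file, not the URLs themselves. They are not stored as bare URLs; they come with extra brackets, apostrophes and text attached (note the "[text:" prefix...). You have to strip all of that off before the URL can be used. – thomas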

0

In the spreadsheet, click one of the URLs and see what appears in the formula bar. My guess is that it looks like this:

[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0'] 

because that is the URL printed in the stack trace.

Can you remove the brackets, the quotes and the "text:" part? That should fix it.
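If the strings really do arrive in that wrapped form, stripping the wrapper in Python could look roughly like the sketch below. It assumes each value is a single-line string, optionally wrapped as [text:'...']; `strip_wrapper` and `WRAPPER` are hypothetical names, not part of xlrd or requests:

```python
import re

# Optional "[" and "text:" prefix, optional quote, the URL itself,
# then an optional closing quote and "]".
WRAPPER = re.compile(r"""^\[?(?:text:)?['"]?(.*?)['"]?\]?$""")


def strip_wrapper(raw):
    """Return the bare URL from a string such as "[text:'https://...']"."""
    match = WRAPPER.match(raw.strip())
    if match is None:  # e.g. a value with embedded newlines
        raise ValueError("unexpected cell text: %r" % raw)
    return match.group(1)


wrapped = "[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']"
url = strip_wrapper(wrapped)
print(url)
```

A string that is already a bare URL passes through unchanged, so the function can be applied to every value before calling requests.get.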