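Python parallel problem
My program builds the Cartesian product of URLs from the state and several other variables, saves each zip file (from the generated URL) to a specified location, checks whether the zip file contains data (some zip files download with no data in them), writes to a specific file to record that the state has data, and then writes to another file when the state is finished. This is done in parallel by state, i.e. Alabama and Alaska go through the steps above at the same time. However, I keep getting the following error: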
An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line statement', (179, 0))
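The error occurs when I start fresh, i.e. when the process has not been run before. If the process has already been partially run, it does not happen. More specifically, it occurs at random.
Here is my code:
Functions -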
import itertools
import os
import shutil
import urllib

def createURL(state, typ, geography, level, data, dictionary):
    # Build every combination of the input variables.
    DATALIST = list(itertools.product(typ, geography, level, data))
    TXTLIST = list(itertools.product(typ, dictionary))
    DEFLIST = list(itertools.product(typ))
    DATALINKS = []
    for data in DATALIST:
        result = 'URL'
        DATALINKS.append(result)
    TXTLINKS = []
    for txt in TXTLIST:
        links = 'URL'
        TXTLINKS.append(links)
    DEFLINKS = []
    for defl in DEFLIST:
        definitions = 'URL'
        DEFLINKS.append(definitions)
    URLLINKS = DATALINKS + TXTLINKS + DEFLINKS
    return URLLINKS
def downloadData(state, TYPE, GEOGRAPHY, LEVEL, DATA,
                 DICTIONARY, YEAR, QUARTER, completedStates):
    print ('Working on state: ', state)
    URLLINKS = createURL(state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY)
    DIRECTORY = '/home/justin/QWI/' + YEAR + 'Q' + QUARTER + '/' + state
    if not os.path.exists(DIRECTORY[:-2]):
        os.makedirs(DIRECTORY[:-2])
    if not os.path.exists(DIRECTORY):
        os.makedirs(DIRECTORY)
    downLoadedURLs = DIRECTORY[:-2] + 'downLoadedURLs.txt'
    if not os.path.isfile(downLoadedURLs):
        with open(downLoadedURLs, 'a') as downloaded:
            downloaded.write('')
    with open(downLoadedURLs) as downloaded:
        URLcontent = downloaded.read().splitlines()
    URLLINKS = [x for x in URLLINKS if x not in URLcontent]
    for url in URLLINKS:
        print ('Downloading data: ', url)
        save = DIRECTORY + '/' + os.path.basename(url)
        urllib.urlretrieve(url, save)
        with open(downLoadedURLs, 'a') as downloaded:
            downloaded.write('{}\n'.format(url))
        if os.stat(save).st_size == 0:
            shutil.rmtree(DIRECTORY)
            with open(DIRECTORY[:-2] + '/zeroDataStates.txt', 'a') as zeroData:
                zeroData.write('{}\n'.format(state))
            break
    with open(completedStates, 'a') as completedState:
        completedState.write('{}\n'.format(state))
Here is the parallel code:
from joblib import Parallel, delayed
STATE = ['al', 'ak', etc...]
Parallel(n_jobs = CORES)(delayed(downloadData)\
(state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY, YEAR, QUARTER,
completedStates) for state in STATE)
I believe the error occurs either when writing to a file or when fetching a URL.
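One way to narrow this down is to make each worker report its own failure. The message above looks like it is printed while a traceback is being formatted, which would mean the real exception is hidden underneath; that is an assumption, not something confirmed here. The sketch below (Python 3) is not the code from the question: `ensure_directory` and `run_and_report` are hypothetical helpers. The first replaces the check-then-create pattern, which can fail when two workers try to create the same shared directory on a fresh run; the second returns the traceback as plain text.

```python
import os
import traceback


def ensure_directory(path):
    """Create path (and any parents) without failing if it already exists."""
    # exist_ok=True (Python 3.2+) avoids the race in
    # `if not os.path.exists(p): os.makedirs(p)` when several
    # processes reach that check at the same time.
    os.makedirs(path, exist_ok=True)
    return path


def run_and_report(func, state, *args):
    """Call func(state, *args); return (state, None) or (state, traceback text)."""
    try:
        func(state, *args)
        return state, None
    except Exception:
        # Return the traceback as a string so it comes back from the
        # worker process as ordinary data instead of a re-raised error.
        return state, traceback.format_exc()
```

With joblib this could be called as `Parallel(n_jobs = CORES)(delayed(run_and_report)(downloadData, state, ...) for state in STATE)`, then printing every result whose second element is not `None`.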
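Thank you for your response. However, it did not solve the problem, as I still get the above error roughly 1 time out of 4. I have changed how the code is parallelized and moved it to the UNIX command line, i.e. I pass the state in on the command line and run the program in parallel from there.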
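A minimal sketch of what such a command-line entry point might look like (Python 3). The argument names and the `worker` callable are assumptions for illustration; the comment above does not show the actual script.

```python
import argparse


def parse_args(argv):
    """Parse one state code plus the year and quarter from argv."""
    parser = argparse.ArgumentParser(description="Download data for one state.")
    parser.add_argument("state", help="two-letter state code, e.g. al")
    parser.add_argument("--year", required=True)
    parser.add_argument("--quarter", required=True)
    return parser.parse_args(argv)


def main(argv, worker):
    """Run worker(state, year, quarter) for the state given on the command line."""
    args = parse_args(argv)
    # `worker` is a hypothetical stand-in for the per-state download function.
    worker(args.state, args.year, args.quarter)
    return 0
```

A script built around this would call `main(sys.argv[1:], ...)` and could then be run once per state, for example with `xargs -n 1 -P 4`, so that each state gets its own Python process.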
So you probably have another multi-line statement near the end of your file. Search your code for '''
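Instead of searching by hand, the file can be compiled without running it; an unclosed triple quote or bracket then shows up as a `SyntaxError`. This is a different technique from the text search suggested above, sketched here for Python 3 with a hypothetical helper name, `find_syntax_error`.

```python
def find_syntax_error(source, filename="<script>"):
    """Return None if source compiles, else (line number, message)."""
    try:
        # compile() only parses the code; nothing in it is executed.
        compile(source, filename, "exec")
    except SyntaxError as err:
        return err.lineno, err.msg
    return None
```

For a real file, read its text first and pass it in, e.g. `find_syntax_error(open(path).read(), path)`. If this returns `None`, the source itself is well formed and the error is more likely coming from somewhere else.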