2015-05-05 65 views

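Python parallelization problem

My program builds URLs from the Cartesian product of the state and several other variables, downloads the zip file at each URL to a specified location, checks whether the zip file contains data (some zip files download with no data in them), writes to a specific file to track which states have data, and finally writes to another file when a state is complete. This is done in parallel by state, i.e. Alabama and Alaska go through the steps above at the same time. However, I keep getting the following error: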

An unexpected error occurred while tokenizing input 
The following traceback may be corrupted or invalid 
The error message is: ('EOF in multi-line statement', (179, 0)) 

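The error occurs when I start from scratch, i.e. when no previous run of the process exists. It does not occur if the process has already been partially run. More specifically, it happens at random.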

Here is my code:

Functions -

import itertools
import os
import shutil
import urllib

def createURL(state, typ, geography, level, data, dictionary):
    # Every combination of the inputs becomes one URL.
    DATALIST = list(itertools.product(typ, geography, level, data))
    TXTLIST = list(itertools.product(typ, dictionary))
    DEFLIST = list(itertools.product(typ))

    DATALINKS = []
    for data in DATALIST:
        result = 'URL'
        DATALINKS.append(result)  # append inside the loop, once per combination

    TXTLINKS = []
    for txt in TXTLIST:
        links = 'URL'
        TXTLINKS.append(links)

    DEFLINKS = []
    for defl in DEFLIST:
        definitions = 'URL'
        DEFLINKS.append(definitions)

    URLLINKS = DATALINKS + TXTLINKS + DEFLINKS
    return URLLINKS


def downloadData(state, TYPE, GEOGRAPHY, LEVEL, DATA, \
                 DICTIONARY, YEAR, QUARTER, completedStates):
    print('Working on state: ', state)

    URLLINKS = createURL(state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY)

    # DIRECTORY ends with the two-letter state code, so DIRECTORY[:-2]
    # is the folder for the quarter.
    DIRECTORY = '/home/justin/QWI/' + YEAR + 'Q' + QUARTER + '/' + state
    if not os.path.exists(DIRECTORY[:-2]):
        os.makedirs(DIRECTORY[:-2])

    if not os.path.exists(DIRECTORY):
        os.makedirs(DIRECTORY)

    # Log of URLs already fetched; create it empty on the first run.
    downLoadedURLs = DIRECTORY[:-2] + 'downLoadedURLs.txt'
    if not os.path.isfile(downLoadedURLs):
        with open(downLoadedURLs, 'a') as downloaded:
            downloaded.write('')

    with open(downLoadedURLs) as downloaded:
        URLcontent = downloaded.read().splitlines()

    # Skip URLs that were fetched on an earlier run.
    URLLINKS = [x for x in URLLINKS if x not in URLcontent]

    for url in URLLINKS:
        print('Downloading data: ', url)
        save = DIRECTORY + '/' + os.path.basename(url)

        # Python 2 API; in Python 3 this is urllib.request.urlretrieve.
        urllib.urlretrieve(url, save)
        with open(downLoadedURLs, 'a') as downloaded:
            downloaded.write('{}\n'.format(url))

        # An empty file means the state has no data: remove its folder,
        # record the state, and stop downloading for it.
        if os.stat(save).st_size == 0:
            shutil.rmtree(DIRECTORY)
            with open(DIRECTORY[:-2] + '/zeroDataStates.txt', 'a') as zeroData:
                zeroData.write('{}\n'.format(state))
            break

    with open(completedStates, 'a') as completedState:
        completedState.write('{}\n'.format(state))
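
For readers following the URL-building step, here is a minimal sketch of how itertools.product can turn a few input lists into one link per combination. The post does not show the real URL pattern, so the template, the example.com host, and the values in TYPES and LEVELS below are all hypothetical, and build_links is an illustrative name rather than the poster's function.

```python
import itertools

# Hypothetical inputs and URL template; the real ones are not shown in the post.
TYPES = ["sa", "se"]
LEVELS = ["county", "state"]
TEMPLATE = "http://example.com/{state}/{typ}_{level}.zip"


def build_links(state, types, levels):
    """Return one URL per (type, level) combination for a single state."""
    return [
        TEMPLATE.format(state=state, typ=typ, level=level)
        for typ, level in itertools.product(types, levels)
    ]


links = build_links("al", TYPES, LEVELS)
for link in links:
    print(link)
```

Each combination contributes exactly one entry because the formatting happens inside the loop over the product.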

Here is the parallel code:

from joblib import Parallel, delayed 

STATE = ['al', 'ak', etc...] 

Parallel(n_jobs = CORES)(delayed(downloadData)\ 
    (state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY, YEAR, QUARTER, 
    completedStates) for state in STATE) 

I believe the error occurs either while writing to a file or while fetching a URL.
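
One possible way to narrow down which of those two steps fails is to wrap each step and keep the full traceback together with a label. This is only a diagnostic sketch and not part of the original program; run_step and the labels are hypothetical names.

```python
import traceback


def run_step(label, func, *args, **kwargs):
    """Call func and report the outcome.

    Returns (True, result) on success, or (False, text) on failure, where
    text is the label followed by the formatted traceback.
    """
    try:
        return True, func(*args, **kwargs)
    except Exception:
        return False, "{}\n{}".format(label, traceback.format_exc())


# Example: a step that fails and a step that succeeds.
ok, info = run_step("fetch al", lambda: 1 / 0)
if not ok:
    print(info)

ok2, value = run_step("write al", lambda: "done")
```

Inside downloadData, the fetch and the file writes could each be passed through such a wrapper, so a failure in a worker shows which state and which step raised it.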

Answers

'EOF in multi-line statement' 

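A Python multi-line statement is one that continues onto the next line, for example a line ending with \. EOF means end of file. So you are looking for a multi-line statement that is not finished before the file ends. The example code you posted contains exactly that, on the first line of this snippet: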

Parallel(n_jobs = CORES)(delayed(downloadData)\ 
    (state, TYPE, GEOGRAPHY, LEVEL, DATA, DICTIONARY, YEAR, QUARTER, 
    completedStates) for state in STATE) 

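The parentheses look like they will parse unambiguously across lines, so you should be able to simply delete the rogue \. You may also want to check your formatting: as posted, the indentation gives no clue about the structure of the code.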
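
To illustrate the point about parentheses, here is a small sketch showing that an expression inside open parentheses can span lines with or without a backslash. download_stub is a hypothetical stand-in for downloadData, and the year and quarter values are made up.

```python
def download_stub(state, year, quarter):
    """Hypothetical stand-in for downloadData; returns a label, downloads nothing."""
    return "{}-{}Q{}".format(state, year, quarter)


STATES = ["al", "ak"]

# Explicit continuation: the backslash must be the last character on its line.
with_backslash = list(download_stub \
    (state, "2015", "1") for state in STATES)

# Implicit continuation: the open parenthesis already lets the expression
# span several lines, so no backslash is needed.
without_backslash = list(download_stub(state, "2015", "1")
                         for state in STATES)

print(with_backslash == without_backslash)
```

Both forms build the same list; the second has no backslash to misplace.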


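Thank you for your response. However, it did not solve the problem: I still get the error above roughly 1 time in 4. I have switched the way I parallelize the code to the UNIX command line, i.e. I pass the states in on the command line and run the program in parallel from there.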


Then you probably have another multi-line statement near the end of the file. Search your code for unclosed triple quotes (''').
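
A rough sketch of how such a search could be automated: list the lines that end in a backslash, and ask the built-in compile() whether the source parses at all. The function names are hypothetical, and the check only reports that something is unfinished, not which construct is responsible.

```python
def find_backslash_continuations(source):
    """Return 1-based numbers of lines whose last non-space character is a backslash."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if line.rstrip().endswith("\\")
    ]


def compiles_cleanly(source):
    """Return True if the source parses, False if compile() raises SyntaxError."""
    try:
        compile(source, "<checked>", "exec")
        return True
    except SyntaxError:
        return False


sample = "total = 1 + \\\n    2\n"
print(find_backslash_continuations(sample))
print(compiles_cleanly(sample))
```

To check a script, read it with open(path).read() and pass the text to both functions; an unclosed bracket or triple-quoted string makes compiles_cleanly return False.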