Pandas apply with threading

I have a DataFrame with 1000 rows as a sample for a first attempt. I load it from Excel, and it looks like this:

exFcode 
0 38907030 
1 47036870 
2 54060696 
3 38907039 
4 100811680 
(...)

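I need to assign a number of articles to each code. To do that, I connect to an API that takes a code (the API allows only one code per request) and store the returned value in a second column of df. Currently I do it like this: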

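import json
import requests

def getArticles(code):
    # API_link is a %-format string defined elsewhere; it takes a single code
    r = requests.get(API_link % code).content
    jsonized = json.loads(r.decode("utf-8"))
    try:
        num_articles = jsonized["TotalRecords"]
    except (KeyError, TypeError):
        # the response has no TotalRecords field
        return 'not found'
    return num_articles

df['articles'] = df["exFcode"].apply(getArticles)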

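It does the job, but it is slow because the requests run one at a time. For 1000 codes it takes about 10 minutes, and I often have to process files with 50k+ rows...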

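I was wondering how to do this more efficiently. I thought I could split df into two parts and process each part in a separate thread. This is my first attempt at using threads in my program, so I created two additional functions, wrapper and main.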

from threading import Thread

def wrapper(df):
    df['articles'] = df["exFcode"].apply(getArticles)
    return df

def main(df):
    # split df into two even halves
    half = len(df) // 2
    df1 = df.iloc[:half]
    df2 = df.iloc[half:]
    t1 = Thread(target=wrapper, args=(df1,))
    t2 = Thread(target=wrapper, args=(df2,))
    t1.start()
    t2.start()
    print('completed')

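However, when I run main(df), nothing happens. Have I completely misunderstood the concept of threading? Any other ideas on how to make this more efficient?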

Source

2016-12-21 pawelty

You print 'completed' as soon as the threads have started, but you are missing the join step that waits for them to finish.

t1.start() 
t2.start() 
print('threads started') 
t1.join() 
t2.join() 
print('really completed')
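
Note also that Thread discards whatever its target returns, and the two halves passed to wrapper are slices of df, so the new column may never reach the original frame. Below is a minimal sketch of one alternative, not necessarily the best fit for your API: it uses the standard library's concurrent.futures.ThreadPoolExecutor to run the per-code lookups in a pool and collect the results in input order. The names fetch_all and fake_get_articles and the default max_workers=8 are illustrative assumptions; in the real program the question's getArticles would be passed in place of the stand-in.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(codes, fetch, max_workers=8):
    """Call fetch(code) for every code in a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs,
        # and leaving the with-block waits for all workers to finish.
        return list(pool.map(fetch, codes))


def fake_get_articles(code):
    """Hypothetical stand-in for getArticles so the sketch runs offline."""
    return code % 7


if __name__ == "__main__":
    codes = [38907030, 47036870, 54060696, 38907039, 100811680]
    print(fetch_all(codes, fake_get_articles))
```

With the DataFrame from the question this would be used as df['articles'] = fetch_all(df['exFcode'], getArticles). The pool size would need tuning against whatever the API tolerates.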

Source

2016-12-21 13:39:34

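Good point! Thanks. However, I am now getting some exceptions related to the requests package. That must be a limitation of the API, though... – pawelty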
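
If the exceptions do come from the API rejecting parallel requests (an assumption; the comment does not say which exceptions were raised), one possible approach is to retry a failed call a few times with a growing pause before falling back to 'not found'. The sketch below is illustrative only: with_retries, attempts, delay and errors are hypothetical names, and with the requests package one would likely pass errors=(requests.exceptions.RequestException,) rather than catching every Exception.

```python
import time


def with_retries(fetch, attempts=3, delay=0.5, errors=(Exception,)):
    """Wrap fetch so that failed calls are retried before giving up."""
    def wrapped(code):
        for attempt in range(attempts):
            try:
                return fetch(code)
            except errors:
                if attempt == attempts - 1:
                    # out of attempts: same fallback value as getArticles
                    return 'not found'
                # wait a little longer after each failure
                time.sleep(delay * (attempt + 1))
    return wrapped
```

The wrapped function takes a single code just like getArticles, so it can be passed to apply or to a thread pool unchanged, for example df["exFcode"].apply(with_retries(getArticles)).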
