2017-10-05 116 views

Pandas error in Python: Columns must be same length as key

I am scraping data from several websites and using pandas to modify it.

The first batches of data work fine, but then I get this error message:

Traceback (most recent call last):
  File "data.py", line 394, in <module>
    df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)
  File "/home/web/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2326, in __setitem__
    self._setitem_array(key, value)
  File "/home/web/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2350, in _setitem_array
    raise ValueError('Columns must be same length as key')
ValueError: Columns must be same length as key

Here is part of my code:

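import pandas as pd

# datatable and cols come from the scraping code (not shown)
df2 = pd.DataFrame(datatable, columns=cols)
df2['FLIGHT_ID_1'] = df2['FLIGHT'].str[:3]
df2['FLIGHT_ID_2'] = df2['FLIGHT'].str[3:].str.zfill(4)
# This line raises the ValueError for some batches
df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)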
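A minimal sketch that reproduces the error, assuming a hypothetical scraped batch in which every STATUS value is a single word (such as 'Canceled'). With expand=True, str.split returns as many columns as the longest split in that batch, so a batch without any space produces only one column, and assigning one column to two column keys raises ValueError.

```python
import pandas as pd

# Hypothetical batch: no STATUS value contains a space.
df2 = pd.DataFrame({"STATUS": ["Canceled", "Canceled"]})

parts = df2["STATUS"].str.split(n=1, expand=True)
print(parts.shape)  # (2, 1): only one column came back

try:
    # Two keys on the left, only one column on the right -> ValueError
    df2[["STATUS_ID_1", "STATUS_ID_2"]] = parts
except ValueError as exc:
    print("ValueError:", exc)
```

This would explain why the error looks random: it depends only on whether the current batch happens to contain at least one STATUS value with a space.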

EDIT for jezrael: I used your code and printed the result. I hope this helps us find the problem, because the script seems to hit this split error at random.

             0         1
2       Landed   8:33 AM
3       Landed   9:37 AM
4       Landed   9:10 AM
5       Landed   9:57 AM
6       Landed   9:36 AM
8       Landed   8:51 AM
9       Landed   9:18 AM
11      Landed   8:53 AM
12      Landed   7:59 AM
13      Landed   7:52 AM
14      Landed   8:56 AM
15      Landed   8:09 AM
18      Landed   8:42 AM
19      Landed   9:39 AM
20      Landed   9:45 AM
21      Landed   7:44 AM
23      Landed   8:36 AM
27      Landed   9:53 AM
29      Landed   9:26 AM
30      Landed   8:23 AM
35      Landed   9:59 AM
36      Landed   8:38 AM
37      Landed   9:38 AM
38      Landed   9:37 AM
40      Landed   9:27 AM
43      Landed   9:14 AM
44      Landed   9:22 AM
45      Landed   8:18 AM
46      Landed  10:01 AM
47      Landed  10:21 AM
..         ...       ...
316    Delayed   5:00 PM
317    Delayed   4:34 PM
319  Estimated   2:58 PM
320  Estimated   3:02 PM
321    Delayed   4:47 PM
323  Estimated   3:08 PM
325    Delayed   3:52 PM
326  Estimated   3:09 PM
327  Estimated   2:37 PM
328  Estimated   3:17 PM
329  Estimated   3:20 PM
330  Estimated   2:39 PM
331    Delayed   4:04 PM
332    Delayed   4:36 PM
337  Estimated   3:47 PM
339  Estimated   3:37 PM
341    Delayed   4:32 PM
345  Estimated   3:34 PM
349  Estimated   3:24 PM
356    Delayed   4:56 PM
358  Estimated   3:45 PM
367  Estimated   4:09 PM
370  Estimated   4:04 PM
371  Estimated   4:11 PM
373    Delayed   5:21 PM
382  Estimated   3:56 PM
384    Delayed   4:28 PM
389    Delayed   4:41 PM
393  Estimated   4:02 PM
397    Delayed   5:23 PM

[240 rows x 2 columns]

Can you add some sample data? – jezrael


(https://stackoverflow.com/questions/46522269/how-can-i-split-a-column-into-2-in-the-correct-way) (https://stackoverflow.com/questions/46524461/how-can-i-split-a-column-into-2-in-the-correct-in-python) – Harley


Hmm, really interesting. Can you check `df3 = df2['STATUS'].str.split(n=1, expand=True)` and then `print(df3[df3[df3.columns[-1]].notnull()])`? Can you add the output to the question? – jezrael
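The check suggested in that comment can be sketched as a small helper; `describe_split` and the sample values are hypothetical. It reports how many columns `str.split(n=1, expand=True)` produced for a batch, plus the rows that actually filled the last column, which shows in advance whether a two-key assignment can succeed.

```python
import pandas as pd

def describe_split(status):
    """Hypothetical helper: split a STATUS Series once on whitespace and
    return (number of columns, rows whose last column is not null)."""
    df3 = status.str.split(n=1, expand=True)
    last = df3[df3.columns[-1]]
    return df3.shape[1], df3[last.notnull()]

# Mixed batch: at least one value contains a space -> 2 columns
ncols, rows = describe_split(pd.Series(["Landed 8:33 AM", "Canceled"]))
print(ncols)  # 2
print(rows)   # only the 'Landed 8:33 AM' row has a second part

# Single-word batch -> only 1 column, so a two-key assignment would fail
ncols, _ = describe_split(pd.Series(["Canceled", "Canceled"]))
print(ncols)  # 1
```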

Answer


You need to modify the solution a little, because `str.split` sometimes returns 2 columns and sometimes only 1:

import pandas as pd

df2 = pd.DataFrame({'STATUS':['Estimated 3:17 PM','Delayed 3:00 PM']})


df3 = df2['STATUS'].str.split(n=1, expand=True)
df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns]
print(df3)
   STATUS_ID1  STATUS_ID2
0   Estimated     3:17 PM
1     Delayed     3:00 PM

df2 = df2.join(df3)
print(df2)
              STATUS  STATUS_ID1  STATUS_ID2
0  Estimated 3:17 PM   Estimated     3:17 PM
1    Delayed 3:00 PM     Delayed     3:00 PM

Another possible input, where none of the values contains a space; the solution works for it too:

df2 = pd.DataFrame({'STATUS':['Canceled','Canceled']}) 

And the solution returns:

print(df2)
     STATUS  STATUS_ID1
0  Canceled    Canceled
1  Canceled    Canceled

All together:

df3 = df2['STATUS'].str.split(n=1, expand=True) 
df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns] 
df2 = df2.join(df3) 
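If the rest of the script always expects both status columns to exist, one variation (not jezrael's code; it swaps in `DataFrame.reindex`) is to pad the split result to exactly two columns, so a batch like the all-'Canceled' one gets a NaN second column instead of a missing one. The column names follow the question; `split_status` is a hypothetical name.

```python
import pandas as pd

def split_status(df, col="STATUS"):
    """Hypothetical helper: split `col` once on whitespace into exactly
    two columns, STATUS_ID_1 and STATUS_ID_2, and join them to `df`."""
    parts = df[col].str.split(n=1, expand=True)
    # str.split returns 1 column (no value has a space) or 2 columns;
    # reindex adds the missing column, filled with NaN.
    parts = parts.reindex(columns=[0, 1])
    parts.columns = ["STATUS_ID_1", "STATUS_ID_2"]
    return df.join(parts)

print(split_status(pd.DataFrame({"STATUS": ["Estimated 3:17 PM", "Delayed 3:00 PM"]})))
print(split_status(pd.DataFrame({"STATUS": ["Canceled", "Canceled"]})))
```

The trade-off is that code downstream must handle NaN in STATUS_ID_2, whereas with the answer's version it must handle the column being absent.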

What exactly do I have to put in my code? This: `df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)`, `df2 = pd.DataFrame({'STATUS'})`, `df3 = df2['STATUS'].str.split(n=1, expand=True)`, `df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns]`? – Harley


Replace `df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)` with my code. – jezrael


OK, so what do I write instead of this [['Canceled'],['Canceled']]? Just delete it and use your first three lines? – Harley
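One reading of jezrael's suggestion, sketched with hypothetical `datatable` and `cols` values (the real ones come from the scraping code, which the question does not show): keep the FLIGHT lines unchanged and replace only the failing assignment with the three lines from the answer. The sample DataFrames in the answer are test inputs, not part of the script. Note that the answer's code names the new columns STATUS_ID1/STATUS_ID2, not the question's STATUS_ID_1/STATUS_ID_2.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped data
cols = ["FLIGHT", "STATUS"]
datatable = [["KLM1234", "Landed 8:33 AM"], ["DLH456", "Canceled"]]

df2 = pd.DataFrame(datatable, columns=cols)
df2["FLIGHT_ID_1"] = df2["FLIGHT"].str[:3]
df2["FLIGHT_ID_2"] = df2["FLIGHT"].str[3:].str.zfill(4)

# The failing line
#   df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)
# is replaced by the three lines from the answer:
df3 = df2["STATUS"].str.split(n=1, expand=True)
df3.columns = ["STATUS_ID{}".format(x + 1) for x in df3.columns]
df2 = df2.join(df3)

print(df2)
```

Unlike the original line, this also runs for a batch where every status is a single word; in that case STATUS_ID2 is simply not created.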
