2017-07-26 197 views

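Pandas - dropping columns

I know that dropping a DataFrame's columns should be as easy as:

df.drop(df.columns[1], axis=1) to drop by index

df.dropna(axis=1, how='any') to drop a column if it contains any NaNs.

But neither of these works on my dataframe, and I'm not sure whether that's because of a formatting issue, a data type issue, or a misuse or misunderstanding of these commands.

Here is my dataframe: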

fish_frame after append new_column:       0  1  2  3       4 \ 
2     GBE COD  NaN  NaN 600      NaN 
3     GBW COD  NaN 11,189 NaN      NaN 
4     GOM COD  NaN  0 NaN Package Deal - $40,753.69 
5     POLLOCK  NaN  NaN 1,103      NaN 
6     WHAKE  NaN  NaN  12      NaN 
7    GBE HADDOCK  NaN 10,730 NaN      NaN 
8    GBW HADDOCK  NaN 64,147 NaN      NaN 
9    GOM HADDOCK  NaN  0 NaN      NaN 
10    REDFISH  NaN  NaN  0      NaN 
11   WITCH FLOUNDER  NaN  370 NaN      NaN 
12     PLAICE  NaN  NaN 622      NaN 
13  GB WINTER FLOUNDER 54,315  NaN NaN      NaN 
14 GOM WINTER FLOUNDER  653  NaN NaN      NaN 
15 SNEMA WINTER FLOUNDER 14,601  NaN NaN      NaN 
16   GB YELLOWTAIL  NaN 1,663 NaN      NaN 
17  SNEMA YELLOWTAIL  NaN 1,370 NaN      NaN 
18  CCGOM YELLOWTAIL 1,812  NaN NaN      NaN 

     6  package_deal_column Package_Price new_column 
2 NaN Package Deal - $40,753.69   None  600 
3 NaN Package Deal - $40,753.69   None 11,1890 
4 None Package Deal - $40,753.69   None   0 
5 NaN Package Deal - $40,753.69   None  1,103 
6 NaN Package Deal - $40,753.69   None   12 
7 NaN Package Deal - $40,753.69   None 10,7300 
8 NaN Package Deal - $40,753.69   None 64,1470 
9 NaN Package Deal - $40,753.69   None   0 
10 NaN Package Deal - $40,753.69   None   0 
11 NaN Package Deal - $40,753.69   None  3700 
12 NaN Package Deal - $40,753.69   None  622 
13 None Package Deal - $40,753.69   None 54,31500 
14 None Package Deal - $40,753.69   None  65300 
15 None Package Deal - $40,753.69   None 14,60100 
16 NaN Package Deal - $40,753.69   None  1,6630 
17 NaN Package Deal - $40,753.69   None  1,3700 
18 None Package Deal - $40,753.69   None 1,81200 

Then I have the following lines of code:

fish_frame.drop(fish_frame.columns[1], axis=1) 
fish_frame.drop(fish_frame.columns[2], axis=1) 
fish_frame.drop(fish_frame.columns[3], axis=1) 
fish_frame.drop(fish_frame.columns[4:5], axis=1) 
#del fish_frame[4:5] #doesn't work, "TypeError: slice(4, 5, None) is an invalid key" 
del fish_frame['Package_Price'] 
fish_frame.dropna(axis=1, how='any') 

Then I print the dataframe again and it comes out as:

NEW fish_frame:       0  1  2  3       4 \ 
2     GBE COD  NaN  NaN 600      NaN 
3     GBW COD  NaN 11,189 NaN      NaN 
4     GOM COD  NaN  0 NaN Package Deal - $40,753.69 
5     POLLOCK  NaN  NaN 1,103      NaN 
6     WHAKE  NaN  NaN  12      NaN 
7    GBE HADDOCK  NaN 10,730 NaN      NaN 
8    GBW HADDOCK  NaN 64,147 NaN      NaN 
9    GOM HADDOCK  NaN  0 NaN      NaN 
10    REDFISH  NaN  NaN  0      NaN 
11   WITCH FLOUNDER  NaN  370 NaN      NaN 
12     PLAICE  NaN  NaN 622      NaN 
13  GB WINTER FLOUNDER 54,315  NaN NaN      NaN 
14 GOM WINTER FLOUNDER  653  NaN NaN      NaN 
15 SNEMA WINTER FLOUNDER 14,601  NaN NaN      NaN 
16   GB YELLOWTAIL  NaN 1,663 NaN      NaN 
17  SNEMA YELLOWTAIL  NaN 1,370 NaN      NaN 
18  CCGOM YELLOWTAIL 1,812  NaN NaN      NaN 

     6  package_deal_column new_column 
2 NaN Package Deal - $40,753.69  600 
3 NaN Package Deal - $40,753.69 11,1890 
4 None Package Deal - $40,753.69   0 
5 NaN Package Deal - $40,753.69  1,103 
6 NaN Package Deal - $40,753.69   12 
7 NaN Package Deal - $40,753.69 10,7300 
8 NaN Package Deal - $40,753.69 64,1470 
9 NaN Package Deal - $40,753.69   0 
10 NaN Package Deal - $40,753.69   0 
11 NaN Package Deal - $40,753.69  3700 
12 NaN Package Deal - $40,753.69  622 
13 None Package Deal - $40,753.69 54,31500 
14 None Package Deal - $40,753.69  65300 
15 None Package Deal - $40,753.69 14,60100 
16 NaN Package Deal - $40,753.69  1,6630 
17 NaN Package Deal - $40,753.69  1,3700 
18 None Package Deal - $40,753.69 1,81200 

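Neither the NaN drop nor the index drop worked. Only the drop by a specific column name (del fish_frame['Package_Price']) had any effect, but I can't hard-code column names for every iteration of this script.

I'm confused, and I hope this isn't a very silly mistake on my part.

Also, I can't fully interpret this information myself, but printing fish_frame.info() produces: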

<class 'pandas.core.frame.DataFrame'> 
RangeIndex: 17 entries, 2 to 18 
Data columns (total 8 columns): 
0      17 non-null object 
1      4 non-null object 
2      8 non-null object 
3      5 non-null object 
4      1 non-null object 
6      0 non-null object 
package_deal_column 17 non-null object 
new_column    17 non-null object 
dtypes: object(8) 
memory usage: 586.0+ bytes 

Any help solving this would be greatly appreciated. Thanks.


You need to either drop in place or reassign the result to a new df. – MrE
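The point in the comment above can be illustrated with a minimal sketch. The toy frame below is hypothetical (it is not the asker's data) and only assumes integer-labelled columns like those in fish_frame:

```python
import pandas as pd

# Hypothetical toy frame with unnamed (integer-labelled) columns 0, 1, 2.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

# drop() returns a new DataFrame; without an assignment the result is discarded
# and df itself is left untouched.
df.drop(df.columns[1], axis=1)
unchanged_columns = list(df.columns)

# Reassigning the result keeps the change.
df = df.drop(df.columns[1], axis=1)
remaining_columns = list(df.columns)
```

The same applies to dropna(): it also returns a new object unless the result is assigned back or inplace=True is passed.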

Answers

1

If there are no errors (and I don't see any in your output), you just forgot to use the inplace parameter:

df.drop(df.columns[1], axis=1, inplace=True) 

Hmm. OK, but that only partly worked. Idk if this is on my end, but I changed my code a bit to just 'fish_frame.drop(fish_frame.columns[1], axis=1, inplace=True)', 'fish_frame.drop(fish_frame.columns[2], axis=1, inplace=True)' and 'fish_frame.drop(fish_frame.columns[3], axis=1, inplace=True)' to drop columns 2, 3 and 4. But it dropped columns 2, 4 and 6 instead... – theprowler


To make sure you drop the right columns, use the actual column names: 'fish_frame.drop('name of column 1', axis=1, inplace=True)' –


But when my columns have no names, isn't dropping them by index the next best way? – theprowler
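A likely explanation for the "2, 4 and 6" result in the comments above is that each in-place drop shifts the positions of the columns after it, so the next positional lookup lands one column further along. The sketch below uses a hypothetical one-row frame with seven integer-labelled columns to show the effect, and then drops all positions in a single call, which avoids the shift:

```python
import pandas as pd

# Hypothetical frame with columns labelled 0..6.
df = pd.DataFrame([[10, 11, 12, 13, 14, 15, 16]])

# One drop at a time, in place: positions are re-evaluated after every drop.
shifted = df.copy()
for pos in (1, 2, 3):
    shifted.drop(shifted.columns[pos], axis=1, inplace=True)
shifted_labels = list(shifted.columns)  # labels 1, 3 and 5 were removed

# All positions looked up once, before anything is removed.
wanted = df.drop(df.columns[[1, 2, 3]], axis=1)
wanted_labels = list(wanted.columns)  # labels 1, 2 and 3 were removed
```

Labels 1, 3 and 5 are the 2nd, 4th and 6th columns, which matches what the comment reports. Passing the list of positions in one call keeps index-based dropping usable when the columns have no meaningful names.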

2

Here are some options:

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 5), columns=list('abcde'))

In [57]: cols_to_drop = ['b', 'd'] 

In [63]: df 
Out[63]: 
      a   b   c   d   e 
0 0.758670 0.734007 0.027711 0.614674 0.955711 
1 0.833110 0.242010 0.922831 0.165401 0.546079 
2 0.414916 0.949050 0.608527 0.018036 0.230343 

Option 1:

df = df[df.columns.drop(cols_to_drop)]

Option 2:

df = df[df.columns.difference(cols_to_drop)] 

Option 3:

df = df.loc[:, ~df.columns.isin(cols_to_drop)] 

All return:

  a   c   e 
0 0.758670 0.027711 0.955711 
1 0.833110 0.922831 0.546079 
2 0.414916 0.608527 0.230343
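For the NaN-based drop from the question, the same reassignment rule applies, and the how argument decides which columns go. The sketch below is only illustrative: the column names and values are hypothetical, loosely modelled on fish_frame rather than copied from it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one complete column, one partly missing, one entirely missing.
df = pd.DataFrame({
    "species": ["GBE COD", "GBW COD"],
    "partial": ["600", np.nan],
    "empty": [np.nan, None],
})

# how='any' drops every column that contains at least one missing value.
only_complete = df.dropna(axis=1, how="any")

# how='all' drops only the columns in which every value is missing.
without_empty = df.dropna(axis=1, how="all")

complete_labels = list(only_complete.columns)
non_empty_labels = list(without_empty.columns)
```

On a frame like the one in the question, where most columns are only partly filled, how='any' would remove nearly everything, so how='all' may be closer to what was intended for the fully empty column.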