dropna() trouble with labels

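I am trying to average a set of data in pandas. The data comes from a csv file. I have a series called 'track'. At an earlier stage I use the dropna() method to remove some blank rows that were imported when the csv file was read.

I want to average a column over 5 rows. I cannot use the rolling_mean method, as I would like to take the average using the two rows before the current value, the current value and the two rows after the current value.

I run into a problem when I reach the part of the data where the NaN rows have been removed, because their labels have been removed too.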

import pandas as pd

def get_data(filename):
    '''Read the data from the input csv file to use in the analysis.'''
    with open(filename, 'r') as f:
        reader = pd.read_csv(f, sep=',', usecols=('candidate', ' final track', ' status'))
    print(reader[0:20])
    # dropna() removes the blank rows but keeps the original index labels
    reader = reader.dropna()
    print(reader[0:20])
    return reader

def relative_track(nb):
    # reader and current_tracks are expected to be defined elsewhere
    length = len(reader)
    track = current_tracks.loc[:, ' final track']
    for el in range(2, length):
        means = pd.stats.moments.rolling_mean(track, 5)
        print(means)

This gives the output (note that the labels 15 and 16 are missing in the second print):

                candidate  final track  status
0                       1          719       *
1                       2          705       *
2                       3          705       *
3                       4          706       *
4                       5          704       *
5                       1          708       *
6                       2          713       *
7                       3          720       *
8                       4          726       *
9                       5          729       *
10                      1          745       *
11                      2          743       *
12                      3          743       *
13                      4          733       *
14                      5          717       *
15                    NaN          NaN     NaN
16  *** Large track split          NaN     NaN
17                      1          714       *
18                      2          695       *
19                      3          690       *

    candidate  final track  status
0           1          719       *
1           2          705       *
2           3          705       *
3           4          706       *
4           5          704       *
5           1          708       *
6           2          713       *
7           3          720       *
8           4          726       *
9           5          729       *
10          1          745       *
11          2          743       *
12          3          743       *
13          4          733       *
14          5          717       *
17          1          714       *
18          2          695       *
19          3          690       *
20          4          671       *
21          5          657       *

But when I try to use the second function to calculate the means, I get the error:

raise KeyError("stop bound [%s] is not in the [%s]" % (key.stop,self.obj._get_axis_name(axis))) 
KeyError: 'stop bound [15] is not in the [index]'

This is because index 15 does not exist. If anyone could help, that would be great.
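
To illustrate why label 15 goes missing, here is a minimal sketch with a small hypothetical series (made-up index labels and a handful of values, not the full csv data). It assumes only that dropna() removes rows while leaving the surviving rows' index labels unchanged, so a label-based lookup of a dropped row fails, whereas positional access or a reset index still works.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ' final track' column, with two blank rows.
track = pd.Series([733.0, 717.0, np.nan, np.nan, 714.0, 695.0],
                  index=[13, 14, 15, 16, 17, 18])

cleaned = track.dropna()
# The labels of the dropped rows are gone; the remaining labels are unchanged.
remaining_labels = list(cleaned.index)

# Label-based lookup of a dropped row raises KeyError.
try:
    cleaned.loc[15]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional access ignores the labels entirely (position 2 is the row labelled 17).
third_value = cleaned.iloc[2]

# Alternatively, renumber the rows 0..n-1 after dropping.
renumbered = cleaned.reset_index(drop=True)
```

Whether to reset the index or switch to positional access depends on whether the original row numbers are still needed later in the analysis.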

Source

2013-09-26 Ashleigh Clayton

I cannot use the rolling_mean method as I would like to take the average using the two rows before the current value, the current value and the two rows after the current value.

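Use the keyword argument center=True, which is explained toward the end of this section of the documentation.

Also, pd.stats.moments.rolling_mean can be accessed simply as pd.rolling_mean; it is a top-level function in pandas.

P.S. I think I understand your intent here, but your code may have some problems unrelated to your question. (For example, the el counter variable in the final for loop is never used; it looks like the loop just performs the same operation repeatedly.) But perhaps the center keyword lets you avoid most of that existing work anyway.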
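
As a sketch of the centered window, assuming a recent pandas version: pd.rolling_mean and pd.stats.moments.rolling_mean were removed in later releases, and the equivalent there is Series.rolling(...).mean(). The values below are the last few ' final track' values from the question's second printout, including the gap in the labels left by dropna(). The window is positional, so the missing labels 15 and 16 should not matter.

```python
import pandas as pd

# Last few ' final track' values from the question; labels 15 and 16 are missing.
track = pd.Series([733.0, 717.0, 714.0, 695.0, 690.0, 671.0, 657.0],
                  index=[13, 14, 17, 18, 19, 20, 21])

# center=True: each result is the mean of the two rows before,
# the current row and the two rows after.
means = track.rolling(window=5, center=True).mean()

# Label 17 -> (733 + 717 + 714 + 695 + 690) / 5 = 709.8
# The first two and last two rows have no full window, so they are NaN.
```

If values are also wanted at the edges, passing min_periods=1 to rolling() is one option; it averages over whatever rows of the window exist.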

Source

2013-09-26 13:09:18
