Splitting a DataFrame into Series creates NaNs

I have loaded a DataFrame and am trying to split it into Series, but doing so produces NaN values.

data = pd.read_csv(filepath_or_buffer = "train.csv", index_col = 0) 
data.columns 

Index([u'qid1',u'qid2',u'question1',u'question2'], dtype = 'object')

I want to build a pd.Series from the DataFrame columns. qid1 is the ID for question1 and qid2 is the ID for question2. There are no NaN values in the DataFrame:

data.question1.isnull().sum() 
0

I want to create a pandas.Series() for the first question, with qid1 as the index:

question1 = pd.Series(data.question1, index = data.qid1) 
question1.isnull().sum() 
68416

Now my Series contains 68416 null values. Where is my mistake?

Source

2017-04-03 Slavka

Pass the anonymous values (the underlying array), so that the Series constructor does not try to align:

question1 = pd.Series(data.question1.values, index = data.qid1)

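The problem here is that the question1 column has its own index, so the constructor tries to use it (aligning the data against the new index) during construction.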
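As an alternative that may read more naturally, the same result can usually be obtained by moving qid1 into the index first and then selecting the column. This is only a sketch: the small DataFrame below is a hypothetical stand-in for train.csv, and only its column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for train.csv; only the column names are from the question.
data = pd.DataFrame(
    {
        "qid1": [1, 3, 5],
        "qid2": [2, 4, 6],
        "question1": ["q one", "q three", "q five"],
        "question2": ["q two", "q four", "q six"],
    }
)

# set_index makes qid1 the row labels; selecting the column then yields a
# Series that is already indexed by qid1, so no alignment takes place.
question1 = data.set_index("qid1")["question1"]

print(question1.isnull().sum())
```

Because the values are never matched against a different index, no NaN values are introduced.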

Example:

In [12]: 
df = pd.DataFrame({'a':np.arange(5), 'b':list('abcde')}) 
df 

Out[12]: 
    a b 
0 0 a 
1 1 b 
2 2 c 
3 3 d 
4 4 e 

In [13]: 
s = pd.Series(df['a'], index = df['b']) 
s 

Out[13]: 
b 
a NaN 
b NaN 
c NaN 
d NaN 
e NaN 
Name: a, dtype: float64 

In [14]: 
s = pd.Series(df['a'].values, index = df['b']) 
s 

Out[14]: 
b 
a 0 
b 1 
c 2 
d 3 
e 4 
dtype: int32

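Effectively, what happens here is that you are reindexing your existing column with the new index you pass in. Because none of the existing index values match the new ones, you get NaN.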
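A minimal sketch of that reindexing behaviour, reusing the df from the example (the names labels, via_ctor, via_reindex and partial are introduced here for illustration only). It compares the constructor call with an explicit reindex, and then shows that labels which do exist in the original index keep their values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(5), "b": list("abcde")})
labels = pd.Index(df["b"])  # 'a'..'e', none of which appear in df's 0..4 index

# Passing a Series together with an index looks up each new label
# in the Series' existing index, which is what reindex does.
via_ctor = pd.Series(df["a"], index=labels)
via_reindex = df["a"].reindex(labels)

# No label matches, so both results are entirely NaN.
print(via_ctor.isna().all(), via_reindex.isna().all())

# With partially overlapping labels, only the missing label becomes NaN:
# 3 and 4 exist in the original index, 5 does not.
partial = pd.Series(df["a"], index=[3, 4, 5])
print(partial)
```

The column is upcast to float64 in these results because NaN cannot be stored in an integer column of this kind.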

Source

2017-04-03 14:15:53 EdChum
