Pandas: combine two columns in a DataFrame

I have a pandas DataFrame with several columns:

Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51 
Data columns: 
foo     11516 non-null values 
bar     228381 non-null values 
Time_UTC    239897 non-null values 
dtstamp    239897 non-null values 
dtypes: float64(4), object(1)

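where foo and bar are columns that contain the same data but are named differently. Is there a way to move the rows that make up foo into bar, ideally while keeping the name bar?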

Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51 
Data columns: 
bar     239897 non-null values 
Time_UTC    239897 non-null values 
dtstamp    239897 non-null values 
dtypes: float64(4), object(1)

That is, the NaN values in bar have been replaced by the values from foo. This is how the DataFrame should look at the end.

2012-06-10 BFTM

Try this:

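import pandas
# assumes no row has values in both foo and bar (duplicate index labels would make the reindex fail)
pandas.concat([df['foo'].dropna(), df['bar'].dropna()]).reindex_like(df)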

If you want this data to become the new column bar, just assign the result to df['bar'].
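
A minimal sketch of this approach on hypothetical toy data (the values are invented, and it assumes foo and bar are never both filled in the same row). It uses reindex(df.index), which for a Series aligns the result to df's rows the same way reindex_like(df) does, then assigns the result to bar and drops foo.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each row has a value in exactly one of foo/bar.
df = pd.DataFrame({
    "foo": [1.0, np.nan, 3.0, np.nan],
    "bar": [np.nan, 2.0, np.nan, 4.0],
})

# Stack the non-null values of both columns, then restore df's row order.
combined = pd.concat([df["foo"].dropna(), df["bar"].dropna()]).reindex(df.index)

# Keep the name bar and discard the now-redundant foo column.
df["bar"] = combined
df = df.drop(columns="foo")
print(df)
```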

2012-06-10 21:38:40 BrenBarn

I don't see 'concat' as a function in the pandas namespace; I'm not sure what I'm missing. – BFTM

What version of pandas are you using? The function is documented here: http://pandas.pydata.org/pandas-docs/stable/merging.html#concatenating-objects – BrenBarn

I was running pandas 0.6.1, which did not include the concat function. Upgrading to v0.7.3 brought concat into the namespace. Works like a charm! Thanks. – BFTM

You can use fillna directly and assign the result to the column 'bar':

df['bar'] = df['bar'].fillna(df['foo'])
del df['foo']

General example:

import pandas as pd
# create the table with two missing values (column 'a' is absent for rows 3 and 4)
df1 = pd.DataFrame({'a':[1,2],'b':[3,4]}, index = [1,2])
df2 = pd.DataFrame({'b':[5,6]}, index = [3,4])
dftot = pd.concat((df1, df2))
print(dftot)
# create the DataFrame used to fill the missing values
filldf = pd.DataFrame({'a':[7,7,7,7]})

# fill (values are matched by index label)
print(dftot.fillna(filldf))

2014-05-21 15:38:41 user1883737

Note, however, that because filldf is indexed 0..3 while dftot is indexed 1..4, dftot.fillna(filldf)['a'][4] will be nan, not 7.0.
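
A sketch of the alignment point made in this comment, rebuilding the example's dftot. fillna matches fill values by index label, so building the fill frame on dftot.index (an illustrative fix, not part of the original answer) fills both missing rows.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])
df2 = pd.DataFrame({"b": [5, 6]}, index=[3, 4])
dftot = pd.concat((df1, df2))  # column 'a' is NaN at labels 3 and 4

# Default index 0..3: label 4 has no counterpart, so it stays NaN.
misaligned = dftot.fillna(pd.DataFrame({"a": [7, 7, 7, 7]}))

# Same fill values, but indexed like dftot, so every label has a match.
aligned = dftot.fillna(pd.DataFrame({"a": [7] * len(dftot)}, index=dftot.index))

print(misaligned["a"].tolist())  # [1.0, 2.0, 7.0, nan]
print(aligned["a"].tolist())     # [1.0, 2.0, 7.0, 7.0]
```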

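Another option is to use the .apply() method on the frame. You can reassign the column while respecting the existing data (this works since at least 0.12):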

import pandas as pd

# get your data into a dataframe

# replace content in "bar" with "foo" if "bar" is null
# (pd.isnull is required: NaN == NaN is always False, so an == np.NaN test never matches)
df["bar"] = df.apply(lambda row: row["foo"] if pd.isnull(row["bar"]) else row["bar"], axis=1)

# note: if your missing values are something else, such as an empty string, change the test accordingly
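
A small check of the row-wise apply approach on invented data. The null test uses pd.isnull because NaN never compares equal to anything, itself included, so a test written as == np.NaN would leave every missing value untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in bar.
df = pd.DataFrame({"foo": [1.0, np.nan, 3.0], "bar": [np.nan, 2.0, np.nan]})

# An equality test cannot detect NaN.
print(np.nan == np.nan)  # False

# Row by row: take foo where bar is missing, otherwise keep bar.
df["bar"] = df.apply(
    lambda row: row["foo"] if pd.isnull(row["bar"]) else row["bar"], axis=1
)
print(df["bar"].tolist())  # [1.0, 2.0, 3.0]
```

Because apply calls the lambda once per row in Python, the vectorized alternatives (fillna, combine_first, np.where) are usually much faster on a frame the size of the one in the question.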

2016-04-28 16:51:04 openwonk

Thanks for the catch, @Veenit – openwonk

More recent versions of pandas have combine_first() and update() methods on DataFrame and Series objects. For example, if your DataFrame is called df, you would do:

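df['bar'] = df['bar'].combine_first(df['foo'])

This fills only the NaN values in bar with the matching values from foo. combine_first returns a new Series rather than changing bar in place, so the result has to be assigned back. To also overwrite the non-NaN values in bar with the values from foo, use the update() method, which modifies the Series in place.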
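
To make the difference between the two methods concrete, here is a sketch on invented data where the last row has a value in both columns: combine_first keeps bar's 30.0, while update replaces it with foo's 3.0. The update call is made on an explicit copy so the in-place change lands on a standalone Series, which is then assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the last row has values in both columns.
df = pd.DataFrame({"foo": [1.0, np.nan, 3.0], "bar": [np.nan, 2.0, 30.0]})

# combine_first: keep bar where present, fill its gaps from foo (returns a new Series).
kept = df["bar"].combine_first(df["foo"])
print(kept.tolist())  # [1.0, 2.0, 30.0]

# update: overwrite bar with every non-NA value from foo (modifies bar in place).
bar = df["bar"].copy()
bar.update(df["foo"])
df["bar"] = bar
print(df["bar"].tolist())  # [1.0, 2.0, 3.0]
```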

2016-11-30 00:57:03 dagrha

You can also do this with numpy:

import numpy as np
import pandas as pd
df['bar'] = np.where(pd.isnull(df['bar']), df['foo'], df['bar'])
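
A minimal sketch of the np.where version on invented data. np.where works on the underlying arrays by position rather than by index label, which is safe here because both columns come from the same DataFrame and therefore share its row order.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in bar.
df = pd.DataFrame({"foo": [1.0, np.nan, 3.0], "bar": [np.nan, 2.0, np.nan]})

# Elementwise choice: foo where bar is null, bar everywhere else.
df["bar"] = np.where(pd.isnull(df["bar"]), df["foo"], df["bar"])

# Keep only the combined column, named bar.
df = df.drop(columns="foo")
print(df)
```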

2016-12-01 03:51:41 Veenit
