pandas element-wise comparison and creating a selection

In a DataFrame, I want to compare the elements of a column against a value and put the elements that pass the comparison into a new column.

import pandas

df = pandas.DataFrame([{'A':3,'B':10},
                       {'A':2,'B':30},
                       {'A':1,'B':20},
                       {'A':2,'B':15},
                       {'A':2,'B':100}])

df['C'] = [x for x in df['B'] if x > 18]

I can't figure out what is wrong or why I get:

ValueError: Length of values does not match length of index

2016-05-24 mati

As Darren said, all columns in a DataFrame must have the same length.

If you print [x for x in df['B'] if x > 18], you get only the values [30, 20, 100]. But the DataFrame has five index entries (rows). That is why you get the "Length of values does not match length of index" error.
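To make the mismatch concrete, here is a minimal sketch that rebuilds the DataFrame from the question and compares the two lengths. The variable name filtered is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1, 2, 2], 'B': [10, 30, 20, 15, 100]})

# The filter drops every value that fails the test, so the list shrinks.
filtered = [x for x in df['B'] if x > 18]
print(filtered)
print(len(filtered), len(df))  # 3 values, but 5 rows

try:
    df['C'] = filtered
except ValueError as exc:
    # pandas rejects a plain list whose length differs from the index
    print(exc)
```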

You can change the code as follows:

df['C'] = [x if x > 18 else None for x in df['B']]
print(df)

You will get:

   A    B      C
0  3   10    NaN
1  2   30   30.0
2  1   20   20.0
3  2   15    NaN
4  2  100  100.0

2016-05-24 07:27:53

I think you can use loc with boolean indexing:

print(df)
   A    B
0  3   10
1  2   30
2  1   20
3  2   15
4  2  100

print(df['B'] > 18)
0    False
1     True
2     True
3    False
4     True
Name: B, dtype: bool

df.loc[df['B'] > 18, 'C'] = df['B']
print(df)
   A    B      C
0  3   10    NaN
1  2   30   30.0
2  1   20   20.0
3  2   15    NaN
4  2  100  100.0

If you need to select rows by a condition, use boolean indexing:

print(df[df['B'] > 18])
   A    B
1  2   30
2  1   20
4  2  100

If you need something faster, you can use where:

df['C'] = df.B.where(df['B'] > 18)
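As a small sketch of what where does on the sample data: it keeps each value where the condition is True and fills NaN elsewhere, so the result always has one entry per row. The extra column D and the replacement value 0 are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1, 2, 2], 'B': [10, 30, 20, 15, 100]})

mask = df['B'] > 18

# Values failing the condition become NaN, so the column is upcast to float.
df['C'] = df['B'].where(mask)

# Optionally pass a replacement value instead of NaN (hypothetical column D).
df['D'] = df['B'].where(mask, other=0)

print(df)
```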

Timings (len(df)=50k):

In [1367]: %timeit (a(df)) 
The slowest run took 8.34 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 1.14 ms per loop 

In [1368]: %timeit (b(df1)) 
100 loops, best of 3: 15.5 ms per loop 

In [1369]: %timeit (c(df2)) 
100 loops, best of 3: 2.93 ms per loop

Code for the timings:

import pandas as pd

df = pd.DataFrame([{'A':3,'B':10},
                   {'A':2,'B':30},
                   {'A':1,'B':20},
                   {'A':2,'B':15},
                   {'A':2,'B':100}])
print(df)
# Repeat the 5 rows 10,000 times to get 50k rows
df = pd.concat([df]*10000).reset_index(drop=True)
df1 = df.copy()
df2 = df.copy()

def a(df):
    # Series.where
    df['C'] = df.B.where(df['B'] > 18)
    return df

def b(df1):
    # list comprehension; operate on the argument, not the global df
    df1['C'] = [x if x > 18 else None for x in df1['B']]
    return df1

def c(df2):
    # loc with a boolean mask; operate on the argument, not the global df
    df2.loc[df2['B'] > 18, 'C'] = df2['B']
    return df2

print(a(df))
print(b(df1))
print(c(df2))
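Before comparing speed, it may be worth confirming that the three approaches produce the same column. The following is a minimal sketch on the five-row sample; the helper names with_where, with_list and with_loc are hypothetical, and each works on a copy so the input is left untouched.

```python
import pandas as pd

base = pd.DataFrame({'A': [3, 2, 1, 2, 2], 'B': [10, 30, 20, 15, 100]})

def with_where(frame):
    out = frame.copy()
    out['C'] = out['B'].where(out['B'] > 18)
    return out

def with_list(frame):
    out = frame.copy()
    out['C'] = [x if x > 18 else None for x in out['B']]
    return out

def with_loc(frame):
    out = frame.copy()
    out.loc[out['B'] > 18, 'C'] = out['B']
    return out

results = [f(base) for f in (with_where, with_list, with_loc)]

# Series.equals treats NaN in the same position as equal; cast to float
# first so a dtype difference alone cannot make the comparison fail.
reference = results[0]['C'].astype(float)
same = all(reference.equals(r['C'].astype(float)) for r in results[1:])
print(same)
```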

2016-05-24 07:10:35 jezrael

I added a new, faster method; please check it. Thanks. – jezrael

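All columns in a DataFrame must be the same length. Because you filter out some values, you are trying to insert fewer values into column C than there are in columns A and B.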

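So your two options are either to start a new object for C (as written, this yields a plain Python list rather than a DataFrame):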

dfC = [x for x in df['B'] if x > 18]

Or to put some dummy value in the column where x is not greater than 18. For example:

df['C'] = np.where(df['B'] > 18, True, False)

Or even:

df['C'] = np.where(df['B'] > 18, 'Yay', 'Nay')
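For completeness, a self-contained sketch of the np.where variant with its imports. The column name flag is illustrative, and passing np.nan as the fallback is an assumption chosen here to keep the original value rather than a label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1, 2, 2], 'B': [10, 30, 20, 15, 100]})

# np.where(condition, value_if_true, value_if_false) returns one value
# per row, so the result always matches the length of the index.
df['flag'] = np.where(df['B'] > 18, 'Yay', 'Nay')

# Keep the original value where the condition holds, NaN otherwise.
df['C'] = np.where(df['B'] > 18, df['B'], np.nan)

print(df)
```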

PS: See also "Pandas conditional creation of a series/dataframe column" for other approaches.

2016-05-24 07:10:57
