First 5 non-numeric, non-null, distinct values in a column

How can I get the first five non-numeric, non-null, distinct values of a column?

For example, given the following table

col1 
===== 
n1 
1   
2   
n2 
n3 
n3 
n4 
n5 
n5 
n6 
None

I want

col1 
===== 
n1  
n2 
n3 
n4 
n5

Source

2017-02-20 william007

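Loop and use a regular expression? – sed

You can use pd.to_numeric with errors='coerce' to turn non-numeric values into NaN, then invert the mask and select the first 5 unique values: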

In [9]: 
df.loc[df.index.difference(pd.to_numeric(df['col1'], errors='coerce').dropna().index),'col1'].unique()[:5] 

Out[9]: 
array(['n1', 'n2', 'n3', 'n4', 'n5'], dtype=object)
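
The one-liner above can be easier to follow when the mask is built in named steps. The following is only a sketch under assumptions: the sample data is reconstructed from the question, the last entry is taken to be a real None rather than the string 'None', and the variable names are illustrative.

```python
import pandas as pd

# Sample data assumed from the question; the last entry is a real None here.
df = pd.DataFrame({'col1': ['n1', '1', '2', 'n2', 'n3', 'n3',
                            'n4', 'n5', 'n5', 'n6', None]})

# Values that parse as numbers stay numeric; everything else becomes NaN.
as_number = pd.to_numeric(df['col1'], errors='coerce')

# Keep rows that did not parse as a number and were not missing to begin with.
mask = as_number.isna() & df['col1'].notna()

# unique() preserves order of first appearance, so slicing gives the first 5.
first_five = df.loc[mask, 'col1'].unique()[:5]
print(first_five)
```

Combining the two conditions with & avoids the index.difference call, at the cost of a second pass over the column.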

Source

2017-02-20 15:47:38 EdChum

You can use:

df = pd.DataFrame({'col1':['n1', '1', '2', 'n2', 'n3', 'n3', 'n4', 'n5', 'n5', 'n6','None']})

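remove the strings NaN and None with replace
remove numeric values with to_numeric and boolean indexing
remove duplicates with drop_duplicates
get the first 5 values with head
if needed, use reset_index for a monotonically increasing index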

# Parentheses around the whole expression let the method chain span several lines.
df = (df.loc[pd.to_numeric(df.col1.replace({'None':1, 'NaN':1}),
                           errors='coerce').isnull(), 'col1']
        .drop_duplicates()
        .head(5)
        .reset_index(drop=True))

print (df) 
0 n1 
1 n2 
2 n3 
3 n4 
4 n5 
Name: col1, dtype: object
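
The replace step matters when the placeholder strings appear early in the column, because 'None' and 'NaN' do not parse as ordinary numbers and would otherwise pass the isnull() filter. The sketch below illustrates this on hypothetical data (not from the question) and swaps in isin for replace to flag the placeholders; it is a variant for illustration, not the code from the answer.

```python
import pandas as pd

# Hypothetical data: placeholder strings show up before five names are seen.
s = pd.Series(['n1', 'None', '7', 'n2', 'NaN', 'n3', 'n3', 'n4', 'n5', 'n6'],
              name='col1')

# True where the value does not parse as a number.
not_numeric = pd.to_numeric(s, errors='coerce').isna()

# Without handling placeholders, 'None' and 'NaN' are kept as ordinary strings.
naive = s[not_numeric].drop_duplicates().head(5).tolist()

# Flag the placeholder strings explicitly and exclude them as well.
placeholder = s.isin(['None', 'NaN'])
cleaned = (s[not_numeric & ~placeholder]
           .drop_duplicates()
           .head(5)
           .reset_index(drop=True))

print(naive)
print(cleaned.tolist())
```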

Another possible solution:

df = pd.Series(df.loc[pd.to_numeric(df.col1 
         .replace({'None':1, 'NaN':1}), errors='coerce').isnull(), 'col1'] 
     .unique()[:5]) 
print (df) 
0 n1 
1 n2 
2 n3 
3 n4 
4 n5 
dtype: object

But if the values are mixed, numbers together with strings:

df = pd.DataFrame({'col1':['n1', 1, 1, 'n2', 'n3', 'n3', 'n4', 'n5', 'n5', 'n6', None]}) 

df = pd.Series(df.loc[df.col1.apply(lambda x: isinstance(x, str)), 'col1'] 
     .unique()[:5]) 

print (df) 
0 n1 
1 n2 
2 n3 
3 n4 
4 n5 
dtype: object
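
The isinstance check and the to_numeric check can also be combined, so that a column holding real numbers, numeric-looking strings, and missing values is handled in one place. This is a minimal sketch: first_distinct_strings is a hypothetical helper name, and the sample Series is made up for illustration.

```python
import pandas as pd

def first_distinct_strings(series, n=5):
    """Hypothetical helper: first n distinct str values that do not parse as numbers."""
    # Drop real numbers and missing values (None/NaN are not str instances).
    is_str = series.map(lambda x: isinstance(x, str))
    strings = series[is_str]
    # Among the remaining strings, drop those that look like numbers, e.g. '2'.
    non_numeric = pd.to_numeric(strings, errors='coerce').isna()
    return strings[non_numeric].drop_duplicates().head(n).tolist()

# Made-up mixed column: ints, floats, numeric strings, names, and None.
mixed = pd.Series(['n1', 1, '2', 'n2', 2.5, 'n3', 'n3', None, 'n4', 'n5', 'n6'])
result = first_distinct_strings(mixed)
print(result)
```

Passing a different n, for example first_distinct_strings(mixed, n=3), returns fewer values without changing the filtering logic.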

Source

2017-02-20 15:47:47 jezrael

Both answers are correct and good, but I find this one easier to walk through in my head. –

@Corley Brigman Thank you. – jezrael
