Extract unique exact string matches from DataFrame column names

Say I have the following (a tiny subset of data that has many columns):

import pandas as pd 
import numpy as np 
df = pd.DataFrame({'A (quarterly) 2010': np.random.rand(3), 
        'A (quarterly) 2011': np.random.rand(3), 
        'B (quarterly) 2010': np.random.rand(3), 
        'B (quarterly) 2011': np.random.rand(3), 
        'X' : np.random.randint(3, size=3)}) 

#Out[11]: 
# A (quarterly) 2010 A (quarterly) 2011 B (quarterly) 2010 \ 
#0   0.868228   0.300513   0.658819 
#1   0.383907   0.496740   0.347421 
#2   0.284787   0.795499   0.856398 

# B (quarterly) 2011 X 
#0   0.374479 1 
#1   0.812860 0 
#2   0.604731 2

I want to extract the unique matches of the column names against a particular pattern, e.g. [A-B] \(.*\)\s.
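To make concrete what this pattern picks out of each column name, here is a small check with Python's re module. It is a sketch that hard-codes the example column names; it uses re.match, which anchors at the start of the name, while the code below uses findall, which searches anywhere. For these column names both give the same result, and 'X' gives no match.

```python
import re

# Column names from the example DataFrame above
columns = ['A (quarterly) 2010', 'A (quarterly) 2011',
           'B (quarterly) 2010', 'B (quarterly) 2011', 'X']

# A or B, a space, anything in parentheses, then one whitespace character
pattern = re.compile(r'[A-B] \(.*\)\s')

for name in columns:
    m = pattern.match(name)
    # m.group(0) is the matched prefix; None means the name does not match
    print(repr(name), '->', repr(m.group(0)) if m else None)
```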

I can do this, but it looks pretty hairy:

stubs = set([match[0] for match in df.columns.str.findall(r'[A-B] \(.*\) ').values if match != []])

list(stubs) 
#['B (quarterly) ', 'A (quarterly) ']
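A side note on the output above: a set has no defined iteration order, so the stubs can come out in any order (here B before A). If first-seen order matters, one option, sketched here with the same findall call, is to deduplicate with dict.fromkeys, since dicts keep insertion order in Python 3.7+.

```python
import numpy as np
import pandas as pd

# Same example columns as in the question
df = pd.DataFrame({'A (quarterly) 2010': np.random.rand(3),
                   'A (quarterly) 2011': np.random.rand(3),
                   'B (quarterly) 2010': np.random.rand(3),
                   'B (quarterly) 2011': np.random.rand(3),
                   'X': np.random.randint(3, size=3)})

# One list of matches per column name; an empty list means no match
matches = df.columns.str.findall(r'[A-B] \(.*\) ')

# dict.fromkeys drops duplicates but keeps the order in which keys first appear
stubs = list(dict.fromkeys(m[0] for m in matches if m))
print(stubs)
```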

Is there a simpler way to do this?


2016-12-04 luffe

Here's another way. It's still a bit hairy, but a little more elegant:

import re

def match(x):
    # Return the first match of the pattern in column name x, or None if there is none
    m = re.findall(r'[A-B] \(.*\)\s', x)
    return m[0] if m else None

[stub for stub in df.columns.to_series().apply(match).unique() if stub] 
# ['A (quarterly) ', 'B (quarterly) ']
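A more compact variant (not part of the answer above, just a sketch of another pandas route) is Index.str.extract with a single capture group and expand=False, which returns an Index holding the matched text for matching names and NaN for the rest. Dropping the NaNs and calling unique() then gives the stubs in first-seen order.

```python
import numpy as np
import pandas as pd

# Same example columns as in the question
df = pd.DataFrame({'A (quarterly) 2010': np.random.rand(3),
                   'A (quarterly) 2011': np.random.rand(3),
                   'B (quarterly) 2010': np.random.rand(3),
                   'B (quarterly) 2011': np.random.rand(3),
                   'X': np.random.randint(3, size=3)})

# The parentheses around the whole pattern form the single capture group;
# column 'X' has no match and becomes NaN, which dropna() removes
stubs = (df.columns.str.extract(r'([A-B] \(.*\)\s)', expand=False)
         .dropna()
         .unique()
         .tolist())
print(stubs)  # ['A (quarterly) ', 'B (quarterly) ']
```

Because unique() keeps the order of first appearance, the result follows the column order, unlike the set-based version in the question.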


2016-12-05 01:15:46 DyZ
