2017-02-12 42 views
1

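Pandas DataFrame - select rows with WildCards

I am new to Python. My problem is a bit subtle. I want to select a row from a DataFrame if the string in a cell matches any of a set of wildcard rules. Let's assume this example: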

Table to screen:

df=pd.DataFrame({'Column':[ 
    'select rows in pandas DataFrame using comparisons against two columns', 
    'select rows from a DataFrame based on values in a column in pandas', 
    'use a list of values to select rows from a pandas dataframe', 
    'selecting columns from a pandas dataframe based on row conditions', 
    'select particular columns from inside groups in pandas dataframe']}) 

    Column 
0 select rows in pandas DataFrame using comparisons against two columns 
1 select rows from a DataFrame based on values in a column in pandas 
2 use a list of values to select rows from a pandas dataframe 
3 selecting columns from a pandas dataframe based on row conditions 
4 select particular columns from inside groups in pandas dataframe 

Rules:

Rules=pd.DataFrame({'SearchTerms':['*select*DataFrame*row*','*select*dataframe*row*']}) 

    SearchTerms 
0 *select*DataFrame*row* 
1 *select*dataframe*row* 

Result:

Column 
0 select rows in pandas DataFrame using comparisons against two columns 
1 select rows from a DataFrame based on values in a column in pandas 
2 use a list of values to select rows from a pandas dataframe 

I tried using multiple statements with fnmatch, like this:

import fnmatch 
selection=[] 
for row in df['Column']: 
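    # index the 'SearchTerms' column; Rules[0] would look up a column named 0
    selection.append(fnmatch.fnmatch(row, Rules['SearchTerms'][0]) | fnmatch.fnmatch(row, Rules['SearchTerms'][1]))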

Question

How can I select rows from a DataFrame using a variable number of wildcard statements?

I am getting nowhere with this. Somebody help me!!! :)

Thanks in advance,

+0

Can you provide an example DataFrame? – Chuck

+0

Sure, @CharlesMorris –

+0

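So you want to search your first DataFrame for rows that match the two string conditions contained in the Rules DataFrame? Which words from the Rules DataFrame need to be searched for? That is, is it 'DataFrame' or 'row' or 'rows' or 'Dataframe'? Does your function work? – Chuck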

Answers

1

A solution for "wildcards":

Data:

In [53]: df 
Out[53]: 
                    Column 
0 select rows in pandas DataFrame using comparisons against two columns 
1  select rows from a DataFrame based on values in a column in pandas 
2   use a list of values to select rows from a pandas dataframe 
3  selecting columns from a pandas dataframe based on row conditions 
4  select particular columns from inside groups in pandas dataframe 

In [54]: Rules 
Out[54]: 
       SearchTerms 
0 *select*DataFrame*row* 
1 *select*dataframe*row* 

Solution:

In [55]: pat = Rules.SearchTerms.str.replace('*', '.*', regex=False).str.cat(sep='|') 

In [56]: df[df.Column.str.contains(pat, flags=re.I)]  # assumes `import re` was run earlier
Out[56]: 
                   Column 
3 selecting columns from a pandas dataframe based on row conditions 

The generated RegEx pattern:

In [64]: pat 
Out[64]: '.*select.*DataFrame.*row.*|.*select.*dataframe.*row.*' 
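
As a variation on the pattern above, the standard library can do the wildcard-to-RegEx conversion. The following is only a sketch, assuming the same `df` and `Rules` as in the question; the names `regexes`, `mask` and `selected` are made up for illustration. It uses `fnmatch.translate` to turn each wildcard into a regular expression and compiles it case-insensitively, which mirrors `flags=re.I` above.

```python
import fnmatch
import re

import pandas as pd

df = pd.DataFrame({'Column': [
    'select rows in pandas DataFrame using comparisons against two columns',
    'select rows from a DataFrame based on values in a column in pandas',
    'use a list of values to select rows from a pandas dataframe',
    'selecting columns from a pandas dataframe based on row conditions',
    'select particular columns from inside groups in pandas dataframe']})
Rules = pd.DataFrame({'SearchTerms': ['*select*DataFrame*row*',
                                      '*select*dataframe*row*']})

# fnmatch.translate converts a shell-style wildcard into a regex string
# that must match the whole input; compile each rule ignoring case.
regexes = [re.compile(fnmatch.translate(term), re.IGNORECASE)
           for term in Rules['SearchTerms']]

# Keep a row if at least one translated rule matches its text.
mask = df['Column'].apply(lambda text: any(rx.match(text) for rx in regexes))
selected = df[mask]
print(selected)
```

Because the number of rules is taken from `Rules['SearchTerms']`, the same code should work for any number of wildcard statements.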
+0

OK, but is it really necessary to use a regular expression with a pandas DataFrame? Isn't it possible to use a statement like: match if ('dataframe' **AND** 'row' **AND** 'select') **OR** ('DataFrame' **AND** 'row' **AND** 'select') –

+0

@B.Gees, it is possible, but it has __nothing__ to do with wildcards... – MaxU

+0

Please help me understand more about that, I am interested :) –

0

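I think you may have better success using the built-in string matching functions in pandas. If you have a pandas Series object (a DataFrame column is a Series object) that is a collection of strings, you can call .str.<method>. There are many string methods available, but in this case you can use .str.match(...) or .str.contains(...). Note that .str.match only matches at the start of each string, while .str.contains searches anywhere in it.

Both of these methods accept regular expressions. This means changing your wildcard expressions to RegEx.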

df[df.Column.str.match('select|DataFrame|row', case=False)] 

              Column 
0 select rows in pandas DataFrame using comparis... 
1 select rows from a DataFrame based on values i... 
3 selecting columns from a pandas dataframe base... 
4 select particular columns from inside groups i... 
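
If the goal is the AND logic raised in the comments (every word must appear, in any order), one possible approach is to build one boolean mask per word with `.str.contains` and combine the masks with `&`. This is a sketch only; the `keywords` list and the names `mask` and `result` are assumptions made for illustration, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'Column': [
    'select rows in pandas DataFrame using comparisons against two columns',
    'select rows from a DataFrame based on values in a column in pandas',
    'use a list of values to select rows from a pandas dataframe',
    'selecting columns from a pandas dataframe based on row conditions',
    'select particular columns from inside groups in pandas dataframe']})

# Hypothetical list of words that must ALL be present (case-insensitive).
keywords = ['select', 'dataframe', 'row']

# Start with every row selected, then narrow the mask word by word.
mask = pd.Series(True, index=df.index)
for word in keywords:
    # regex=False treats the word as a plain substring, not a pattern
    mask &= df['Column'].str.contains(word, case=False, regex=False)

result = df[mask]
print(result)
```

With this data, rows 0 to 3 are selected, since each contains all three words as substrings. Row 4 is dropped because it has no "row". An OR of several such groups could be built by computing one mask per group and combining them with `|`.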
+0

Hello @James. Nice, but with this solution, is it not possible to apply an **AND** statement? –

+0

Thanks James for your solution –