2016-09-27 58 views

pandas: apply a function to match a string

I have a df column containing various links, some of which contain the string "search".

I want to create a function that, applied to the column, returns a column containing "search" or "other".

I wrote a function like this:

search = 'search'

def page_type(x):
    if x.str.contains(search):
        return 'Search'
    else:
        return 'Other'

df['link'].apply(page_type)

but it gives me an error like:

AttributeError: 'unicode' object has no attribute 'str'

I think I am missing something when calling str.contains().
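Below is a minimal sketch of where the error comes from, using hypothetical sample links. `Series.apply` calls the function once per element, so inside `page_type` the value `x` is a plain Python string (`unicode` under Python 2), and plain strings have no `.str` accessor. The `in` operator does the substring test instead.

```python
import pandas as pd

search = 'search'

# Hypothetical sample data for illustration only
df = pd.DataFrame({'link': ['http://search-1.php', 'http://news-2.html']})

def page_type(x):
    # x is a single str element here, not a Series,
    # so test membership with `in` instead of x.str.contains(...)
    if search in x:
        return 'Search'
    return 'Other'

print(type(df['link'].iloc[0]))  # each element is a plain str
df['page_type'] = df['link'].apply(page_type)
print(df['page_type'].tolist())  # ['Search', 'Other']
```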

Answers


I think you need numpy.where:

import numpy as np
import pandas as pd

df = pd.DataFrame({'link':['search','homepage d','login dd', 'profile t', 'ff']})

print(df)
         link
0      search
1  homepage d
2    login dd
3   profile t
4          ff

search = 'search'
profile = 'profile'
homepage = 'homepage'
login = "login"

def page_type(x):
    if search in x:
        return 'Search'
    elif profile in x:
        return 'Profile'
    elif homepage in x:
        return 'Homepage'
    elif login in x:
        return 'Login'
    else:
        return 'Other'

df['link_new'] = df['link'].apply(page_type) 

df['link_type'] = np.where(df.link.str.contains(search),'Search', 
        np.where(df.link.str.contains(profile),'Profile', 
        np.where(df.link.str.contains(homepage), 'Homepage', 
        np.where(df.link.str.contains(login),'Login','Other')))) 


print(df)
         link  link_new link_type
0      search    Search    Search
1  homepage d  Homepage  Homepage
2    login dd     Login     Login
3   profile t   Profile   Profile
4          ff     Other     Other
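The nested `np.where` calls can also be written with `numpy.select`. This function was not part of the original answer and has not been benchmarked here. It takes a list of boolean conditions, a parallel list of labels, and a default value. When several conditions are true for a row, the first one in the list wins, which matches the `if`/`elif` order above. The sketch below reuses the same sample data. It passes `regex=False` so that each pattern is treated as a literal substring.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'link': ['search', 'homepage d', 'login dd', 'profile t', 'ff']})

# Substrings to look for, in priority order, and the label for each
patterns = ['search', 'profile', 'homepage', 'login']
labels = ['Search', 'Profile', 'Homepage', 'Login']

# One boolean mask per pattern; regex=False means a literal substring match
conditions = [df['link'].str.contains(p, regex=False) for p in patterns]

# First true condition picks the label; rows matching nothing get the default
df['link_type'] = np.select(conditions, labels, default='Other')
print(df)
```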

Timings:

#[5000 rows x 1 columns] 
df = pd.DataFrame({'link':['search','homepage d','login dd', 'profile t', 'ff']}) 
df = pd.concat([df]*1000).reset_index(drop=True) 

In [346]: %timeit df['link'].apply(page_type) 
1000 loops, best of 3: 1.72 ms per loop 

In [347]: %timeit np.where(df.link.str.contains(search),'Search', np.where(df.link.str.contains(profile),'Profile', np.where(df.link.str.contains(homepage), 'Homepage', np.where(df.link.str.contains(login),'Login','Other')))) 
100 loops, best of 3: 11.7 ms per loop 
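`%timeit` is an IPython magic. Outside IPython, the standard-library `timeit` module can run a similar comparison. The sketch below rebuilds the 5000-row frame and first checks that both approaches produce the same labels. It then prints per-call averages. It shows no expected numbers because the timings depend on the machine and on the pandas/NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

search, profile, homepage, login = 'search', 'profile', 'homepage', 'login'

def page_type(x):
    # Same if/elif chain as above, applied element by element
    if search in x:
        return 'Search'
    elif profile in x:
        return 'Profile'
    elif homepage in x:
        return 'Homepage'
    elif login in x:
        return 'Login'
    return 'Other'

def nested_where(s):
    # Vectorized version: one str.contains pass over the column per pattern
    return np.where(s.str.contains(search), 'Search',
           np.where(s.str.contains(profile), 'Profile',
           np.where(s.str.contains(homepage), 'Homepage',
           np.where(s.str.contains(login), 'Login', 'Other'))))

# 5000 rows, as in the timing above
df = pd.DataFrame({'link': ['search', 'homepage d', 'login dd', 'profile t', 'ff']})
df = pd.concat([df] * 1000).reset_index(drop=True)

# Sanity check: both approaches must agree before comparing speed
assert df['link'].apply(page_type).tolist() == nested_where(df['link']).tolist()

n = 100
t_apply = timeit.timeit(lambda: df['link'].apply(page_type), number=n) / n
t_where = timeit.timeit(lambda: nested_where(df['link']), number=n) / n
print(f'apply:    {t_apply * 1e3:.2f} ms per loop')
print(f'np.where: {t_where * 1e3:.2f} ms per loop')
```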

I added a solution with multiple conditions; the apply solution is faster than np.where. – jezrael


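.str works on a whole Series, but here you are dealing with the individual values inside the Series: apply passes each element to your function as a plain string, which has no .str attribute.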

You can do: df['link'].str.contains(search)
or, as you wanted: df['link'].apply(lambda x: 'Search' if search in x else 'Other')
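The sketch below makes the difference concrete, using hypothetical sample links. `df['link'].str.contains(search)` returns one boolean per row for the whole column, and `.map` turns those booleans into labels without any per-row Python function. The element-wise `apply` form from the answer is included alongside for comparison. Passing `regex=False` is an added choice that treats the pattern as a literal substring.

```python
import pandas as pd

search = 'search'
df = pd.DataFrame({'link': ['https://search-125.php', 'http://news-8282.html']})

# Vectorized: a boolean Series computed on the whole column at once
mask = df['link'].str.contains(search, regex=False)
print(mask.tolist())  # [True, False]

# Turn the booleans into labels
df['page_type'] = mask.map({True: 'Search', False: 'Other'})

# Element-wise alternative from the answer: x is a plain str here
df['page_type_apply'] = df['link'].apply(lambda x: 'Search' if search in x else 'Other')
print(df)
```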

Edit

A more general approach:

def my_filter(x, val, c_1, c_2): 
    return c_1 if val in x else c_2 

df['link'].apply(lambda x: my_filter(x, 'homepage', 'homepage', 'other')) 
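The same idea can be extended to several conditions, which is the `elif` case asked about in the comments below. The sketch uses a hypothetical ordered list of `(substring, label)` rules and a hypothetical helper `classify`. The first rule whose substring appears in the value wins, so the rule order plays the role of the `if`/`elif` order.

```python
import pandas as pd

# Hypothetical ordered rules: (substring, label); first match wins,
# mirroring an if/elif chain
RULES = [
    ('search', 'Search'),
    ('profile', 'Profile'),
    ('homepage', 'Homepage'),
    ('login', 'Login'),
]

def classify(x, rules=RULES, default='Other'):
    # x is a single string from the column
    for substring, label in rules:
        if substring in x:
            return label
    return default

df = pd.DataFrame({'link': ['search', 'homepage d', 'login dd', 'profile t', 'ff']})
df['link_type'] = df['link'].apply(classify)
print(df['link_type'].tolist())
# ['Search', 'Homepage', 'Login', 'Profile', 'Other']
```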

What if I want to specify an elif condition: if "homepage" then 'Homepage', else 'Other'? – xxxvinxxx


Solved it like this: df['link_type'] = np.where(df.referrer.str.contains(search), 'Search', np.where(df.referrer.str.contains(profile), 'Profile', np.where(df.referrer.str.contains(homepage), 'Homepage', np.where(df.referrer.str.contains(login), 'Login', 'Other')))) – xxxvinxxx


I edited my answer with your example. – Orelus


You can also use a list comprehension if you want to find the word "search" in the link.

For example:

df['Search'] = [('search' if 'search' in item else 'other') for item in df['link']] 

Output:

  ColumnA                        link  Search
0       a         http://word/12/word   other
1       b      https://search-125.php  search
2       c       http://news-8282.html   other
3       d  http://search-hello-1.html  search
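One caveat about the list comprehension: if the column has missing values, `'search' in item` raises a `TypeError`, because a missing value is the float `NaN` rather than a string. The sketch below uses hypothetical data with one missing link. It guards the comprehension with an `isinstance` check and shows the vectorized equivalent, where `str.contains(..., na=False)` treats missing values as "no match".

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing link
df = pd.DataFrame({'link': ['https://search-125.php', np.nan, 'http://news-8282.html']})

# Guard against non-string values (NaN) before using `in`
df['Search'] = [('search' if isinstance(item, str) and 'search' in item else 'other')
                for item in df['link']]
print(df['Search'].tolist())  # ['search', 'other', 'other']

# Vectorized equivalent: na=False makes missing values count as "no match"
mask = df['link'].str.contains('search', regex=False, na=False)
labels = np.where(mask, 'search', 'other')
```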

Creating a function:

def page_type(x, y): 
    df[x] = [('search' if 'search' in item else 'other') for item in df[y]] 

page_type('Search', 'link') 

In [6]: df
Out[6]:
  ColumnA                        link  Search
0       a         http://word/12/word   other
1       b      https://search-125.php  search
2       c       http://news-8282.html   other
3       d  http://search-hello-1.html  search
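The `page_type(x, y)` function above writes into a global `df`. The variant below is a sketch of a more reusable approach, not the answer's own code. It takes the DataFrame as an argument and returns a modified copy, so the caller's frame is left unchanged. The extra `word` parameter is a hypothetical addition.

```python
import pandas as pd

def page_type(frame, new_col, src_col, word='search'):
    # Hypothetical variant: receives the DataFrame explicitly instead of
    # using a global df, and returns a modified copy
    out = frame.copy()
    out[new_col] = [(word if word in item else 'other') for item in out[src_col]]
    return out

df = pd.DataFrame({
    'ColumnA': ['a', 'b', 'c', 'd'],
    'link': ['http://word/12/word', 'https://search-125.php',
             'http://news-8282.html', 'http://search-hello-1.html'],
})
result = page_type(df, 'Search', 'link')
print(result['Search'].tolist())  # ['other', 'search', 'other', 'search']
print('Search' in df.columns)     # False: the original df is unchanged
```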