Pandas: applying a function to match a string

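I have a df column containing various links, some of which contain the string "search".

I want to create a function, applied to the column, that returns a column containing "Search" or "Other".

I wrote a function like this: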

search = 'search'
def page_type(x):
    if x.str.contains(search):
        return 'Search'
    else:
        return 'Other'

df['link'].apply(page_type)

But it gives me an error like:

AttributeError: 'unicode' object has no attribute 'str'

I think I am missing something when calling str.contains().

Source

2016-09-27 xxxvinxxx

I think you need numpy.where:

import numpy as np
import pandas as pd

df = pd.DataFrame({'link':['search','homepage d','login dd', 'profile t', 'ff']})

print (df)
         link
0      search
1  homepage d
2    login dd
3   profile t
4          ff

search = 'search' 
profile = 'profile' 
homepage = 'homepage' 
login = "login" 

def page_type(x):
    if search in x:
        return 'Search'
    elif profile in x:
        return 'Profile'
    elif homepage in x:
        return 'Homepage'
    elif login in x:
        return 'Login'
    else:
        return 'Other'

df['link_new'] = df['link'].apply(page_type) 

df['link_type'] = np.where(df.link.str.contains(search),'Search', 
        np.where(df.link.str.contains(profile),'Profile', 
        np.where(df.link.str.contains(homepage), 'Homepage', 
        np.where(df.link.str.contains(login),'Login','Other')))) 


print (df)
         link  link_new link_type
0      search    Search    Search
1  homepage d  Homepage  Homepage
2    login dd     Login     Login
3   profile t   Profile   Profile
4          ff     Other     Other
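As an alternative to nesting np.where calls, the same first-match logic can be written with np.select, which takes a list of conditions and a parallel list of labels. This is a sketch of a different technique from the one used above, not part of the original answer; it reuses the same sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'link': ['search', 'homepage d', 'login dd', 'profile t', 'ff']})

# One boolean Series per keyword; np.select picks the label of the
# first condition that is True for each row.
conditions = [
    df['link'].str.contains('search'),
    df['link'].str.contains('profile'),
    df['link'].str.contains('homepage'),
    df['link'].str.contains('login'),
]
choices = ['Search', 'Profile', 'Homepage', 'Login']

# Rows matching none of the conditions get the default label.
df['link_type'] = np.select(conditions, choices, default='Other')
print(df)
```

The order of the conditions matters in the same way as the order of the nested np.where calls: a link containing both "search" and "login" is labelled 'Search'.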

Timings:

#[5000 rows x 1 columns] 
df = pd.DataFrame({'link':['search','homepage d','login dd', 'profile t', 'ff']}) 
df = pd.concat([df]*1000).reset_index(drop=True) 

In [346]: %timeit df['link'].apply(page_type) 
1000 loops, best of 3: 1.72 ms per loop 

In [347]: %timeit np.where(df.link.str.contains(search),'Search', np.where(df.link.str.contains(profile),'Profile', np.where(df.link.str.contains(homepage), 'Homepage', np.where(df.link.str.contains(login),'Login','Other')))) 
100 loops, best of 3: 11.7 ms per loop

Source

2016-09-27 12:08:59 jezrael

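I added a solution for multiple conditions; the 'apply' solution is faster than 'np.where'. – jezrael

.str works on a whole Series, but here you are dealing with the individual values inside the Series.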

You can either do: df['link'].str.contains(search)
Or, as you wanted: df['link'].apply(lambda x: 'Search' if search in x else 'Other')
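To make the difference between the two forms concrete, here is a minimal sketch. The sample links are made up for illustration and are not from the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({'link': ['http://a/search?q=1', 'http://a/home']})
search = 'search'

# Series level: .str.contains returns a boolean Series for the whole column.
mask = df['link'].str.contains(search)

# Element level: apply passes each value (a plain str) to the function,
# so the ordinary `in` operator is used instead of the .str accessor.
labels_apply = df['link'].apply(lambda x: 'Search' if search in x else 'Other')

# The boolean mask can be turned into labels with np.where.
labels_where = pd.Series(np.where(mask, 'Search', 'Other'), index=df.index)

print(mask.tolist())
print(labels_apply.tolist())
print(labels_where.tolist())
```

Note that str.contains treats its pattern as a regular expression by default; passing regex=False asks for a literal substring match, which is closer to what `in` does.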

Edit

A more generic approach:

def my_filter(x, val, c_1, c_2): 
    return c_1 if val in x else c_2 

df['link'].apply(lambda x: my_filter(x, 'homepage', 'homepage', 'other'))
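If several keywords are needed, one possible extension of this idea is to drive the function from a keyword-to-label mapping. The names LABELS and classify below are hypothetical, chosen for this sketch, and the sample data is taken from the other answer.

```python
import pandas as pd

df = pd.DataFrame({'link': ['search', 'homepage d', 'login dd', 'profile t', 'ff']})

# Hypothetical keyword -> label mapping. Dicts keep insertion order
# (Python 3.7+), so the first matching keyword wins.
LABELS = {
    'search': 'Search',
    'profile': 'Profile',
    'homepage': 'Homepage',
    'login': 'Login',
}

def classify(link, labels=LABELS, default='Other'):
    """Return the label of the first keyword found in link, else default."""
    for keyword, label in labels.items():
        if keyword in link:
            return label
    return default

df['link_type'] = df['link'].apply(classify)
print(df)
```

Adding a new page type then only requires a new entry in the mapping rather than another elif branch.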

Source

2016-09-27 12:31:26 Orelus

What if I want to add an elif condition: if 'homepage' then 'Homepage', else 'Other'? – xxxvinxxx

Solved it like this: df['link_type'] = np.where(df.referrer.str.contains(search), 'Search', np.where(df.referrer.str.contains(profile), 'Profile', np.where(df.referrer.str.contains(homepage), 'Homepage', np.where(df.referrer.str.contains(login), 'Login', 'Other')))) – xxxvinxxx

I edited the answer using your example. – Orelus

You can also use a list comprehension if you want to find the word "search" in the links.

For example:

df['Search'] = [('search' if 'search' in item else 'other') for item in df['link']]

Output:

   ColumnA                        link  Search
0        a         http://word/12/word   other
1        b      https://search-125.php  search
2        c       http://news-8282.html   other
3        d  http://search-hello-1.html  search

As a function:

def page_type(x, y): 
    df[x] = [('search' if 'search' in item else 'other') for item in df[y]] 

page_type('Search', 'link') 

In [6]: df
Out[6]:
   ColumnA                        link  Search
0        a         http://word/12/word   other
1        b      https://search-125.php  search
2        c       http://news-8282.html   other
3        d  http://search-hello-1.html  search
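The function above writes into a global df. A variant that takes the DataFrame as an argument and returns the new column may be easier to reuse; this is only a sketch, and the parameter names frame, source_col, keyword and other are assumptions introduced here.

```python
import pandas as pd

def page_type(frame, source_col, keyword='search', other='other'):
    """Return a Series labelling each value of frame[source_col]."""
    return pd.Series(
        [keyword if keyword in item else other for item in frame[source_col]],
        index=frame.index,  # keep the labels aligned with the original rows
    )

df = pd.DataFrame({
    'ColumnA': ['a', 'b', 'c', 'd'],
    'link': [
        'http://word/12/word',
        'https://search-125.php',
        'http://news-8282.html',
        'http://search-hello-1.html',
    ],
})
df['Search'] = page_type(df, 'link')
print(df)
```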

Source

2016-09-27 13:51:17 estebanpdl
