2016-12-14 38 views

Part of a string contained in another string (regex, Python)

Is there a way in Python to check whether any part of one string matches another string?

For example, my URLs look like this:

url = pd.DataFrame({'urls' : ['www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA', 'www.ulta.com/beautyservices/benefitbrowbar/']}) 

and I have a string that looks like this:

string_list = ['Benefit Cosmetics', 'Anastasia Beverly Hills'] 
string = '|'.join(string_list) 

I want to match the entries of string against url, so that each URL is paired with its entry:

Anastasia Beverly Hills    www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA
Benefit Cosmetics          www.ulta.com/beautyservices/benefitbrowbar/

I have been trying url['urls'].str.contains('('+string+')', case = False), but it does not match anything.

What is the right way to do this?
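A quick way to see why the attempt finds nothing: '|'.join(string_list) produces the alternation Benefit Cosmetics|Anastasia Beverly Hills, so each alternative has to appear in the URL as a whole phrase, space included, and neither URL contains either phrase verbatim. The sketch below, using the same sample data, compares that pattern with one built from the individual words (the variable names phrase_pattern and word_pattern are illustrative only):

```python
import pandas as pd

url = pd.DataFrame({'urls': ['www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA',
                             'www.ulta.com/beautyservices/benefitbrowbar/']})
string_list = ['Benefit Cosmetics', 'Anastasia Beverly Hills']

# Whole phrases as alternatives: "Benefit Cosmetics" never occurs verbatim in a URL.
phrase_pattern = '|'.join(string_list)
phrase_hits = url['urls'].str.contains(phrase_pattern, case=False)

# Individual words as alternatives: "benefit" and "anastasia" do occur.
word_pattern = '|'.join(word for s in string_list for word in s.split())
word_hits = url['urls'].str.contains(word_pattern, case=False)

print(phrase_hits.tolist())  # [False, False]
print(word_hits.tolist())    # [True, True]
```

The word-level pattern only says that some string matched each URL, not which one, which is why the answers below test one string at a time.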


Check out: http://www.pythontutor.com/visualize.html#mode=edit

Answer


I couldn't do this as a one-line regex, but here is my attempt using itertools:

import pandas as pd
from itertools import product

url = pd.DataFrame({'urls' : ['www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA', 'www.ulta.com/beautyservices/benefitbrowbar/']})
string_list = ['Benefit Cosmetics', 'Anastasia Beverly Hills']

# Loop over the Cartesian product (every combination) of
# string_list and the URLs.
for phrase, link in product(string_list, url['urls']):
    # Match if any of the words in the string appear in the URL, ignoring case.
    if any(word.lower() in link.lower() for word in phrase.split()):
        # Show the match.
        print("Match String: %s URL: %s" % (phrase, link))

Output:

Match String: Benefit Cosmetics URL: www.ulta.com/beautyservices/benefitbrowbar/ 
Match String: Anastasia Beverly Hills URL: www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA 

Update:

Given the way you were approaching it, you could alternatively use:

import pandas as pd

pd.set_option('display.width', 100)

string_list = ['Benefit Cosmetics', 'Anastasia Beverly Hills']

# Create a pandas DataFrame.
url = pd.DataFrame({'urls' : ['www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA', 'www.ulta.com/beautyservices/benefitbrowbar/']})

# Use one string at a time.
for string in string_list:
    # Join the individual words in the string with a pipe to build a
    # regex alternation, e.g. 'Benefit|Cosmetics'.
    s = "|".join(string.split())
    # Add a column with True or False depending on whether the regex
    # matches the URL. No capture group is needed, so pandas does not warn
    # about match groups and no warnings filter is required.
    url[string] = url['urls'].str.contains(s, case=False)

# Show the result.
print(url)

This will output:

                                                urls  Benefit Cosmetics  Anastasia Beverly Hills
0  www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00...              False                     True
1        www.ulta.com/beautyservices/benefitbrowbar/               True                    False

I guess this might be better if you want the result in a DataFrame, but I prefer the first way.
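If the goal is the one-to-one pairing shown in the question (each URL next to the string it matches), a single label column may be easier to use than one boolean column per string. A minimal sketch, assuming that taking the first matching string is acceptable when several match; the column name match and the helper first_match are illustrative, not from the answer. re.escape is added in case a name contains regex metacharacters such as "." or "+":

```python
import re

import pandas as pd

url = pd.DataFrame({'urls': ['www.amazon.com/ANASTASIA-Beverly...Brow/dp/B00GI21NZA',
                             'www.ulta.com/beautyservices/benefitbrowbar/']})
string_list = ['Benefit Cosmetics', 'Anastasia Beverly Hills']

# One case-insensitive alternation of escaped words per string,
# e.g. 'Benefit|Cosmetics'.
patterns = {
    s: re.compile('|'.join(re.escape(w) for w in s.split()), re.IGNORECASE)
    for s in string_list
}


def first_match(u):
    """Return the first string whose pattern is found in u, else None."""
    for phrase, pat in patterns.items():
        if pat.search(u):
            return phrase
    return None


url['match'] = url['urls'].map(first_match)
print(url)
```

Dictionaries keep insertion order in Python 3.7+, so when a URL matches several strings the one listed first in string_list wins.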
