Using a regular expression to exclude strings in Python
I am cleaning data with pandas as follows:
import pandas as pd
s3 = pd.DataFrame({'title': ["intermediate", "Basmati/sadri", "temperate japonica", "Temperate japonica", "Japonica", "Tropical japonica", "Aromatic (basmati/sandri type", "indica", "Aus/boro", "Aus", "aus", "japonica", "tropical japnica", "", "Indica", "Intermediate type"]})
s3.title.replace(r".*[Jj]ap(o)?nica$", "japonica", inplace=True, regex=True)
s3.title.replace(r"Indica", "indica", inplace=True, regex=True)
print(s3)
And I got:
title
0 intermediate
1 Basmati/sadri
2 japonica
3 japonica
4 japonica
5 japonica
6 Aromatic (basmati/sandri type
7 indica
8 Aus/boro
9 Aus
10 aus
11 japonica
12 japonica
13
14 indica
15 Intermediate type
I want to replace every other string, like this:
if string not in ['japonica', 'indica']:
    string = 'others'
But what regular expression would do that here:
s3.title.replace(r"some regex", "others" ,inplace=True,regex=True)
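Great~ But line 13 is blank; can that be added to the regex? – milowang
@milowang Done. – 2Cubed
I tried s3.title.replace(r'^(?!japonica|indica).+|(japonica|indica).+$', "others", inplace=True, regex=True), and line 13 is still blank. I need to write an extra line to replace the blank with "others". How can a regex match an empty line? – milowang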
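On the empty-line question: `.+` needs at least one character, so it can never match an empty string, whereas `.*` can. Here is a minimal sketch using only the standard `re` module. The pattern and the names `OTHERS_PATTERN` and `normalize` are my own illustration, not the regex from the answer, and it assumes the only labels to keep are exactly `japonica` and `indica`.

```python
import re

# Hypothetical pattern, assuming only the exact labels "japonica" and
# "indica" should survive.
# ^(?!(?:japonica|indica)$) rejects strings that are exactly a kept label;
# .* (rather than .+) lets the pattern match the empty string too.
OTHERS_PATTERN = re.compile(r"^(?!(?:japonica|indica)$).*$")

def normalize(title):
    """Return a kept label unchanged; map anything else to 'others'."""
    return OTHERS_PATTERN.sub("others", title)

titles = ["intermediate", "japonica", "Aus/boro", "", "indica", "Intermediate type"]
cleaned = [normalize(t) for t in titles]
print(cleaned)
# ['others', 'japonica', 'others', 'others', 'indica', 'others']
```

The same pattern can presumably be passed to s3.title.replace(..., "others", regex=True) in place of the `.+` version. That rests on the assumption that pandas applies the regex to each string the way `re.sub` does, so it is worth checking on your pandas version.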