2016-05-03 45 views
2

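pandas: map new values based on the first character

Is there a way to map new values onto a DataFrame column based on the first character of each current value?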

My current code:

ncesvars['urbantype'] = np.where(ncesvars['urbantype'].str.startswith('1'), 'city', ncesvars['urbantype']) 
ncesvars['urbantype'] = np.where(ncesvars['urbantype'].str.startswith('2'), 'suburban', ncesvars['urbantype']) 
ncesvars['urbantype'] = np.where(ncesvars['urbantype'].str.startswith('3'), 'town', ncesvars['urbantype']) 
ncesvars['urbantype'] = np.where(ncesvars['urbantype'].str.startswith('4'), 'rural', ncesvars['urbantype']) 

I'd like to use some kind of dict together with pd.replace, but I don't know how to combine that with .str.startswith().

Answers

2

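You can define a dict of categories, slice the data with str[0:1], and call map on the resulting Series. Use a boolean mask that tests whether the first character is one of your dict keys, so that only matching rows are overwritten. Without the mask you would overwrite non-matching rows with NaN, because in the example below there is no mapping for the last row: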

In [16]: 
df = pd.DataFrame({'urbantype':['1 asdas','2 asd','3 asds','4 asdssd','5 asdas']}) 
df 

Out[16]: 
    urbantype 
0 1 asdas 
1  2 asd 
2 3 asds 
3 4 asdssd 
4 5 asdas 

In [18]: 
d = {'1':'city','2':'suburban', '3': 'town','4':'rural'} 
df.loc[df['urbantype'].str[0:1].isin(d.keys()), 'urbantype'] = df['urbantype'].str[0:1].map(d) 
df 

Out[18]: 
    urbantype 
0  city 
1 suburban 
2  town 
3  rural 
4 5 asdas 
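
To illustrate why the mask matters, here is a minimal sketch on the same toy data. The names `first`, `unmasked` and `filled` are introduced here for illustration only. It compares a bare map with one possible alternative to the .loc mask, which falls back to the original value through fillna:

```python
import pandas as pd

df = pd.DataFrame({'urbantype': ['1 asdas', '2 asd', '3 asds', '4 asdssd', '5 asdas']})
d = {'1': 'city', '2': 'suburban', '3': 'town', '4': 'rural'}

# First character of every value.
first = df['urbantype'].str[0:1]

# A bare map yields NaN wherever the first character is not a key of d.
unmasked = first.map(d)

# Alternative to the .loc mask: keep the original value where map produced NaN.
filled = first.map(d).fillna(df['urbantype'])
```

Here `unmasked` has NaN in the last row, while `filled` keeps '5 asdas' there, which should match the result of the .loc assignment above.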
+0

Thanks for your input. Compared with @ayhan's answer, is the 'df.loc' part important? – As3adTintin

+1

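Yes, because you only want to affect the rows whose data matches your dict keys. Otherwise you would overwrite the row with 'NaN'; the mask is why the last row is left unchanged – EdChum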

+0

ahhh ok thanks! – As3adTintin

3

Try something like:

# Map each anchored regex pattern to its replacement (a dict, so key: value pairs)
ncesvars['urbantype'] = ncesvars['urbantype'].replace({
    r'^1.*': 'city',
    r'^2.*': 'suburban'},
    regex=True)

Test:

In [32]: w 
Out[32]: 
    word 
0 1_A_ 
1 word03 
2 word02 
3 word00 
4 2xxx 
5 word04 
6 word01 
7 word02 
8 word04 
9 3aaa 

In [33]: w['word'].replace({r'^1.*': 'city', r'^2.*': 'suburban', r'^3.*': 'town'}, regex=True) 
Out[33]: 
0  city 
1  word03 
2  word02 
3  word00 
4 suburban 
5  word04 
6  word01 
7  word02 
8  word04 
9  town 
Name: word, dtype: object 
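
If you already have a plain dict keyed by the first character, as the question suggests, the regex mapping can be derived from it. The following is only a sketch under that assumption: the dict `d`, the shortened sample frame and the names `patterns` and `result` are illustrative and not part of the test above.

```python
import re
import pandas as pd

w = pd.DataFrame({'word': ['1_A_', 'word03', '2xxx', '3aaa']})
d = {'1': 'city', '2': 'suburban', '3': 'town', '4': 'rural'}

# Turn each key into an anchored pattern that consumes the rest of the string,
# so the whole value is replaced rather than just the first character.
patterns = {'^' + re.escape(k) + '.*': v for k, v in d.items()}

result = w['word'].replace(patterns, regex=True)
```

Values that do not start with any key, such as 'word03', are left untouched.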
+0

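Thanks for your input. I get the error 'replace() got an unexpected keyword argument 'regex'', and when I try without the 'regex' argument I get the error 'replace() takes at least 3 arguments (2 given)' – As3adTintin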

+0

It doesn't work for me; I get the original values back – As3adTintin

+0

@As3adTintin, I've added a test case – MaxU