Split a string from a column in pd.Series format in Python

I am new to Python and am trying out a few things to get the hang of it.

While doing so, I got stuck here.

I have data in .csv format, which I imported with

import pandas
import numpy as np

data = pandas.read_csv("data.csv")
data.head()

    user rating  id 
0  1  3.5 1_1193 
1  1  3.5 1_661 
2  1  3.5 1_914 
3  1  3.5 1_3408 
4  1  3.5 1_2355

What I need is the number that comes after the "_" in each value of the "id" column.

What I tried was:

data.id.split('_')

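This gave me the error: "'DataFrame' object has no attribute 'split'"

So, after reading about it in some solutions on Stack Overflow, I converted the "id" column to an np.array.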

s1 = data.id.values 
s2 = np.array2string(s1, separator=',',suppress_small=True) 
s2.split('_')

This gave me the output:

["['1", 
"1193','1", 
"661','1", 
"914',..., '6040", 
"161','6040", 
"2725','6040", 
"1784']"] 
s2.split('_')[1]

which gave me:

"1193','1"

What should I do to get the string after the "_"?

Source

2017-02-14 Akanshya Bapat

You need the vectorized str.split, followed by str[1] to select the second element of each resulting list. You can also check the docs:

data['a'] = data.id.str.split('_').str[1] 
print (data) 
    user rating  id  a 
0  1  3.5 1_1193 1193 
1  1  3.5 1_661 661 
2  1  3.5 1_914 914 
3  1  3.5 1_3408 3408 
4  1  3.5 1_2355 2355 

print (data.dtypes) 
user  int64 
rating float64 
id   object 
a   object <- format is object (obviously string) 
dtype: object

#split and cast column to int 
data['a'] = data.id.str.split('_').str[1].astype(int) 
print (data) 
    user rating  id  a 
0  1  3.5 1_1193 1193 
1  1  3.5 1_661 661 
2  1  3.5 1_914 914 
3  1  3.5 1_3408 3408 
4  1  3.5 1_2355 2355 

print (data.dtypes) 
user  int64 
rating float64 
id   object 
a   int32 <- format is int 
dtype: object
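
For anyone who wants to try this without data.csv, here is a minimal self-contained sketch. The frame is rebuilt by hand from the five rows shown in the question (an assumption standing in for the real file), and the column name a follows the snippet above.

```python
import pandas as pd

# Stand-in for data.csv: the five sample rows from the question.
data = pd.DataFrame({
    "user": [1, 1, 1, 1, 1],
    "rating": [3.5, 3.5, 3.5, 3.5, 3.5],
    "id": ["1_1193", "1_661", "1_914", "1_3408", "1_2355"],
})

# .str applies the string method to every element of the Series;
# .str[1] then picks the second piece of each resulting list.
data["a"] = data["id"].str.split("_").str[1].astype(int)

print(data["a"].tolist())
```

Note that astype(int) should raise if any value lacks a numeric part after the "_", so it is probably worth checking the column first on real data.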

Also, if you need to replace the id column with the new values:

data.id = data.id.str.split('_').str[1] 
print (data) 
    user rating id 
0  1  3.5 1193 
1  1  3.5 661 
2  1  3.5 914 
3  1  3.5 3408 
4  1  3.5 2355

Alternatively, with str.get(1) instead of str[1] (applied to the original id column):

data.id = data.id.str.split('_').str.get(1)
print (data) 
    user rating id 
0  1  3.5 1193 
1  1  3.5 661 
2  1  3.5 914 
3  1  3.5 3408 
4  1  3.5 2355
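
A small sketch of how the two spellings relate, using a hypothetical Series named ids that also contains one value without an underscore (that third value is made up for illustration and is not in the question's data):

```python
import pandas as pd

# Two ids from the question plus one made-up value without "_".
ids = pd.Series(["1_1193", "1_661", "no-underscore"])

parts = ids.str.split("_")
by_index = parts.str[1]
by_get = parts.str.get(1)

# Both spellings select the same element from each list.
same = by_index.equals(by_get)

# A value without "_" splits into a one-element list,
# so position 1 does not exist and the result is missing (NaN).
missing = by_index.isna().tolist()

print(same, missing)
```

This also suggests why the replacement should be run only once on a given column: after the first run the values no longer contain "_", so a second run would leave only missing values.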

Source

2017-02-14 07:26:58 jezrael

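If my answer or another one was helpful, don't forget to [accept](http://meta.stackexchange.com/a/5235/295067) it. Thanks. – jezrael

Hi. This one worked for me. :) –

To mark the answer as accepted, click the check mark next to it so that it changes from grey to filled in. Thanks. – jezrael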

A couple more options...

str.extract

df.id.str.extract('.*_(.*)', expand=False)

str.replace

df.id.str.replace('.*_', '', regex=True)

Both yield

0 1193 
1  661 
2  914 
3 3408 
4 2355 
Name: id, dtype: object
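
Here is a self-contained sketch of both regex options side by side. The frame df is rebuilt from the question's sample ids, which is an assumption standing in for the real data; regex=True is written out explicitly because the default for str.replace changed to False in pandas 2.0.

```python
import pandas as pd

# Sample ids from the question, standing in for the real column.
df = pd.DataFrame({"id": ["1_1193", "1_661", "1_914", "1_3408", "1_2355"]})

# Capture everything after the last underscore.
extracted = df["id"].str.extract(r".*_(.*)", expand=False)

# Delete everything up to and including the last underscore.
replaced = df["id"].str.replace(r".*_", "", regex=True)

print(extracted.equals(replaced))
```

Because .* is greedy, both patterns key on the last underscore, which only matters if an id contains more than one.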

Source

2017-02-14 07:36:42 piRSquared

Thanks. It worked for me. –
