Using pandas to split a column and fill another column with the extracted values

The class_name column contains both the course name and the cohort number. I want to split that column into two columns (name, cohort number).

FROM:

| class_name | 

| introduction to programming 1th | 
| introduction to programming 2th | 
| introduction to programming 3th | 
| introduction to programming 4th | 
| algorithms and data structure 1th | 
| algorithms and data structure 2th | 
| object-oriented programming | 
| database systems |

(I know this should be 1st, 2nd, 3rd, but the strings are in my language, where the same character is repeated after every number.)

TO:

| class_name | class_cohort | 

| introduction to programming | 1 | 
| introduction to programming | 2 | 
| introduction to programming | 3 | 
| introduction to programming | 4 | 
| algorithms and data structure | 1 | 
| algorithms and data structure | 2 | 
| object-oriented programming | 1 | 
| database systems | 1 |

Here is the code I have been working on:

import pandas as pd 

course_count = 100 
df = pd.read_csv("course.csv", nrows=course_count) 

cols_interest=['class_name', 'class_department', 'class_type', 'student_target', 'student_enrolled'] 

df = df[cols_interest] 
df.insert(1, 'class_cohort', 0) 

# this is how I extract the numbers 
df['class_name'].str.extract(r'(\d)').head() 

# but I cannot figure out a way to copy those values into column 'class_cohort' which I filled with 0's. 

# once I figure that out, I plan to discard the last digits 
df['class_name'] = df['class_name'].map(lambda x: str(x)[:-1])

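I briefly looked into a solution where I put a comma before every 1th, 2th, 3th and then split the column using the comma as the delimiter, but I could not come up with a way to replace \s1th -> ,1th for all digits.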
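For what it's worth, the comma idea can probably be made to work with a regex backreference. The following is only a minimal sketch: the sample frame is hypothetical data mirroring the FROM table, the names `with_comma` and `parts` are made up for illustration, and it assumes no course name contains a comma of its own.

```python
import pandas as pd

# Hypothetical sample data mirroring the FROM table in the question.
df = pd.DataFrame({"class_name": [
    "introduction to programming 1th",
    "algorithms and data structure 2th",
    "database systems",
]})

# Turn a trailing " <digits>th" into ",<digits>"; \1 refers back to the
# captured digits, so one pattern covers every cohort number.
with_comma = df["class_name"].str.replace(r"\s(\d+)th$", r",\1", regex=True)

# Split once from the right, so only the comma just inserted is used.
# Rows without a cohort suffix get a missing value in the second column.
parts = with_comma.str.rsplit(",", n=1, expand=True)
print(parts)
```

Column 0 of `parts` then holds the course name and column 1 the cohort digits as strings, with a missing value where the original name had no suffix.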

Source

2016-03-14 younghak

You can use indexing by positions:

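# the cohort digit is the third character from the end, e.g. '... 1th'
df['class_cohort'] = df['class_name'].str[-3:-2]
# drop the trailing ' 1th' (space, digit, two letters)
df['class_name'] = df['class_name'].str[:-4]
print(df)
   class_name  class_cohort
0       cs101             1
1       cs101             2
2       cs101             3
3       cs101             4
4  algorithms             1
5  algorithms             2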
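One caveat with fixed positions: rows such as "database systems" carry no suffix, so slicing them would cut off real characters. A minimal sketch of one way to guard against that, assuming a single-digit cohort and a default cohort of 1 for rows without a suffix (as in the TO table); the sample frame and the `has_suffix` mask are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data: one row with a suffix, one without.
df = pd.DataFrame({"class_name": [
    "introduction to programming 3th",
    "database systems",
]})

name = df["class_name"]

# True only for rows that end in " <digit>th".
has_suffix = name.str.contains(r"\s\dth$", regex=True)

# Series.where keeps the sliced value where the mask is True and
# falls back to the second argument everywhere else.
df["class_cohort"] = name.str[-3:-2].where(has_suffix, "1")
df["class_name"] = name.str[:-4].where(has_suffix, name)
print(df)
```

Cohorts with two or more digits would still need a regex-based approach, since the slice positions assume exactly one digit.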

Or use str.extract:

# expand=False returns a Series rather than a one-column DataFrame
df['class_cohort'] = df['class_name'].str.extract(r'(\d)', expand=False)
df['class_name'] = df['class_name'].str[:-4]
print(df)
                      class_name  class_cohort
0    introduction to programming             1
1    introduction to programming             2
2    introduction to programming             3
3    introduction to programming             4
4  algorithms and data structure             1
5  algorithms and data structure             2
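The TO table in the question also lists courses with no suffix at all ("object-oriented programming", "database systems") with cohort 1. A possible way to cover those rows in one pass is a pattern with an optional, end-anchored cohort group. This is a sketch under assumptions: the sample frame is hypothetical, the `pattern` and `parts` names are made up here, and the default of 1 is taken from the TO table.

```python
import pandas as pd

# Hypothetical sample data mirroring the FROM table in the question.
df = pd.DataFrame({"class_name": [
    "introduction to programming 1th",
    "introduction to programming 2th",
    "algorithms and data structure 1th",
    "object-oriented programming",
    "database systems",
]})

# Named groups become the column names of the extracted frame.
# The lazy name group plus the optional " <digits>th" group anchored at
# the end means only a trailing suffix is ever split off, however many
# words the course name has.
pattern = r"^(?P<class_name>.*?)(?:\s+(?P<class_cohort>\d+)th)?$"
parts = df["class_name"].str.extract(pattern)

df["class_name"] = parts["class_name"]
# Rows without a suffix have no cohort match; default them to 1.
df["class_cohort"] = parts["class_cohort"].fillna("1").astype(int)
print(df)
```

Because the digits are matched with `\d+` rather than sliced by position, this should also cope with cohort numbers of more than one digit.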

Source

2016-03-14 16:46:55 jezrael

There are some details I left out - splitting on \s+ will not work in this case, because course names consist of multiple words (1-6 words). – younghak

Thank you. Please check the solution. – jezrael

Thanks - 'str.extract' works well. I have upvoted your answer, but I am too new here, so my vote is not displayed publicly. – younghak
