2016-03-14 53 views

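The class_name column contains both the course name and the cohort number. I want to split it into two columns (course name and cohort number), using pandas to separate the column and fill a second column with the extracted values.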

FROM:

| class_name | 

| introduction to programming 1th | 
| introduction to programming 2th | 
| introduction to programming 3th | 
| introduction to programming 4th | 
| algorithms and data structure 1th | 
| algorithms and data structure 2th | 
| object-oriented programming | 
| database systems | 

(I know these should be 1st, 2nd, 3rd, but the strings are in my language, where the same character is repeated after every number.)

TO:

| class_name | class_cohort | 

| introduction to programming | 1 | 
| introduction to programming | 2 | 
| introduction to programming | 3 | 
| introduction to programming | 4 | 
| algorithms and data structure | 1 | 
| algorithms and data structure | 2 | 
| object-oriented programming | 1 | 
| database systems | 1 | 

Here is the code I have been working on:

import pandas as pd 

course_count = 100 
df = pd.read_csv("course.csv", nrows=course_count) 

cols_interest=['class_name', 'class_department', 'class_type', 'student_target', 'student_enrolled'] 

df = df[cols_interest] 
df.insert(1, 'class_cohort', 0) 

# this is how I extract the numbers 
df['class_name'].str.extract(r'(\d)').head() 

# but I cannot figure out a way to copy those values into column 'class_cohort' which I filled with 0's. 

# once I figure that out, I plan to discard the last digits 
df['class_name'] = df['class_name'].map(lambda x: str(x)[:-1]) 

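I briefly explored a solution in which I put a comma before 1th, 2th, 3th and then split the column using the comma as the delimiter, but I could not figure out a way to replace \s1th -> ,1 for all the numbers.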
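A minimal sketch of that comma idea, assuming the suffix is always one or more digits followed by `th` at the very end of the string. The sample rows are taken from the table above; the intermediate names `with_comma` and `parts` are only illustrative. A capture group in the pattern keeps the digits, and `\1` in the replacement puts them back after the comma:

```python
import pandas as pd

df = pd.DataFrame({"class_name": [
    "introduction to programming 1th",
    "algorithms and data structure 2th",
    "database systems",
]})

# Replace the trailing " <digits>th" with ",<digits>" using a back-reference.
with_comma = df["class_name"].str.replace(r"\s(\d+)th$", r",\1", regex=True)

# Split on the comma; rows without a cohort suffix get a missing value.
parts = with_comma.str.split(",", expand=True)
df["class_name"] = parts[0]
df["class_cohort"] = parts[1]
print(df)
```

This only works if the course names themselves never contain a comma, which is one reason extracting the number directly may be simpler.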

Answers


You can use indexing by positions:

df['class_cohort'] = df['class_name'].str[-3:-2] 
df['class_name'] = df['class_name'].str[:-4] 
print(df) 
    class_name class_cohort 
0  cs101   1 
1  cs101   2 
2  cs101   3 
3  cs101   4 
4 algorithms   1 
5 algorithms   2 

Or use str.extract:

df['class_cohort'] = df['class_name'].str.extract(r'(\d)', expand=False) 
df['class_name'] = df['class_name'].str[:-4] 
print(df) 
         class_name class_cohort 
0 introduction to programming   1 
1 introduction to programming   2 
2 introduction to programming   3 
3 introduction to programming   4 
4 algorithms and data structure   1 
5 algorithms and data structure   2 
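Both snippets above assume every value ends in a single digit followed by `th`. The desired output in the question also contains rows with no suffix (`object-oriented programming`, `database systems`) that should get cohort 1. A hedged sketch that handles those rows, assuming the suffix is always whitespace, digits, then `th` at the end of the string (the variable names `suffix` and `cohort` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"class_name": [
    "introduction to programming 1th",
    "introduction to programming 2th",
    "algorithms and data structure 1th",
    "object-oriented programming",
    "database systems",
]})

# Assumed suffix format: whitespace, digits, "th", anchored at the end.
suffix = r"\s+(\d+)th$"

# Extract the digits; rows without a suffix yield a missing value.
cohort = df["class_name"].str.extract(suffix, expand=False)

# Treat a missing cohort as 1, as in the question's desired output.
df["class_cohort"] = cohort.fillna("1").astype(int)

# Remove the suffix only where it is present, leaving other names intact.
df["class_name"] = df["class_name"].str.replace(suffix, "", regex=True)
print(df)
```

Anchoring the pattern with `$` avoids picking up a digit that happens to appear inside a course name, and removing the suffix by pattern rather than by slicing avoids truncating names that have no suffix.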

There are some details I left out: because the course names consist of multiple words (1 to 6 words), \s+ will not work in this case. – younghak
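One possible way to cope with names of any length is a single pattern with two named groups, so the number of words in the name does not matter. This is a sketch under the assumption that the cohort suffix, when present, is digits followed by `th` at the end; the pattern and the name `parts` are illustrative, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({"class_name": [
    "introduction to programming 3th",
    "algorithms and data structure 2th",
    "database systems",
]})

# Lazy name group, then an optional " <digits>th" suffix before the end.
pattern = r"^(?P<class_name>.*?)(?:\s+(?P<class_cohort>\d+)th)?$"

# Named groups become the column names of the resulting DataFrame.
parts = df["class_name"].str.extract(pattern)

# Rows without a suffix have no cohort; default them to 1 (assumption).
parts["class_cohort"] = parts["class_cohort"].fillna("1").astype(int)
print(parts)
```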


Thank you. Please check the solution. – jezrael


Thanks - 'str.extract' works well. I have upvoted your answer, but I am too new here, so my vote is not displayed publicly. – younghak