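The class_name column contains the course name and the cohort number. I want to use pandas to split it into two columns (name and cohort number): separate the number from the name and fill another column with the extracted value.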
FROM:
| class_name |
| introduction to programming 1th |
| introduction to programming 2th |
| introduction to programming 3th |
| introduction to programming 4th |
| algorithms and data structure 1th |
| algorithms and data structure 2th |
| object-oriented programming |
| database systems |
(I know these should be 1st, 2nd, 3rd, but the strings are in my language, where the same characters are repeated after every number.)
TO:
| class_name | class_cohort |
| introduction to programming | 1 |
| introduction to programming | 2 |
| introduction to programming | 3 |
| introduction to programming | 4 |
| algorithms and data structure | 1 |
| algorithms and data structure | 2 |
| object-oriented programming | 1 |
| database systems | 1 |
Here is the code I have been working on:
import pandas as pd
course_count = 100
df = pd.read_csv("course.csv", nrows=course_count)
cols_interest=['class_name', 'class_department', 'class_type', 'student_target', 'student_enrolled']
df = df[cols_interest]
df.insert(1, 'class_cohort', 0)
# this is how I extract the numbers
df['class_name'].str.extract(r'(\d)').head()
# but I cannot figure out a way to copy those values into column 'class_cohort' which I filled with 0's.
# once I figure that out, I plan to discard the last digits
df['class_name'] = df['class_name'].map(lambda x: str(x)[:-1])
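For the step the comments above get stuck on (copying the extracted numbers into `class_cohort`), here is a minimal sketch. It assumes the cohort is always the last whitespace-separated token (digits plus an optional suffix such as `th`) and uses a hypothetical in-memory DataFrame in place of `course.csv`. Passing `expand=False` to `str.extract` with a single group returns a Series, which can be assigned straight to the column.

```python
import pandas as pd

# Hypothetical stand-in for course.csv: only the class_name column, sample rows from the question
df = pd.DataFrame({"class_name": [
    "introduction to programming 1th",
    "introduction to programming 2th",
    "introduction to programming 3th",
    "introduction to programming 4th",
    "algorithms and data structure 1th",
    "algorithms and data structure 2th",
    "object-oriented programming",
    "database systems",
]})
df.insert(1, "class_cohort", 0)

# Assumption: the cohort is the trailing token, i.e. digits followed by a non-space suffix
cohort = df["class_name"].str.extract(r"\s(\d+)\S*$", expand=False)

# Overwrite the placeholder zeros; rows with no number come back NaN and default to cohort 1
df["class_cohort"] = pd.to_numeric(cohort).fillna(1).astype(int)

# Strip the whole trailing token (number and suffix), not just the last character
df["class_name"] = df["class_name"].str.replace(r"\s+\d+\S*$", "", regex=True)
print(df)
```

Because the suffix is matched with `\S*` rather than a literal `th`, the same pattern should also cover a suffix written in another language, as long as it is attached to the number without a space.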
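I briefly looked into a solution where I would put a comma before 1th, 2th, 3th and then split the column using the comma as a delimiter, but I could not figure out a way to replace \s1th -> ,1th for all numbers.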
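The comma idea can be done with a regex backreference: `str.replace` with `regex=True` lets `\1` re-insert the captured token, so one pattern handles every number. A minimal sketch, assuming course names never contain a comma themselves and the cohort is the trailing token; the sample DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical sample data from the question
df = pd.DataFrame({"class_name": [
    "introduction to programming 1th",
    "introduction to programming 2th",
    "introduction to programming 3th",
    "introduction to programming 4th",
    "algorithms and data structure 1th",
    "algorithms and data structure 2th",
    "object-oriented programming",
    "database systems",
]})

# Replace the whitespace before the trailing "<digits><suffix>" token with a comma;
# \1 puts the captured token back unchanged, e.g. "... programming 1th" -> "... programming,1th"
marked = df["class_name"].str.replace(r"\s+(\d+\S*)$", r",\1", regex=True)

# Split on the first comma only; rows that received no comma get None in column 1
split = marked.str.split(",", n=1, expand=True)

df["class_name"] = split[0]
# Keep only the digits of the suffix token and default missing cohorts to 1
df["class_cohort"] = pd.to_numeric(split[1].str.extract(r"(\d+)", expand=False)).fillna(1).astype(int)
print(df)
```

Note that if no row in the data had a cohort number, `split` would have only one column and `split[1]` would raise a `KeyError`; extracting directly from the original column avoids that intermediate step.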
There is a detail I left out: because the course names consist of multiple words (1 to 6 words), splitting on \s+ will not work in this case. – younghak
Thanks. Please check the solution. – jezrael
Thanks, `str.extract` worked well. I upvoted your answer, but my reputation is too low for the vote to be shown publicly. – younghak
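The accepted answer itself is not reproduced here, so the following is only a sketch of how a single `str.extract` call could produce both columns at once. Named groups `(?P<name>...)` become the column names of the returned DataFrame, the lazy `.*?` keeps multi-word names intact (which is why splitting on `\s+` fails), and the optional non-capturing group lets rows without a number still match. The sample DataFrame and the trailing-token assumption are illustrative.

```python
import pandas as pd

# Hypothetical sample data from the question
df = pd.DataFrame({"class_name": [
    "introduction to programming 1th",
    "introduction to programming 2th",
    "introduction to programming 3th",
    "introduction to programming 4th",
    "algorithms and data structure 1th",
    "algorithms and data structure 2th",
    "object-oriented programming",
    "database systems",
]})

# Lazy name group, then an optional " <digits><suffix>" token anchored at the end of the string
parts = df["class_name"].str.extract(
    r"^(?P<class_name>.*?)(?:\s+(?P<class_cohort>\d+)\S*)?$"
)

df["class_name"] = parts["class_name"]
# Rows without a number have NaN in class_cohort; treat them as cohort 1
df["class_cohort"] = pd.to_numeric(parts["class_cohort"]).fillna(1).astype(int)
print(df)
```

Since the pattern always matches (the name group can absorb the whole string), no row is lost; only the cohort column can be missing, and `fillna(1)` encodes the question's convention that unnumbered courses are cohort 1.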