Separating differently formatted names into first and last name using dplyr, tidyr, and regex
Sample data frame:
name <- c("Smith John Michael","Smith, John Michael","Smith John, Michael","Smith-John Michael","Smith-John, Michael")
df <- data.frame(name)
df
name
1 Smith John Michael
2 Smith, John Michael
3 Smith John, Michael
4 Smith-John Michael
5 Smith-John, Michael
I need to produce the following output:
  name                 first.name  last.name
1 Smith John Michael   John        Smith
2 Smith, John Michael  John        Smith
3 Smith John, Michael  Michael     Smith John
4 Smith-John Michael   Michael     Smith-John
5 Smith-John, Michael  Michael     Smith-John
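The rules are as follows: if the string contains a comma, everything before the comma is the last name, and the first word after the comma is the first name. If the string contains no comma, the first word is the last name and the second word is the first name. A hyphenated word counts as a single word. I would prefer to do this with dplyr and regex, but I will take any solution. Thanks for your help.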
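To make the rules concrete, here is a minimal sketch of one way they could be expressed with dplyr and base R regex functions. It is an illustration under the rules as stated, not a tested general-purpose name parser. The helper column `has_comma` is a hypothetical name introduced only for this example, and a "word" is assumed to mean a run of non-whitespace characters.

```r
library(dplyr)

name <- c("Smith John Michael", "Smith, John Michael", "Smith John, Michael",
          "Smith-John Michael", "Smith-John, Michael")
df <- data.frame(name, stringsAsFactors = FALSE)

result <- df %>%
  mutate(
    # has_comma is a helper column for this sketch only
    has_comma = grepl(",", name, fixed = TRUE),
    # Comma present: last name is everything before the comma.
    # No comma: last name is the first whitespace-delimited word.
    last.name = ifelse(has_comma,
                       trimws(sub(",.*$", "", name)),
                       sub("^\\s*(\\S+).*$", "\\1", name)),
    # Comma present: first name is the first word after the comma.
    # No comma: first name is the second whitespace-delimited word.
    first.name = ifelse(has_comma,
                        sub("^[^,]*,\\s*(\\S+).*$", "\\1", name),
                        sub("^\\s*\\S+\\s+(\\S+).*$", "\\1", name))
  ) %>%
  select(name, first.name, last.name)

print(result)
```

Because `\\S+` matches any run of non-whitespace characters, a hyphenated word such as `Smith-John` is treated as one word without special handling. Names that fit neither pattern (for example a single word with no comma) are not handled by this sketch and would pass through unchanged.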
See http://stackoverflow.com/questions/7069076/split-column-at-delimiter-in-data-frame