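Split a string in a data frame into two columns
In R, I converted a DocumentTermMatrix of 4-grams into a data frame, and now I want to split each n-gram into two columns: one with the first three words of the string and the other with the last word. I can do this in multiple steps, but given the size of the data frame I would like to do it in a single step.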
Here is what I am trying to achieve:
#             str_name          w123   w4 freq
# 1 One Two Three Four One Two Three Four   10
This gives me the first three words, but not the last one:
library(tidyr)  # provides separate() and re-exports %>%
df <- data.frame(str_name = "One Two Three Four", freq = 10)
df %>% separate(str_name, c("w123","w4"), sep = "\\w+$", remove=FALSE)
#             str_name           w123 w4 freq
# 1 One Two Three Four One Two Three       10
This gives me the last word, but it also includes the leading space:
df <- data.frame(str_name = "One Two Three Four", freq = 10)
df %>% separate(str_name, c("sp","w4"), sep = "\\w+\\s\\w+\\s\\w+", remove=FALSE)
#             str_name sp    w4 freq
# 1 One Two Three Four     Four   10
This is the long way:
df <- data.frame(w4 = "One Two Three Four", freq = 10)
df <- df %>% separate(w4, c('w1', 'w2', 'w3', 'w4'), " ")
df$lookup <- paste(df$w1,df$w2,df$w3)
#    w1  w2    w3   w4 freq        lookup
# 1 One Two Three Four   10 One Two Three
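One possible single-step approach, offered as a sketch rather than a definitive answer: both attempts above lose text because `separate()` discards whatever `sep` matches. Splitting on only the last space, with a lookahead that checks for the final word without consuming it, should keep both parts intact. This assumes the words are separated by single spaces and that the installed `tidyr` treats `sep` as a regular expression that supports lookahead.

```r
library(tidyr)  # separate(); tidyr also re-exports %>%

df <- data.frame(str_name = "One Two Three Four", freq = 10,
                 stringsAsFactors = FALSE)

# Split only at the final space. The lookahead (?=[^ ]+$) requires that
# just one more word follows, and it does not consume that word, so the
# only text dropped by separate() is the single space itself.
res <- df %>%
  separate(str_name, c("w123", "w4"), sep = " (?=[^ ]+$)", remove = FALSE)

print(res)
```

Because `remove = FALSE` is kept from the original attempts, `str_name` stays in the result alongside `w123` and `w4`.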
Perfect, thanks! – pheeper