2017-10-16 77 views
-1

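I am trying to do some basic text analysis in R: removing specific phrases from strings.

I have a column containing complex (messy) data. I would like to keep a separate table that I can use to remove certain phrases from that first data column.

I tried gsubfn without any success.

For example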

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE") 
removefields <-c("COURT","BODY CORPORATE") 

Why does

x <- gsubfn(removefields,"",dirtydata) 

not work?

Expected output

c("JOHN ","@PETER","BOB 22","RUPERT ") 
+0

Please include the names of any additional packages you loaded. But you can try 'gsub(paste(removefields, collapse = "|"), "", dirtydata)' – Jimbou

+0

Possible duplicate of [How to replace multiple strings with the same in R](https://stackoverflow.com/questions/28285480/how-to-replace-multiple-strings-with-the-same-in-r) or [this one](https://stackoverflow.com/questions/24645390/r-remove-multiple-text-strings-in-data-frame) – Jimbou

Answers

0

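Please use the edited code below, which relies only on base R functions. The phrases are joined into a single regular expression with "|" (alternation), so gsub removes any of them: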

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE") 
removefields <-c("COURT","BODY CORPORATE") 
pastedFields = paste0(removefields,collapse = "|") 
gsub(pastedFields,"",dirtydata) 
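
If the phrases to remove might contain regex metacharacters (for example "." or "("), a literal-matching variant may be safer. The following is only a sketch under that assumption; the helper name remove_phrases is hypothetical and not part of the original answer. It loops over the phrases and removes each one with fixed = TRUE, so nothing is interpreted as a regular expression.

```r
# Hypothetical helper: remove each phrase literally (no regex interpretation).
remove_phrases <- function(x, phrases) {
  for (p in phrases) {
    # fixed = TRUE treats p as a plain string, not a pattern
    x <- gsub(p, "", x, fixed = TRUE)
  }
  x
}

dirtydata <- c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE")
removefields <- c("COURT", "BODY CORPORATE")

cleaned <- remove_phrases(dirtydata, removefields)
print(cleaned)
```

If one phrase is contained in another, listing the longer phrase first avoids leaving fragments behind.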
+0

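Can you elaborate? I assume you are getting the output as a list rather than a vector? If so, please post the line of code you applied to the data column –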

0

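Try this. Put the alternatives in one pattern separated by "|", with no spaces around it, because spaces inside a regular expression are matched literally.

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE") 
removefields <- c("COURT|BODY CORPORATE") 
x <- gsub(removefields, "", dirtydata) 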
0

This generalizes to whatever you put into removefields, and it also strips the whitespace around each removed phrase:

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE") 
removefields <- c("COURT","BODY CORPORATE") 
removefields <- paste0("\\s*", removefields, "\\s*", collapse = "|")  # \\s* (not \\s+) so phrases at the start or end of a string still match
x <- gsub(removefields, "", dirtydata) 
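
An alternative sketch, assuming the goal is simply clean output without stray spaces: remove the phrases first, then tidy the whitespace in separate steps with trimws and a second gsub. The variable names pattern and x_clean are illustrative only.

```r
dirtydata <- c("JOHN COURT", "@PETER", "BOB 22", "RUPERT BODY CORPORATE")
removefields <- c("COURT", "BODY CORPORATE")

# Build one alternation pattern from the lookup vector
pattern <- paste(removefields, collapse = "|")

# Step 1: remove the phrases
x_clean <- gsub(pattern, "", dirtydata)
# Step 2: drop leading/trailing whitespace left behind
x_clean <- trimws(x_clean)
# Step 3: collapse any interior runs of whitespace to a single space
x_clean <- gsub("\\s+", " ", x_clean)

print(x_clean)
```

Keeping the whitespace cleanup separate from the phrase pattern means the pattern itself stays readable when the list of phrases grows.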
0

We can use the tm package:

dirtydata <- c("JOHN COURT","@PETER","BOB 22","RUPERT BODY CORPORATE") 
removefields <-c("COURT","BODY CORPORATE") 

library(tm) 
removeWords(dirtydata, removefields) 

> removeWords(dirtydata, removefields) 
[1] "JOHN " "@PETER" "BOB 22" "RUPERT " 
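
For readers who would rather not load a package, whole-word matching can be approximated in base R by wrapping the alternation in \\b word boundaries. This is a sketch of the general idea, not a claim about how tm implements removeWords; the "COURTNEY SMITH" entry is made-up data added to show the difference.

```r
dirtydata <- c("JOHN COURT", "COURTNEY SMITH")  # second entry is hypothetical
removefields <- c("COURT", "BODY CORPORATE")

# Plain alternation also strips "COURT" from inside "COURTNEY"
plain <- gsub(paste(removefields, collapse = "|"), "", dirtydata)

# Word boundaries restrict matches to whole words
bounded_pattern <- paste0("\\b(", paste(removefields, collapse = "|"), ")\\b")
bounded <- gsub(bounded_pattern, "", dirtydata, perl = TRUE)

print(plain)
print(bounded)
```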