2014-04-03 59 views
2

Splitting a character vector into individual words in R

I have a character vector (vec) that looks like this:

[1] "super good dental associates" "cheap dentist in bel air md"  
    "dentures "     "dentures "      
    "in office teeth whitening"  "in office teeth whitening"  
    "dental gum surgery bel air, md" 
[8] "dental implants"    "dental implants"     
    "veneer teeth pictures" 

I need to break this up into individual words. I tried this:

singleWords <- strsplit(vec, ' ')[[1]] 

However, this only splits the first element of the vector:

[1] "super"  "good"  "dental"  "associates" 

How can I get all of the words as individual elements of a single vector?

+1

Try `sapply(vec, strsplit, " ")`; you can wrap it in `unlist` if you want them in one vector –

+0

That doesn't split it into individual words :( – Cybernetic

+0

I added an example using `sapply` –

Answers

2

You could try:

strsplit(paste(vec, collapse = " "), ' ')[[1]] 
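For context, `strsplit` returns a list with one element per input string, which is why `[[1]]` in the question returned only the words of the first element. A minimal sketch on a shortened, hypothetical copy of the vector (`vec_small` and the other variable names are illustrative, not from the original post):

```r
# Shortened, hypothetical copy of the vector from the question
vec_small <- c("super good dental associates", "dentures ", "dental implants")

# strsplit() returns a list with one character vector per input string
pieces <- strsplit(vec_small, " ")
length(pieces)   # one list element per input string

# [[1]] keeps only the words of the first string
first_only <- pieces[[1]]

# Collapsing to one string first means [[1]] now holds every word
all_words <- strsplit(paste(vec_small, collapse = " "), " ")[[1]]

# Caveat: "dentures " ends in a space, so the collapsed string contains
# two spaces in a row and the split produces an empty string there
empty_count <- sum(all_words == "")
```

If those empty strings are unwanted, dropping them afterwards with `all_words[all_words != ""]` is one option.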
+0

Perfect!!! Thank you :) – Cybernetic

+0

If this post solved your problem, @Cybernetic, please consider accepting it by clicking the check mark on the left, below the vote total. – gung

+0

@gung Thank you! –

1

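Just to follow up on my comment, since you mentioned it wasn't working, take a look at the following. Because a few elements have extra spaces, I'd suggest splitting on the regular expression \\s+ rather than on the single space from my comment. Cheers.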

> (newVec <- unlist(sapply(vec, strsplit, "\\s+", USE.NAMES = FALSE))) 
# [1] "super"  "good"  "dental"  "associates" "cheap"  "dentist" 
# [7] "in"   "bel"  "air"  "md"   "dentures" "dentures" 
#[13] "in"   "office"  "teeth"  "whitening" "in"   "office"  
#[19] "teeth"  "whitening" "dental"  "gum"  "surgery" "bel"  
#[25] "air,"  "md"   "dental"  "implants" "dental"  "implants" 
#[31] "veneer"  "teeth"  "pictures" 
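Because `strsplit` is itself vectorised over its first argument, a similar result can probably be had without `sapply`. The sketch below uses a shortened, hypothetical vector (`vec_small`) and is an alternative, not the code from the answer above:

```r
# Hypothetical sample with a doubled space and a trailing space
vec_small <- c("cheap dentist  in bel air md", "dentures ", "dental implants")

# "\\s+" matches one or more whitespace characters, so repeated or
# trailing spaces do not produce empty strings
words <- unlist(strsplit(vec_small, "\\s+"))
print(words)
```

A string that starts with whitespace would still yield a leading empty string, so it may be worth applying `trimws()` to the input first if that can occur.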

And since I see a stray comma in there, it might be a good idea to clean out all punctuation (if any remains) with gsub:

> gsub("[[:punct:]]", "", newVec) 
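Note that `gsub` returns a new vector rather than modifying its input, so the result has to be assigned to be kept. A small sketch with hypothetical sample words (`sample_words` and `clean_words` are illustrative names):

```r
# Hypothetical sample containing the stray comma seen in the output above
sample_words <- c("bel", "air,", "md")

# Remove every punctuation character and store the cleaned vector
clean_words <- gsub("[[:punct:]]", "", sample_words)
print(clean_words)
```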