Remove blanks from strsplit in R
2014-06-12
> dc1 
    V1    V2 
1 20140211-0100  |Box 
2 20140211-1782  |Office|Ball 
3 20140211-1783  |Office 
4 20140211-1784  |Office 
5 20140221-0756  |Box 
6 20140203-0418  |Box 
> strsplit(as.character(dc1[,2]),"^\\|") 
[[1]] 
[1] "" "Box" 


[[2]] 
[1] ""    "Office" "Ball" 


[[3]] 
[1] ""    "Office" 


[[4]] 
[1] ""    "Office" 


[[5]] 
[1] "" "Box" 


[[6]] 
[1] "" "Box" 

How can I remove the empty strings ("") from the strsplit results? The result should look like this:

[[1]] 
[1] "Box" 
[[2]] 
[1] "Office" "Ball" 

If you use "^\\|", your output should not be the one you show; element [[2]] would be [1] "" "Office|Ball" – Math
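To make the comment above concrete, here is a small sketch comparing the anchored and unanchored patterns on a single value. The variable `x` is a stand-in for one entry of `dc1[,2]` and is assumed purely for illustration.

```r
# Stand-in for one value of dc1[,2] (assumed for illustration)
x <- "|Office|Ball"

# Anchored pattern: only a "|" at the very start is a split point
anchored <- strsplit(x, "^\\|")[[1]]

# Unanchored pattern: every "|" is a split point
unanchored <- strsplit(x, "\\|")[[1]]

print(anchored)    # "" "Office|Ball"
print(unanchored)  # "" "Office" "Ball"
```

In both cases the leading "|" produces an empty first element, which is what the answers below work around.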

Answers

You can filter each element of the list using lapply. I changed the pattern in your strsplit call so that it matches your expected output.

dc1 <- read.table(text = 'V1    V2 
1 20140211-0100  |Box 
2 20140211-1782  |Office|Ball 
3 20140211-1783  |Office 
4 20140211-1784  |Office 
5 20140221-0756  |Box 
6 20140203-0418  |Box', header = TRUE) 

out <- strsplit(as.character(dc1[,2]),"\\|") 

> lapply(out, function(x){x[!x ==""]}) 
[[1]] 
[1] "Box" 

[[2]] 
[1] "Office" "Ball" 

[[3]] 
[1] "Office" 

[[4]] 
[1] "Office" 

[[5]] 
[1] "Box" 

[[6]] 
[1] "Box" 

I don't have a general solution, but for your example you can try:

strsplit(sub("^\\|", "", as.character(dc1[,2])),"\\|")

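It removes the leading | (which is what the regular expression "^\\|" matches) before the split is performed; that leading | is what causes the "".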
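A minimal sketch of the two steps, shown separately so the intermediate result is visible. The vector `v2` is a hypothetical stand-in for `as.character(dc1[,2])`; the last line illustrates why this is not a general solution, since an empty field in the middle of a string still yields "".

```r
# Stand-in for as.character(dc1[,2]) (assumed for illustration)
v2 <- c("|Box", "|Office|Ball", "|Office")

# Step 1: strip only the leading "|"
stripped <- sub("^\\|", "", v2)
print(stripped)  # "Box" "Office|Ball" "Office"

# Step 2: split on the remaining "|" separators
parts <- strsplit(stripped, "\\|")
print(parts)

# Limitation: an empty field in the middle is still returned as ""
middle <- strsplit(sub("^\\|", "", "|a||b"), "\\|")[[1]]
print(middle)    # "a" "" "b"
```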


In this case, you can do it by passing "[" to sapply, dropping the first element of each vector:

> sapply(strsplit(as.character(dc1[,2]), "\\|"), "[", -1) 
# [[1]] 
# [1] "Box" 

# [[2]] 
# [1] "Office" "Ball" 

# [[3]] 
# [1] "Office" 

# [[4]] 
# [1] "Office" 

# [[5]] 
# [1] "Box" 

# [[6]] 
# [1] "Box" 
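One caveat worth sketching: sapply only returns a list here because the pieces have different lengths. If every row happened to have the same number of fields, sapply would simplify the result, whereas lapply always returns a list. The vector `same_len` below is a hypothetical input chosen to show this.

```r
# Hypothetical input where every row has exactly one field
same_len <- c("|Box", "|Office")
parts <- strsplit(same_len, "\\|")

# sapply simplifies equal-length results to a character vector
simplified <- sapply(parts, "[", -1)
print(simplified)      # "Box" "Office"

# lapply keeps the list structure regardless of lengths
as_list <- lapply(parts, "[", -1)
print(is.list(as_list))  # TRUE
```

If downstream code expects one list element per row, lapply may be the safer choice.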

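Alternatively, you can extract the runs of letters directly with str_extract_all() from stringr, so that no empty first element is produced at all: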

library(stringr) 
str_extract_all(dc1[,2], "[[:alpha:]]+") 
[[1]] 
[1] "Box" 

[[2]] 
[1] "Office" "Ball" 

[[3]] 
[1] "Office" 

[[4]] 
[1] "Office" 

[[5]] 
[1] "Box" 

[[6]] 
[1] "Box" 

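Another approach is to use nzchar() after unlisting the result of strsplit(). Note that unlisting flattens everything into a single character vector, so the per-row grouping is lost: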

out <- unlist(strsplit(as.character(dc1[,2]),"\\|")) 

out[nzchar(x=out)] # removes the extraneous "" marks 
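If the per-row grouping matters, the same nzchar() test can be applied inside lapply instead of after unlist. This is a small sketch on a hypothetical two-row stand-in (`v2`) for `as.character(dc1[,2])`.

```r
# Stand-in for as.character(dc1[,2]) (assumed for illustration)
v2 <- c("|Box", "|Office|Ball")
parts <- strsplit(v2, "\\|")

# Flattened: one character vector, row boundaries are gone
flat <- unlist(parts)
flat <- flat[nzchar(flat)]
print(flat)     # "Box" "Office" "Ball"

# Grouped: one list element per row, empty strings dropped
grouped <- lapply(parts, function(p) p[nzchar(p)])
print(grouped)
```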
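
To remove the first element of each vector after splitting with str_split() from stringr, you can use:
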
library("stringr") 

lapply(str_split(dc1$V2, "\\|"), function(x) x[-1]) 

[[1]] 
[1] "Box" 

[[2]] 
[1] "Office" "Ball" 

[[3]] 
[1] "Office" 

[[4]] 
[1] "Office" 

[[5]] 
[1] "Box" 

[[6]] 
[1] "Box"