2017-08-30 56 views

Hello, I have a character vector in R whose strings contain names marked with @, and I want to extract the words that follow each @. Example:

tweets =c(" @john @tom it is wonderful ", "@neel it is awesome ", "it is awesome") 

I want a matrix/data.frame containing only those words, without the rest of the text, like this as output:

X1 <- c("john", "tom")
X2 <- c("neel", NA)
X3 <- c(NA, NA)
df <- as.data.frame(rbind(X1, X2, X3))

How can I do this?

Answers


A base R option would be to extract the words with gregexpr/regmatches, then pad the list elements with NAs using `length<-` and convert the result to a matrix:

lst <- regmatches(tweets, gregexpr("(?<=@)\\w+", tweets, perl = TRUE))
do.call(rbind, lapply(lst, `length<-`, max(lengths(lst)))) 
#      [,1]   [,2]
# [1,] "john" "tom"
# [2,] "neel" NA
# [3,] NA     NA
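
To see what each step contributes, the same approach can be unpacked into separate statements. This is only a minimal sketch: the names `mentions`, `width`, `padded`, `result`, and `result_df` are introduced here for illustration, and the final `as.data.frame` call is an assumption about how one might get the data.frame form the question asks for.

```r
tweets <- c(" @john @tom it is wonderful ", "@neel it is awesome ", "it is awesome")

# Step 1: find every run of word characters that is preceded by "@".
# The lookbehind (?<=@) requires perl = TRUE and keeps "@" out of the match.
mentions <- regmatches(tweets, gregexpr("(?<=@)\\w+", tweets, perl = TRUE))
# mentions is a list with one character vector per tweet;
# a tweet without any match yields character(0).

# Step 2: pad each vector to the longest length; `length<-` fills with NA.
width <- max(lengths(mentions))
padded <- lapply(mentions, `length<-`, width)

# Step 3: stack the padded vectors into a character matrix, one row per tweet.
result <- do.call(rbind, padded)

# Optional (assumption): a data.frame instead of a matrix.
# Columns get the default names V1, V2 because the matrix has no column names.
result_df <- as.data.frame(result, stringsAsFactors = FALSE)
```

Padding before `rbind` matters because `rbind` would otherwise recycle the shorter vectors instead of filling them with NA.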