2017-08-22 38 views
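R: get data frame rows containing specific characters

I need to detect the rows of a df/tibble that contain a specific sequence of characters.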

seq <- "RT @AventusSystems" is my sequence.

df <- structure(list(text = c("@AventusSystems Wow, what a upgrade from help of investor", 
"RT @AventusSystems: A recent article about our investors as shown in Forbes! t.co/n8oGwiEDpu #Aventus #GlobalAdvisors #4thefans #Ti…", 
"@AventusSystems Very nice to have this project", "RT @AventusSystems: Join the #TicketRevolution with #Aventus today! #Aventus #TicketRevolution #AventCoin #4thefans t.co/OPlyCFmW4a" 
), Tweet_Id = c("898359464444559360", "898359342952439809", "898359326552633345", 
"898359268226736128"), created_at = structure(c(17396, 17396, 
17396, 17396), class = "Date")), .Names = c("text", "Tweet_Id", 
"created_at"), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame")) 

select(df, contains(seq)) 
# A tibble: 4 x 0 

sapply(df$text, grepl, seq) returns only 4 FALSE.

What am I doing wrong? What is the correct solution? Thanks for your help.

Does `grep(seq, df$text)` work for you? – csgroen

Or, if you want the data frame rows that contain those characters, use `filter(df, grepl(seq, text))` –

@csgroen Yes, it does. TY – gabx

Answers

First, grepl is already vectorized over its argument x, so you don't need sapply. You can simply do grepl(seq, df$text).
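A minimal sketch of that vectorized call, using a shortened stand-in for the question's data (the tweet texts are abbreviated here for illustration). The `fixed = TRUE` argument is an addition not in the original answer: it treats the sequence as a literal string rather than a regular expression, which gives the same result for this particular pattern.

```r
# Toy data mirroring the question's structure (texts shortened for illustration).
df <- data.frame(
  text = c("@AventusSystems Wow, what a upgrade",
           "RT @AventusSystems: A recent article",
           "@AventusSystems Very nice",
           "RT @AventusSystems: Join today!"),
  stringsAsFactors = FALSE
)
seq <- "RT @AventusSystems"

# grepl(pattern, x): a single pattern is tested against every element of x.
# fixed = TRUE (an assumption, not in the original) matches the string literally.
hits <- grepl(seq, df$text, fixed = TRUE)
print(hits)

# Base R row subsetting with the resulting logical vector.
matched <- df[hits, , drop = FALSE]
print(matched)
```

Here `hits` is a logical vector with one entry per row, so it can be used directly as a row index.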

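The reason your code doesn't work is that sapply passes each element of its X argument to FUN as the first argument, so you are searching for the patterns "@AventusSystems Wow, what a upgrade from help of investor", etc. inside your seq object.

Finally, dplyr::select selects columns, whereas you want dplyr::filter, which filters rows.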
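The argument-order problem can be reproduced on a small vector. The sketch below uses two shortened tweet texts (illustrative, not the full originals) and shows that naming the `pattern` argument is one way to make the sapply form behave, although the plain vectorized call remains simpler. The dplyr part only runs if the package is installed.

```r
texts <- c("@AventusSystems Very nice",
           "RT @AventusSystems: Join today!")
seq <- "RT @AventusSystems"

# sapply(texts, grepl, seq) calls grepl(texts[i], seq):
# each tweet is used as the pattern and seq is the string being searched.
wrong <- sapply(texts, grepl, seq, USE.NAMES = FALSE)
print(wrong)

# Naming the pattern argument pushes each tweet into the x slot instead.
right <- sapply(texts, grepl, pattern = seq, USE.NAMES = FALSE)
print(right)

# select() picks columns by name; filter() keeps rows where the condition is TRUE.
df <- data.frame(text = texts, stringsAsFactors = FALSE)
if (requireNamespace("dplyr", quietly = TRUE)) {
  kept <- dplyr::filter(df, grepl(seq, text))
  print(kept)
}
```

This also explains the empty result from select(df, contains(seq)): contains() matches against column names, and no column name contains the sequence.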