2016-04-30

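Identifying mentions in tweets and populating a data frame

I am trying to extract Twitter mentions such as @someone and @somebody from tweet data, and create a new data frame that pairs each tweet's author with the people mentioned in it.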

Example:

tweets <- data.frame(user = c("people", "person", "ghost"),
                     text = c("Hey, check this out @somebody @someone", "love this @john", "amazing"),
                     stringsAsFactors = FALSE)

This gives the following data frame:

    user                                   text
1 people Hey, check this out @somebody @someone
2 person                        love this @john
3  ghost                                amazing

The desired result is:

id     mention
people @somebody
people @someone
person @john
ghost

Can anyone help me, please?

Answer


You can do something like this with the stringr library:

library(stringr)
# Extract every '@' followed by non-whitespace characters; the result is a list column
tweets$mention <- str_extract_all(tweets$text, '\\@\\S+')

The output is:

tweets

    user                                   text             mention
1 people Hey, check this out @somebody @someone @somebody, @someone
2 person                        love this @john               @john
3  ghost                                amazing
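If you would rather not depend on stringr, base R can build the same kind of list column with `regmatches()` and `gregexpr()`. This is a substitute technique, not the answer's method, and only a minimal sketch. It also assumes (the question does not say so) that a handle consists of letters, digits and underscores only, so it uses `@\\w+` instead of `@\\S+` to avoid capturing trailing punctuation such as the comma in `@john,`.

```r
# Rebuild the example data so this block runs on its own
tweets <- data.frame(
  user = c("people", "person", "ghost"),
  text = c("Hey, check this out @somebody @someone", "love this @john", "amazing"),
  stringsAsFactors = FALSE
)

# Assumption: a handle is '@' followed by word characters ([[:alnum:]_]).
# gregexpr() finds all matches per tweet; regmatches() returns a list with one
# character vector per tweet (character(0) when a tweet has no mentions).
tweets$mention <- regmatches(tweets$text, gregexpr("@\\w+", tweets$text))

str(tweets$mention)
```

Because `tweets$mention` is a list column, the long-format step that follows works the same way on it.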

To get the output in long format, you can do something like this:

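library(dplyr)
library(tidyr)
# unnest() drops rows whose list element is empty, so handle users without mentions separately
no_mention   <- tweets %>% filter(lengths(mention) == 0) %>% mutate(mention = "")
with_mention <- tweets %>% filter(lengths(mention) > 0) %>% unnest(mention)
# Stack both parts and drop the text column
tweets <- bind_rows(no_mention, with_mention) %>% select(user, mention)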

The output is:

    user   mention
1  ghost
2 people @somebody
3 people  @someone
4 person     @john
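The long-format step can also be sketched in base R without dplyr or tidyr, by repeating each user once per mention with `rep()` and `lengths()`. This is a minimal sketch, not the answer's approach. It rebuilds a small `tweets` data frame with a `mention` list column (the same shape `str_extract_all()` produces) so that it runs on its own. Like the output above, it keeps users without mentions as a row with an empty string, but rows stay in the original tweet order.

```r
# Rebuild a data frame with a 'mention' list column, as produced by str_extract_all()
tweets <- data.frame(user = c("people", "person", "ghost"), stringsAsFactors = FALSE)
tweets$mention <- list(c("@somebody", "@someone"), "@john", character(0))

# Replace empty mention vectors with "" so users without mentions keep one row
mentions <- lapply(tweets$mention, function(m) if (length(m) == 0) "" else m)

long <- data.frame(
  user = rep(tweets$user, lengths(mentions)),  # repeat each user once per mention
  mention = unlist(mentions),                  # flatten the mentions in the same order
  stringsAsFactors = FALSE
)
print(long)
```

Using `""` for users without mentions mirrors the desired result in the question; `NA_character_` would be the alternative if missing values are preferred.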