Convert data frame to tibble with word count

I am trying to perform sentiment analysis based on http://tidytextmining.com/sentiment.html#the-sentiments-dataset. Before running the sentiment analysis, I need to convert my dataset into a tidy format.

My dataset has the form:

x <- c("test1" , "test2") 
y <- c("this is test text1" , "this is test text2") 
res <- data.frame("url" = x, "text" = y) 
res 
    url    text 
1 test1 this is test text1 
2 test2 this is test text2

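To get one observation per row, the text column needs to be processed, adding new columns that hold each word and the number of times it appears for that URL. The same URL will appear in multiple rows.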

Here is my attempt:

library(tidyverse) 

x <- c("test1" , "test2") 
y <- c("this is test text1" , "this is test text2") 
res <- data.frame("url" = x, "text" = y) 
res 

res_1 <- data.frame(res$text) 
res_2 <- as_tibble(res_1) 
res_2 %>% count(res.text, sort = TRUE)

# A tibble: 2 x 2 
      res.text  n 
       <fctr> <int> 
1 this is test text1  1 
2 this is test text2  1

How can I count the words in res$text while keeping the url, so that I can perform the sentiment analysis?

Update:

x <- c("test1" , "test2") 
y <- c("this is test text1" , "this is test text2") 
res <- data.frame("url" = x, "text" = y) 
res 

res %>% 
group_by(url) %>% 
transform(text = strsplit(text, " ", fixed = TRUE)) %>% 
unnest() %>% 
count(url, text)

This returns the error:

Error in strsplit(text, " ", fixed = TRUE) : non-character argument

I am trying to convert to a tibble because that appears to be the format required by the tidytextmining sentiment analysis: http://tidytextmining.com/sentiment.html#the-sentiments-dataset

Source

2017-12-02 blue-sky

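Why do you need to convert it to a tibble? In other words, your title doesn't seem to represent the real question. It looks like you just want a word count per URL. I think a possible tibbliverse approach could be `res %>% group_by(url) %>% transform(text = strsplit(text, " ", fixed = TRUE)) %>% unnest() %>% count(url, text)` (assuming `text` is a character string rather than a factor)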

@DavidArenburg please see the update
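The "non-character argument" error is consistent with the caveat in the comment above: in R versions before 4.0, data.frame() converts strings to factors by default, and strsplit() only accepts character vectors. A minimal base-R sketch of the word count per URL, assuming the column really is a factor (forced here with stringsAsFactors = TRUE so the behaviour is the same on any R version); the names words, long and counts are illustrative only:

```r
# Build the data with factor columns, mimicking the pre-R-4.0 default.
res <- data.frame(url  = c("test1", "test2"),
                  text = c("this is test text1", "this is test text2"),
                  stringsAsFactors = TRUE)

is.factor(res$text)  # the column is a factor, which strsplit() rejects

# Convert to character before splitting on spaces.
res$url  <- as.character(res$url)
res$text <- as.character(res$text)
words <- strsplit(res$text, " ", fixed = TRUE)

# One row per (url, word) pair; each url is repeated once per word.
long <- data.frame(url  = rep(res$url, lengths(words)),
                   word = unlist(words),
                   stringsAsFactors = FALSE)

# Count occurrences of each word within each url, dropping empty cells.
counts <- as.data.frame(table(url = long$url, word = long$word),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]
counts
```

Passing stringsAsFactors = FALSE when the data frame is created avoids the conversion step altogether.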

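Are you looking for something like this? When you want to do sentiment analysis with the tidytext package, you need to split each character string into words with unnest_tokens(). This function can do more than split text into words, so have a look at it later if you are interested. Once you have one word per row, you can use count() to count how many times each word appears in each text. Then you want to remove stop words. The tidytext package ships that data, so you can simply load it. Finally, you need the sentiment information. Here I chose AFINN, but you can choose another lexicon if you like. I hope this helps.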

x <- c("text1" , "text2") 
y <- c("I am very happy and feeling great." , "I am very sad and feeling low") 
res <- data.frame("url" = x, "text" = y, stringsAsFactors = FALSE) 

# url        text 
#1 text1 I am very happy and feeling great. 
#2 text2  I am very sad and feeling low 

library(tidytext) 
library(dplyr) 

data(stop_words) 
afinn <- get_sentiments("afinn") 

unnest_tokens(res, input = text, output = word) %>% 
count(url, word) %>% 
filter(!word %in% stop_words$word) %>% 
inner_join(afinn, by = "word") 

# url word  n score 
# <chr> <chr> <int> <int> 
#1 text1 feeling  1  1 
#2 text1 happy  1  3 
#3 text2 feeling  1  1 
#4 text2  sad  1 -2
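As a possible next step, the word-level scores could be collapsed into one number per URL. The sketch below is an assumption about what might be wanted rather than part of the pipeline above: it hardcodes the output shown above as a data frame called scored (a hypothetical name) so that it runs without downloading the lexicon, and it uses the score column name from that output, which may differ in other package versions.

```r
# The joined result from above, typed in by hand for a self-contained example.
scored <- data.frame(url   = c("text1", "text1", "text2", "text2"),
                     word  = c("feeling", "happy", "feeling", "sad"),
                     n     = c(1L, 1L, 1L, 1L),
                     score = c(1L, 3L, 1L, -2L),
                     stringsAsFactors = FALSE)

# Weight each word's score by how often it occurs in that url.
scored$weighted <- scored$n * scored$score

# Sum the weighted scores within each url.
totals <- aggregate(weighted ~ url, data = scored, FUN = sum)
totals
```

With these four rows, text1 sums to 4 and text2 to -1, which matches the positive and negative tone of the two example sentences.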

Source

2017-12-03 01:49:03 jazzurro
