How to make multiple corpora in R

This is car review data with more than 40,000 rows, and each review has more than 500 characters. Here is the sample data: https://drive.google.com/open?id=1ZRwzYH5McZIP2NLKxncmFaQ0mX1Pe0GShTMu57Tac_E

| brand | review   | favorite  | c4 | c5 | c6 | c7 | c8 | 
| brand1 | 500 characters1 | 100 characters1 | | | | | | 
| brand2 | 500 characters2 | 100 Characters2 | | | | | | 
| brand2 | 500 characters3 | 100 Characters3 | | | | | | 
| brand2 | 500 characters4 | 100 Characters4 | | | | | | 
| brand3 | 500 characters5 | 100 Characters5 | | | | | | 
| brand3 | 500 characters6 | 100 characters6 | | | | | |

I want to merge the review column by brand, like this:

| Brand | review   | favorite  | c4 | c5 | c6 | c7 | c8 | 
| brand1 | 500 characters1 | 100 characters1 | | | | | | 
| brand2 | 500 characters2 | 100 Characters2 | | | | | | 
|  | 500 characters3 | 100 Characters3 | | | | | | 
|  | 500 characters4 | 100 Characters4 | | | | | | 
| brand3 | 500 characters5 | 100 Characters5 | | | | | | 
|  | 500 characters6 | 100 characters6 | | | | | |

So I tried using aggregate().

temp <- aggregate(data$review ~ data$brand , data, as.list)

However, it takes a very long time.

Is there a simple way to merge them? Thanks in advance!

Source

2015-10-14 liveinfootball

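Can you add a small example of your desired result? I can't picture it (what happens to the other columns?). Also, you might consider changing the title/tags to something more general. Your question seems to be about data manipulation rather than anything specific to text mining or corpora. – aosmith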

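Try splitting the columns on each level of the factor and then pasting the pieces together. aggregate() is a very slow function and should be avoided for all but the smallest data sets. If you want to bring along other variables that vary at more than just the brand level, they would need to be changed in the same way. (Note that I downloaded the Google file as sampleDF.csv here.)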

sampleDF <- read.csv("~/Downloads/sampleDF.csv", stringsAsFactors = FALSE) 

# aggregate text by brand 
brand.split <- split(sampleDF$text, as.factor(sampleDF$Brand)) 
brand.grouped <- sapply(brand.split, paste, collapse = " ") 

# aggregate favorite by brand 
favorite.split <- split(sampleDF$favorite, as.factor(sampleDF$Brand)) 
favorite.grouped <- sapply(favorite.split, paste, collapse = " ") 

# name the columns with = (not <-) and take text from brand.grouped
newDf <- data.frame(brand = names(brand.split),
        text = brand.grouped,
        favorite = favorite.grouped,
        stringsAsFactors = FALSE)

This should do the trick.
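For readers without the downloaded file, the same split-then-paste idea can be tried on a small made-up data frame. This is only a sketch: the column names follow the table in the question (brand, review, favorite), the review strings are invented, and `collapse_by_brand` is a hypothetical helper name, not something from the original answer.

```r
# Toy data standing in for the real reviews (values are invented)
reviews <- data.frame(
  brand    = c("brand1", "brand2", "brand2", "brand3", "brand3"),
  review   = c("great car", "too noisy", "smooth ride", "poor mileage", "nice seats"),
  favorite = c("engine", "price", "handling", "design", "comfort"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split a character column by brand,
# then paste each group into a single string
collapse_by_brand <- function(x, brand) {
  vapply(split(x, brand), paste, character(1), collapse = " ")
}

review_by_brand   <- collapse_by_brand(reviews$review, reviews$brand)
favorite_by_brand <- collapse_by_brand(reviews$favorite, reviews$brand)

# One row per brand; both vectors share the same (sorted) brand order
merged <- data.frame(
  brand    = names(review_by_brand),
  review   = unname(review_by_brand),
  favorite = unname(favorite_by_brand),
  stringsAsFactors = FALSE
)
print(merged)
```

Using vapply() instead of sapply() is a design choice here: it fails loudly if any group does not collapse to exactly one string.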

Source

2015-10-14 22:48:04

Thanks a lot, Ken! – liveinfootball
