tm and Snowball package commands are slow on Linux

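I am using the tm and Snowball packages in R for text mining. I originally ran my code on a Windows 7 laptop with 8 GB of RAM, and later tried the same operations on a Linux (Ubuntu) machine with 64 GB of RAM. Both machines are 64-bit, and I use the 64-bit version of R on both. However, the Windows machine has R 3.0.0, while the Linux machine has R 2.14.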

Some commands are much slower on Linux than on Windows.

Corpus creation

On Windows

d <- data.frame(chatTranscripts$chatConcat)
ds <- DataframeSource(d)
t1 <- Sys.time()
dsc <- Corpus(ds)
print(Sys.time() - t1)
# Time difference of 46.86169 secs

This took only 47 seconds on the Windows machine.

On Linux

t1 <- Sys.time()
dsc <- Corpus(ds)
print(Sys.time() - t1)
# Time difference of 3.674376 mins

This took around 220 seconds on the Linux machine.

Snowball stemming

On Windows

t1 <- Sys.time()
dsc <- tm_map(dsc, stemDocument)
print(Sys.time() - t1)
# Time difference of 12.05321 secs

This took only 12 seconds on the Windows machine.

On Linux

t1 <- Sys.time()
dsc <- tm_map(dsc, stemDocument)
print(Sys.time() - t1)
# Time difference of 4.832964 mins

This took around 290 seconds on the Linux machine.

Is there a way to speed up these commands on the Linux machine? Could the R version really make such a big difference? Thanks.

Ravi


2014-02-12 Ravi

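It is possible that the R version makes a difference. Thanks to Tim Hesterberg's work, R v2.15.1 brought major performance improvements in how data frames are handled. See http://blog.revolutionanalytics.com/2012/06/r-2151-dataframe-package.html – Andrie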
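Before comparing timings across machines, it can help to confirm which versions each one is actually running. The following is a minimal sketch to run on both machines; the 2.15.1 threshold comes from the comment above, and the tm version is only reported if the package is installed.

```r
# Print the running R version and flag releases older than 2.15.1,
# the release credited above with faster data-frame handling.
r_ver <- getRversion()
needs_upgrade <- r_ver < "2.15.1"
cat("R version:", as.character(r_ver), "\n")
if (needs_upgrade) {
  message("R is older than 2.15.1; data-frame operations may be slower.")
}

# The tm package version can also differ between machines; report it if present.
if (requireNamespace("tm", quietly = TRUE)) {
  cat("tm version:", as.character(utils::packageVersion("tm")), "\n")
}
```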

Calling Corpus() on a VectorSource() seems to be faster than calling Corpus() on a DataframeSource().

You could try:

d <- chatTranscripts$chatConcat 
ds <- VectorSource(d) 
Corpus(ds)
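One rough way to check the VectorSource vs. DataframeSource claim on your own machine is to time both with `system.time()`. This sketch assumes a recent tm (0.7 or later), where `DataframeSource()` requires a data frame whose first two columns are named `doc_id` and `text`; the `texts` vector is a hypothetical stand-in for `chatTranscripts$chatConcat`, and timings under current tm may not reproduce the 2014 behaviour.

```r
library(tm)

# Hypothetical stand-in for chatTranscripts$chatConcat
texts <- rep(c("the agents were chatting", "customer asked about refunds"), 5000)

# VectorSource: each element of the character vector becomes one document
t_vec <- system.time(corp_vec <- Corpus(VectorSource(texts)))

# DataframeSource: tm >= 0.7 expects a doc_id column followed by a text column
df <- data.frame(doc_id = as.character(seq_along(texts)), text = texts,
                 stringsAsFactors = FALSE)
t_df <- system.time(corp_df <- Corpus(DataframeSource(df)))

# Compare wall-clock (elapsed) seconds for the two approaches
print(c(VectorSource = t_vec[["elapsed"]], DataframeSource = t_df[["elapsed"]]))
```

Stemming can be timed the same way, e.g. `system.time(corp_vec <- tm_map(corp_vec, stemDocument))`, which requires the SnowballC package.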


2014-08-06 17:54:05 Abir
