R: reading arrays of different sizes from a file

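I need to apply the Mann-Kendall trend test in R to a large number (about 1 million) of time series of different lengths. I have written a script that reads a time series (really just a list of numbers) from each file in a directory and writes the results to a .txt file.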

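The problem is that I have about 1 million time series, so creating 1 million files is not ideal. I think it would be more manageable to put all the time series in a single .txt file, separated by a symbol such as '#'. So I have a file like this: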

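I would like to know whether it is possible in R to extract each series (the values between two '#' symbols) and then apply the analysis to it.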

Edit

Following @acesnap's hint, I am now using this code:

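library(Kendall)
a <- read.table("to_r.txt")   # column V1 = series number, V2 = value
numData <- 1017135            # number of time series in the file

for (i in 1:numData) {
  s1 <- subset(a, V1 == i)    # rows that belong to series i
  m <- MannKendall(s1$V2)
  # append the five components of the test result as one line
  cat(m[[1]], " ", m[[2]], " ", m[[3]], " ", m[[4]], " ", m[[5]], "\n",
      file = "monotonic_trend_checking.txt", append = TRUE)
}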

This approach works, but it takes ages to compute. Can you suggest a faster method?

Source

2011-12-07 markusian

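If you have a new question, it is best practice to post it as a new question, especially since there is already an accepted answer. –

@PaulHiemstra I will follow your suggestion – markusian

Whether this can be sped up depends on what the bottleneck is. If it is the loop, you could take a look at data.table from the data.table package. If it is the MannKendall test itself, speeding it up may be harder. –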

It will make things easier if you number the data sets as they go into the larger file. If you do that, you can use a for loop and subset.

setNum  data 
    1   1 
    1   2 
    1   4 
    1   5 
    1   4 
    2   2 
    2   13 
    2   34 
...   ...

Then do something like this:

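## bigData is the two-column table above, e.g. read with read.table(..., header = TRUE)
## mannKendallTrendTest is a placeholder for whichever test function you use
numOfDataSets <- 1000000
answers1 <- vector("list", numOfDataSets)    ## preallocate instead of growing with c()
for (i in 1:numOfDataSets) {
    ss1 <- subset(bigData, setNum == i)      ## creates subset of each data set
    ans1 <- mannKendallTrendTest(ss1$data)   ## gets answer from test
    answers1[[i]] <- ans1                    ## stores answer in the list
    print(paste(i, " | ", ans1, "", sep = "")) ## prints which data set is in use
    flush.console()                          ## prints to console now instead of waiting
}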
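A loop like this calls subset() once per data set, and each call rescans the whole table, so the cost grows with the number of series times the number of rows. A minimal sketch of one alternative, using only base R, is to group the values once with split() and then apply the test to each group. The names `run_per_series` and `test_fun` are made up for illustration, and `length` is only a stand-in for the real test so that the sketch runs without extra packages.

```r
# Sketch: group the series once with split() instead of calling subset()
# on every iteration. `run_per_series` and `test_fun` are hypothetical names.
run_per_series <- function(big_data, test_fun) {
  # split() passes over the data once and returns one numeric vector per setNum
  series <- split(big_data$data, big_data$setNum)
  # lapply() returns a list with one result per series
  lapply(series, test_fun)
}

# Toy data in the same two-column layout as the table above
big_data <- data.frame(
  setNum = c(1, 1, 1, 1, 1, 2, 2, 2),
  data   = c(1, 2, 4, 5, 4, 2, 13, 34)
)

# length() stands in for the real trend test here
series_lengths <- run_per_series(big_data, length)
```

Assuming the Kendall package is installed, passing `Kendall::MannKendall` as `test_fun` should give one test result per series. Whether this is fast enough for a million series depends on how much of the time is spent inside the test itself.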

Source

2011-12-07 20:59:21 screechOwl

I followed your hint and I am using the code I posted. The problem is that it is too slow. Can you suggest anything else? – markusian

Here is a perhaps more elegant solution:

# Read in your data (a character vector with '#' between series)
x <- c('1','2','3','4','5','#','4','5','5','6','#','3','6','23','#')
# Build a vector of the indices where you want to split:
ind <- c(0, which(x == '#'))
# Use those indices to split the vector into a list of numeric vectors
lapply(seq_len(length(ind) - 1), function(y) as.numeric(x[(ind[y] + 1):(ind[y + 1] - 1)]))

Note that for this code to work, there must be a '#' character at the very end of the file.
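The snippet above starts from a vector that is already in memory. A rough sketch of the file-reading step, and of a split that does not depend on the trailing '#', might look like the following. The helper name `split_series` and the demo file are invented for illustration, and the layout (whitespace-separated tokens with '#' between series) is an assumption about the file. scan() is used because, unlike read.table(), it does not treat '#' as a comment character by default.

```r
# Sketch: read '#'-separated series from a text file and split them.
# `split_series` is a hypothetical helper; the file layout is an assumption.
split_series <- function(tokens) {
  is_sep <- tokens == "#"
  # Each '#' increments the group id, so the values between two
  # separators share the same id
  group <- cumsum(is_sep)
  values <- as.numeric(tokens[!is_sep])
  # One numeric vector per series; empty series are silently dropped
  unname(split(values, group[!is_sep]))
}

# Write a small demo file (no trailing '#'), then read it back as tokens
demo_file <- tempfile(fileext = ".txt")
writeLines(c("1", "2", "3", "4", "5", "#", "4", "5", "5", "6", "#", "3", "6", "23"),
           demo_file)
tokens <- scan(demo_file, what = character(), quiet = TRUE)

series <- split_series(tokens)
```

The resulting list can then be passed to lapply() together with the test function, for example `lapply(series, Kendall::MannKendall)` if the Kendall package is available.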

Source

2011-12-07 23:12:19 nograpes
