2012-11-19 86 views

Aggregating output from multiple input files in R

Right now I have the R code below. It reads data that looks like this:

track_id day hour month year rate gate_id pres_inter vmax_inter 
9 10 0 7 1 9.6451E-06 2 97809 23.545 
9 10 0 7 1 9.6451E-06 17 100170 13.843 
10 3 6 7 1 9.6451E-06 2 96662 31.568 
13 22 12 8 1 9.6451E-06 1 94449 48.466 
13 22 12 8 1 9.6451E-06 17 96749 30.55 
16 13 0 8 1 9.6451E-06 4 98702 19.205 
16 13 0 8 1 9.6451E-06 16 98585 18.143 
19 27 6 9 1 9.6451E-06 9 98838 20.053 
19 27 6 9 1 9.6451E-06 17 99221 17.677 
30 13 12 6 2 9.6451E-06 2 97876 27.687 
30 13 12 6 2 9.6451E-06 16 99842 18.163 
32 20 18 6 2 9.6451E-06 1 99307 17.527 


################################################################## 
# Input/Output variables 
################################################################## 
library(plyr)  # provides ddply() and count() used below

for (N in 59:96) {
    # Zero-pad the track number to four digits, e.g. 59 -> "0059"
    TrackID <- sprintf("%04d", N)
    print(TrackID)

# For 2010_08_24 trackset 
# fname_in <- paste('input/2010_08_24/intersections_track_calibrated_jma_from1951_',TrackID,'.csv', sep="") 
# fname_out <- paste('output/2010_08_24/tracks_crossing_regional_polygon_',TrackID,'.csv', sep="") 
# For 2012_05_01 trackset 
    fname_in <- paste('input/2012_05_01/intersections_track_param_',TrackID,'.csv', sep="") 
    fname_out <- paste('output/2012_05_01/tracks_crossing_regional_polygon_',TrackID,'.csv', sep="") 
    fname_out2 <- paste('output/2012_05_01/GateID_',TrackID,'.csv', sep="") 

####################################################################### 
# Read the gate-crossing track data
    cat('reading the crosstat output file', fname_in, '\n')
    # Skip the file's own header row and assign column names explicitly,
    # in the same order as the columns in the sample data shown earlier
    track <- read.table(fname_in, sep=',', skip=1)
    colnames(track) <- c("ID", "day", "hour", "month", "year", "rate", "gate_id", "pres_inter", "vmax_inter")

# track_id=track[,1] 
# pres_inter=track[,15] 

# For each storm ID, keep the row with the largest vmax_inter
    ByTrack <- ddply(track, "ID", function(x) x[which.max(x$vmax_inter),])
# Count how many crossings fall on each gate_id
    ByGate <- count(track, vars="gate_id")

# Write the output file with a single record per storm
    cat('Writing the full output file', fname_out, '\n')
    write.table(ByTrack, fname_out, col.names=TRUE, row.names=FALSE, sep = ',')

# Write the output file with the crossing frequency per gate_id
    cat('Writing the gate frequency file', fname_out2, '\n')
    write.table(ByGate, fname_out2, col.names=TRUE, row.names=FALSE, sep = ',')
} 

The last section of the code above groups the data by gate_id and writes a file with the frequency of each gate. It looks like this:

gate_id freq 
1 935 
2 2096 
3 1363 
4 963 
5 167 
6 17 
7 43 
8 62 
9 208 
10 267 
11 64 
12 162 
13 178 
14 632 
15 807 
16 2003 
17 838 
18 293 

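The thing is, I get a file like this for each of 96 different input files. Instead of writing 96 separate files, I would like to compute these aggregates for each input file, then sum the frequencies across all 96 inputs and print a SINGLE output file. Can anyone help?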

Thanks, K

Answers

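You will need to do something like the function below. It grabs every .csv file in the working directory, so that directory must contain only the files you want to analyze.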

myFun <- function(out.file = "mydata") {
    files <- list.files(pattern = "\\.(csv|CSV)$")
    # Use this next line if you are going to use the file name as a variable/output etc.
    files.noext <- tools::file_path_sans_ext(basename(files))

    # Placeholder for the one data frame that grows as files are processed
    yourDataFrame <- data.frame()

    for (i in seq_along(files)) {
        # Assumes each file starts with a header row, as in the question
        temp <- read.csv(files[i], header = TRUE)
        # YOUR CODE HERE
        # Use the code you have already written but operate on files[i] or temp
        # Save the important stuff into yourDataFrame, e.g. with rbind()
        # Think carefully ahead of time what structure makes the most sense
    }

    datafile <- paste(out.file, ".csv", sep = "")
    write.csv(yourDataFrame, file = datafile)
}
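
To make the template concrete for the gate_id case, here is a minimal sketch in base R. It assumes every input file has one header row followed by comma-separated values with gate_id in the 7th column, as in the sample data in the question. The function name sum_gate_counts, its arguments, and the two small demo files are hypothetical and only there for illustration.

```r
# Sketch: count gate_id per file, then sum the counts over all files.
# sum_gate_counts() and its arguments are hypothetical names.
sum_gate_counts <- function(files, out.file = NULL) {
  per_file <- lapply(files, function(f) {
    # Assumption: one header row, then comma-separated values
    track <- read.table(f, sep = ",", skip = 1)
    # Assumption: gate_id is the 7th column, as in the question's layout
    as.data.frame(table(gate_id = track[[7]]),
                  responseName = "freq", stringsAsFactors = FALSE)
  })
  # Stack the per-file tables and add up the frequencies per gate
  stacked <- do.call(rbind, per_file)
  stacked$gate_id <- as.integer(stacked$gate_id)
  total <- aggregate(freq ~ gate_id, data = stacked, FUN = sum)
  total <- total[order(total$gate_id), ]
  if (!is.null(out.file)) {
    write.csv(total, file = out.file, row.names = FALSE)
  }
  total
}

# Demo with two small temporary files standing in for the real inputs
demo_dir <- tempfile("gates")
dir.create(demo_dir)
hdr <- "track_id,day,hour,month,year,rate,gate_id,pres_inter,vmax_inter"
writeLines(c(hdr,
             "9,10,0,7,1,9.6451E-06,2,97809,23.545",
             "9,10,0,7,1,9.6451E-06,17,100170,13.843"),
           file.path(demo_dir, "a.csv"))
writeLines(c(hdr,
             "10,3,6,7,1,9.6451E-06,2,96662,31.568"),
           file.path(demo_dir, "b.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
total <- sum_gate_counts(files)
print(total)
```

For the real data you would pass something like list.files("input/2012_05_01", pattern = "\\.csv$", full.names = TRUE) as files and give out.file the path of the single summary file. If the per-file tables are not needed on their own, tabulating the combined gate_id column of all files in one go should give the same totals.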

Thanks, I'll get to work on this tomorrow! I appreciate your time. – kimmyjo221