2010-08-15

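Reading 3-dimensional data sets into R

Gnuplot allows three-dimensional data sets, which are a series of tables separated by blank lines, for example: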

54.32,16.17,7.42,4.28,3.09,2.11,1.66,1.22,0.99,0.82,7.9 

54.63,15.50,8.53,5.31,3.75,1.66,1.14,0.83,0.94,0.52,7.18 
56.49,16.67,6.38,3.69,2.80,1.45,1.12,0.89,1.12,0.89,8.50 
56.35,16.26,7.76,3.57,2.62,1.89,1.05,1.15,0.63,1.05,7.66 

53.79,16.19,6.47,4.57,3.47,1.74,1.95,1.37,1.00,0.74,8.73 
55.63,16.28,7.87,3.72,2.48,1.99,1.40,1.19,0.70,1.08,7.65 
54.09,15.76,7.96,4.70,2.77,2.21,1.27,1.27,0.66,1.11,8.19 
53.79,16.19,6.47,4.57,3.47,1.74,1.95,1.37,1.00,0.74,8.73 

... 

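This could show, for example, a data set evolving over time. In Gnuplot you can then choose which data set to use for a given plot (using its index and the keyword, uh, index, IIRC).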

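I have started working with R, and so far I have been feeding it the data sets manually, one at a time, using the scan/table functions. I don't have one big file containing all the data sets; I have one file per data set, and I create one table at a time.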
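For reference, the one-file-per-data-set workflow described here can already be collapsed into a single list with base R. The following is only a sketch: the directory, file names, and values are made up for the demo, and it assumes every file is comma-separated with no header row.

```r
# Demo setup: two small per-data-set files in a temporary directory
# (file names and values are invented for illustration)
demo_dir <- file.path(tempdir(), "per-set-demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("1,2,3", file.path(demo_dir, "set1.csv"))
writeLines(c("4,5,6", "7,8,9"), file.path(demo_dir, "set2.csv"))

# Collect the file paths, then read each file into one element of a list
files <- sort(list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE))
datasets <- lapply(files, read.csv, header = FALSE)
names(datasets) <- basename(files)

datasets[[1]]  # first data set
datasets[[2]]  # second data set
```

This keeps the files separate on disk; it does not answer the question of reading a single aggregate file.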

Is there a (built-in, or very simple) way to read the aggregated data sets all at once, so that I would have

dataset <- neatInput("my-aggregate-data") 
dataset[1] # first data set 
dataset[2] # second data set 
... 

or something like that?

Answers


I managed to condense the code into two lines, FWIW :)

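# Keep blank lines: each one is read as an all-NA row that marks a block boundary
check <- read.csv("data.csv", blank.lines.skip = FALSE, header = FALSE)

# Number each block by counting the NA rows before it; the NA rows themselves land in group 0
split(check, (cumsum(is.na(check[,1]))+1) * !is.na(check[,1]))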
## $`0`
##   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 2 NA NA NA NA NA NA NA NA NA  NA  NA
## 6 NA NA NA NA NA NA NA NA NA  NA  NA

## $`1`
##      V1    V2   V3   V4   V5   V6   V7   V8   V9  V10 V11
## 1 54.32 16.17 7.42 4.28 3.09 2.11 1.66 1.22 0.99 0.82 7.9

## $`2`
##      V1    V2   V3   V4   V5   V6   V7   V8   V9  V10  V11
## 3 54.63 15.50 8.53 5.31 3.75 1.66 1.14 0.83 0.94 0.52 7.18
## 4 56.49 16.67 6.38 3.69 2.80 1.45 1.12 0.89 1.12 0.89 8.50
## 5 56.35 16.26 7.76 3.57 2.62 1.89 1.05 1.15 0.63 1.05 7.66

## $`3`
##       V1    V2   V3   V4   V5   V6   V7   V8   V9  V10  V11
## 7  53.79 16.19 6.47 4.57 3.47 1.74 1.95 1.37 1.00 0.74 8.73
## 8  55.63 16.28 7.87 3.72 2.48 1.99 1.40 1.19 0.70 1.08 7.65
## 9  54.09 15.76 7.96 4.70 2.77 2.21 1.27 1.27 0.66 1.11 8.19
## 10 53.79 16.19 6.47 4.57 3.47 1.74 1.95 1.37 1.00 0.74 8.73
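The `0` group in the output above only collects the blank separator rows. If a plain positional list is wanted, closer to the `neatInput` idea in the question, one possible wrapper looks like the sketch below. The name `read_blocks` is hypothetical (it is not from any package), and the demo file is a cut-down, three-column version of the sample data. It assumes the first column never has a genuinely missing value, since an NA there is treated as a separator.

```r
# Hypothetical helper: read a file of blank-line-separated tables into a list
read_blocks <- function(path) {
  raw <- read.csv(path, header = FALSE, blank.lines.skip = FALSE)
  is_sep <- is.na(raw[, 1])        # TRUE for rows that came from blank lines
  block_id <- cumsum(is_sep) + 1   # running block number for every row
  # Drop the separator rows, then split the remaining rows by block number
  blocks <- split(raw[!is_sep, , drop = FALSE], block_id[!is_sep])
  unname(blocks)                   # positional access: blocks[[1]], blocks[[2]], ...
}

# Self-contained demo with a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "54.32,16.17,7.42",
  "",
  "54.63,15.50,8.53",
  "56.49,16.67,6.38"
), tmp)

dataset <- read_blocks(tmp)
dataset[[1]]  # first data set
dataset[[2]]  # second data set
```

Note the double brackets: `dataset[[1]]` returns the data frame itself, whereas `dataset[1]` returns a one-element list containing it.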

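If your third dimension is time, it is usually best to work with dedicated time/date objects. The most widely used general-purpose time series packages in R include functions designed to do what you want. For example, to aggregate monthly data into years: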

> data(AirPassengers); AP = AirPassengers 
> # import the package xts, which will 'auto-import' its sole dependency, 
> # the package 'zoo' 
> library(xts)  

> # AP is an R time series whose data points are in months
> class(AP) 
[1] "ts" 
> start(AP) 
[1] 1949 1 
> end(AP) 
[1] 1960 12 
> frequency(AP) 
[1] 12 
> AP[1:3] 
[1] 112 118 132 

> # step 1: convert ts object to an xts object 
> X = as.xts(AP) 
> class(X) 
[1] "xts" "zoo" 
> # step 2: create index of endpoints to pass to the aggregator function 
> np = endpoints(X, on="years") 
> # step 3: call the aggregator function 
> X2 = period.apply(X, INDEX=np, FUN=sum) 
> X2[1:3] 
         [,1]
Dec 1949 1520
Dec 1950 1676
Dec 1951 2042
> # 'X2' is in years (each value is about 12X higher than the first three values for
> # AP above)
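As a cross-check that needs no extra package, the same yearly totals can be reproduced with the `aggregate` method for `ts` objects in base R. This is a sketch of an alternative route, not part of the xts workflow above; `yearly` is just an illustrative variable name.

```r
AP <- AirPassengers  # monthly ts object from base R's datasets package

# Collapse the monthly series (frequency 12) to one value per year by summing
yearly <- aggregate(AP, nfrequency = 1, FUN = sum)

head(as.numeric(yearly), 3)  # compare with X2[1:3] from the xts version
```

The result stays a plain `ts` object, which may be enough when the calendar handling of xts is not needed.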