2017-08-09 126 views
0

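Is there any existing work on this? I have only found information on reading multiple CSV files. How do I read a single CSV file that contains multiple datasets?

I am trying to build a widget that reads the datasets from one CSV file and prints as many graphs as there are datasets.

I would like to brainstorm a way to read a single CSV in which multiple datasets are stacked vertically. However, I do not know the length of each dataset, and I do not know how many datasets there will be.

Any ideas or concepts to consider would be appreciated.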

+5

You can load the file as a single dataset and then extract the different datasets in R, based on how they are separated in the file.

+1

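What delimits each dataset? You seem to know that there are several of them, so how is the start of another dataset marked within the single file? – hrbrmstr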

+0

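@OriolMirosa How would you identify the delimiter and parse on it? I do not have much experience with R, and my current widget only reads a CSV, so I have not had to deal with the details of parsing it. – Loparia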

Answers

2

As @Oriol Mirosa mentioned in the comments, here is one way to do this. You can first read in the whole document:

df = read.csv("path", header = TRUE) 

The following assumes that the whole CSV file is structured like this:

df = data.frame(X=c(1:10, "X", 1:20, "X", 1:30), 
       Y=c(1:10, "Y", 1:20, "Y", 1:30), 
       Z=c(1:10, "Z", 1:20, "Z", 1:30)) 

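# flag the repeated header rows, then number the data sets with a running count
df$newset = ifelse(df$X == "X", 1, 0) 
df$newset = as.factor(cumsum(df$newset)) 

# one list element per data set
dfs = split(df, df$newset) 
# drop the header row (all but the first data set) and the helper column newset
dfs[-1] = lapply(dfs[-1], function(x) x[-1,-ncol(x)]) 
dfs[[1]] = dfs[[1]][,-ncol(dfs[[1]])] 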

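I created a binary variable, newset, that indicates whether a row is a "header". Then cumsum turns it into a unique number for each "dataset". Next, I split() on newset to create a list of datasets, one per element. Finally, I remove the first row of every dataset after the first (the repeated header row, which could be set as the column names if needed) and drop the helper column newset. This should work regardless of the length of each dataset.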
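To make the cumsum step easier to follow, here is a minimal sketch on a toy vector. The values and the names is_header, group_id and pieces are made up for illustration and are not part of the answer's code.

```r
# Toy column: two repeated header rows ("X") are mixed in with the values
x <- c("1", "2", "X", "3", "4", "5", "X", "6")

is_header <- x == "X"            # TRUE only on the repeated header rows
group_id  <- cumsum(is_header)   # running count of header rows seen so far

print(group_id)
# 0 0 1 1 1 1 2 2

# split() groups the values by that running count; every group after the
# first still starts with its own "X" header row
pieces <- split(x, group_id)
print(lengths(pieces))
```

Each header row bumps the running count by one, so every row between two headers gets the same number, whatever the length of the dataset.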

Result (printing dfs):

# $`0` 
#  X Y Z 
# 1 1 1 1 
# 2 2 2 2 
# 3 3 3 3 
# 4 4 4 4 
# 5 5 5 5 
# 6 6 6 6 
# 7 7 7 7 
# 8 8 8 8 
# 9 9 9 9 
# 10 10 10 10 
# 
# $`1` 
#  X Y Z 
# 12 1 1 1 
# 13 2 2 2 
# 14 3 3 3 
# 15 4 4 4 
# 16 5 5 5 
# 17 6 6 6 
# 18 7 7 7 
# 19 8 8 8 
# 20 9 9 9 
# 21 10 10 10 
# 22 11 11 11 
# 23 12 12 12 
# 24 13 13 13 
# 25 14 14 14 
# 26 15 15 15 
# 27 16 16 16 
# 28 17 17 17 
# 29 18 18 18 
# 30 19 19 19 
# 31 20 20 20 
# 
# $`2` 
#  X Y Z 
# 33 1 1 1 
# 34 2 2 2 
# 35 3 3 3 
# 36 4 4 4 
# 37 5 5 5 
# 38 6 6 6 
# 39 7 7 7 
# 40 8 8 8 
# 41 9 9 9 
# 42 10 10 10 
# 43 11 11 11 
# 44 12 12 12 
# 45 13 13 13 
# 46 14 14 14 
# 47 15 15 15 
# 48 16 16 16 
# 49 17 17 17 
# 50 18 18 18 
# 51 19 19 19 
# 52 20 20 20 
# 53 21 21 21 
# 54 22 22 22 
# 55 23 23 23 
# 56 24 24 24 
# 57 25 25 25 
# 58 26 26 26 
# 59 27 27 27 
# 60 28 28 28 
# 61 29 29 29 
# 62 30 30 30 
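One thing to keep in mind with this approach: because the header rows are read in as data, every column arrives as text (or as a factor in older R versions) rather than as numbers. A possible follow-up step, sketched here on a small stand-in list with made-up values, is to convert each column back after splitting. The helper name to_numeric is hypothetical.

```r
# Small stand-in for the dfs list built above: every column holds text
dfs <- list(
  `0` = data.frame(X = c("1", "2"), Y = c("3", "4"),
                   stringsAsFactors = FALSE),
  `1` = data.frame(X = c("5", "6", "7"), Y = c("8", "9", "10"),
                   stringsAsFactors = FALSE)
)

# Hypothetical helper: re-guess the type of every column and reset row names
to_numeric <- function(d) {
  d[] <- lapply(d, function(col) type.convert(as.character(col), as.is = TRUE))
  rownames(d) <- NULL
  d
}

dfs_num <- lapply(dfs, to_numeric)
str(dfs_num)
```

The as.character() call is there so that the same helper also copes with factor columns.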
2
# Create sample data 

unlink("so-data.csv") # remove it if it exists 

set.seed(1492) # reproducible 

# make 3 data frames of different lengths 
frames <- lapply(c(3, 10, 5), function(n) { 
    data.frame(X = runif(n), Y1 = runif(n), Y2= runif(n)) 
}) 

# write them to a single file, repeating the header before each data set
# (appending column names triggers a warning, hence suppressWarnings)
suppressWarnings(
    invisible(
        lapply(frames, write.table, file="so-data.csv", sep=",",
               append=TRUE, row.names=FALSE)
    )
)

The file looks like this:

"X","Y1","Y2" 
0.277646409813315,0.110495456494391,0.852662623859942 
0.21606229362078,0.0521760624833405,0.510357670951635 
0.184417578391731,0.00824321852996945,0.390395383816212 
"X","Y1","Y2" 
0.769067857181653,0.916519832098857,0.971386880846694 
0.6415081594605,0.63678711745888,0.148033464793116 
0.638599780155346,0.381162445060909,0.989824152784422 
0.194932354846969,0.132614633999765,0.845784503268078 
0.522090089507401,0.599085820373148,0.218151196138933 
0.521618122234941,0.0903550288639963,0.983936473494396 
0.792095972690731,0.932019826257601,0.703315682942048 
0.12338977586478,0.584303047973663,0.421113619813696 
0.343668724410236,0.561827397439629,0.111441049026325 
0.660837838426232,0.345943035557866,0.0270762923173606 
"X","Y1","Y2" 
0.309987690066919,0.441982284653932,0.133840701542795 
0.747786369873211,0.240106994053349,0.62044994905591 
0.789473889162764,0.853503877297044,0.150850139558315 
0.165826949058101,0.119402598123997,0.318282842403278 
0.39083837531507,0.109747459646314,0.876092307968065 

Now you can do this:

# read in the data as lines 

l <- readLines("so-data.csv") 

# figure out where the individual data sets are 

starts <- which(grepl("X", l)) 
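# each data set ends on the line before the next header; the last one ends
# at the end of the file (starts[-1] also works when there is only one data set)
ends <- c(starts[-1] - 1, length(l)) 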

# read them in 

new_frames <- mapply(function(start, end) { 
    read.csv(text=paste0(l[start:end], collapse="\n"), header=TRUE) 
}, starts, ends, SIMPLIFY=FALSE) 

str(new_frames) 
## List of 3 
## $ :'data.frame': 3 obs. of 3 variables: 
## ..$ X : num [1:3] 0.278 0.216 0.184 
## ..$ Y1: num [1:3] 0.1105 0.05218 0.00824 
## ..$ Y2: num [1:3] 0.853 0.51 0.39 
## $ :'data.frame': 10 obs. of 3 variables: 
## ..$ X : num [1:10] 0.769 0.642 0.639 0.195 0.522 ... 
## ..$ Y1: num [1:10] 0.917 0.637 0.381 0.133 0.599 ... 
## ..$ Y2: num [1:10] 0.971 0.148 0.99 0.846 0.218 ... 
## $ :'data.frame': 5 obs. of 3 variables: 
## ..$ X : num [1:5] 0.31 0.748 0.789 0.166 0.391 
## ..$ Y1: num [1:5] 0.442 0.24 0.854 0.119 0.11 
## ..$ Y2: num [1:5] 0.134 0.62 0.151 0.318 0.876 
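Since the goal in the question was one graph per dataset, the line-based approach could be wrapped in a function and followed by a plotting loop. This is only a sketch under stated assumptions: the function name read_stacked_csv, its header_pattern argument, the demo values and the choice of plotting Y1 against X are all made up for illustration, and the pattern must match header lines only.

```r
# Hypothetical helper: split a stacked CSV on its repeated header lines
read_stacked_csv <- function(path, header_pattern = "X") {
  l <- readLines(path)
  starts <- grep(header_pattern, l)      # lines that start a data set
  ends <- c(starts[-1] - 1, length(l))   # line before the next header
  Map(function(start, end) {
    read.csv(text = paste0(l[start:end], collapse = "\n"), header = TRUE)
  }, starts, ends)
}

# Demo file with two stacked data sets (made-up values)
path <- tempfile(fileext = ".csv")
writeLines(c("X,Y1,Y2", "1,2,3", "4,5,6", "X,Y1,Y2", "7,8,9"), path)

frames <- read_stacked_csv(path)

# One plot per data set, each on its own page of a PDF file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
for (i in seq_along(frames)) {
  plot(frames[[i]]$X, frames[[i]]$Y1, xlab = "X", ylab = "Y1",
       main = paste("Dataset", i))
}
invisible(dev.off())
```

Because the number of plots follows the length of the returned list, nothing needs to be known in advance about how many datasets the file holds or how long each one is.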