2016-04-06

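Error reading YAML file as a data frame in R

I am trying to read a YAML data file (located here) with the commands below, but neither of them gives the data in the desired output format, which is that of the CSV file located here. The data in the YAML file is described here; for a quick look, you can refer directly to the format given at the end of this question.

I tried loading the data with these commands, but to no avail. Can anyone please guide me on how to load the data from the YAML file correctly as an R data frame, or how to convert it to CSV in the output format specified above?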

cric <- yaml.load_file("911047.yaml") 
cric <- data.frame(yaml.load_file("211028.yaml")) 

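Below is the high-level format of the data for quick reference (the indentation may not match the original file exactly, because the YAML formatting was lost when pasting it here):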

meta:
  data_version: 0.6
  created: 2013-02-22
  revision: 1
info:
  city: Southampton
  dates:
    - 2005-06-13
  match_type: T20
  outcome:
    by:
      runs: 100
    winner: England
  overs: 20
  player_of_match:
    - KP Pietersen
  teams:
    - England
    - Australia
  toss:
    decision: bat
    winner: England
  umpires:
    - NJ Llong
    - JW Lloyds
  venue: The Rose Bowl
innings:
  - 1st innings:
      team: England
      deliveries:
        - 0.1:
            batsman: ME Trescothick
            bowler: B Lee
            non_striker: GO Jones
            runs:
              batsman: 0
              extras: 0
              total: 0

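Since the data has no natural rectangular structure, you cannot quickly convert it to a data.frame. You will have to write a custom parsing function that converts each record into a vector, and then `rbind()` the results together. – Thomas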
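The approach suggested in the comment above could be sketched roughly as follows. This is only an illustration in base R: `match_data` is a hand-built stand-in for the nested list that `yaml.load_file()` would presumably return for the sample in the question, the second delivery is invented so that there is more than one row to bind, and the names `parse_delivery` and `parse_innings` are hypothetical helpers, not part of any package.

```r
# Hand-built stand-in for the parsed YAML (assumption: only the "innings"
# part is reproduced; the "0.2" delivery is invented for illustration).
match_data <- list(
  innings = list(
    list(`1st innings` = list(
      team = "England",
      deliveries = list(
        list(`0.1` = list(batsman = "ME Trescothick", bowler = "B Lee",
                          non_striker = "GO Jones",
                          runs = list(batsman = 0, extras = 0, total = 0))),
        list(`0.2` = list(batsman = "ME Trescothick", bowler = "B Lee",
                          non_striker = "GO Jones",
                          runs = list(batsman = 1, extras = 0, total = 1)))
      )
    ))
  )
)

# Turn one delivery (a one-element list named by the ball number)
# into a one-row data.frame.
parse_delivery <- function(delivery, innings_name, team) {
  d <- delivery[[1]]
  data.frame(
    innings = innings_name,
    team = team,
    ball = names(delivery)[1],
    batsman = d$batsman,
    bowler = d$bowler,
    non_striker = d$non_striker,
    runs_batsman = d$runs$batsman,
    runs_extras = d$runs$extras,
    runs_total = d$runs$total,
    stringsAsFactors = FALSE
  )
}

# Parse every delivery of one innings and rbind() the rows together.
parse_innings <- function(innings) {
  innings_name <- names(innings)[1]
  body <- innings[[1]]
  rows <- lapply(body$deliveries, parse_delivery,
                 innings_name = innings_name, team = body$team)
  do.call(rbind, rows)
}

# One row per delivery across all innings.
deliveries <- do.call(rbind, lapply(match_data$innings, parse_innings))
print(deliveries)
```

Optional fields that appear only on some deliveries (extras or wickets, for example) are not handled here; a real parser would need to fill them with `NA` so that every row has the same columns before calling `rbind()`.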

Answers


This can be solved with melt from the reshape2 package.

The code below should help:

library(yaml)      # provides yaml.load_file()
library(reshape2)  # provides melt() for nested lists
data = yaml.load_file("C:\\Users\\vsahu\\Downloads\\mdms\\911047.yaml")
x = melt(data)     # one row per leaf value; L1, L2, ... hold the path of keys
y = data.frame(x)

# File-level metadata
meta = y[y$L1 == 'meta',]
meta = meta[, colSums(is.na(meta)) != nrow(meta)]  # drop columns that are entirely NA
data_meta = reshape(meta,direction = 'wide',timevar = 'L2',idvar = 'L1')

# Match information
info = y[y$L1 == 'info',]
info = info[, colSums(is.na(info)) != nrow(info)]
info = subset(info, select=-c(L1))

# Ball-by-ball data: one row per delivery, one column per field
data_innings = y[(y$L1 == 'innings') & (y$L4 == 'deliveries'),]
data_innings$new = paste(data_innings$L7,data_innings$L8,sep="_")
data_innings = subset(data_innings, select=-c(L7,L8,L4,L1,L5))
data_innings = reshape(data_innings,idvar=c('L2','L3','L6'),direction = "wide",timevar = c('new'))
write.csv(data_innings,"data_innings.csv",row.names = F)

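I have edited Vaibhav's answer above to create a function that reads all the YAML files in a specified directory and converts them to CSV. It also handles the "multiple rows match" problem caused by reshape.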

# Collapse several values that land in the same cell into one "/"-separated string
aggr_fielder <- function(x) {
  paste0(x, collapse = "/")
}

# Convert every YAML file in `source` to CSV files.
# `destination` is prepended as-is, so it should be "" or end with a path separator.
convertCricsheetData <- function(source = ".", destination = "") {
  library(yaml)
  library(reshape2)
  library(data.table)
  all.files <- list.files(path = source,
                          pattern = "\\.yaml$",
                          full.names = TRUE)

  for (i in seq_along(all.files)) {
    data = yaml.load_file(all.files[i])
    x = reshape2::melt(data)  # explicit namespace, since data.table also exports melt()
    y = data.table(x)

    meta = y[y$L1 == 'meta',]
    meta = meta[, colSums(is.na(meta)) != nrow(meta), with=FALSE]
    data_meta = reshape(meta,direction = 'wide',timevar = 'L2',idvar = 'L1')

    info = y[y$L1 == 'info',]
    info = info[, colSums(is.na(info)) != nrow(info), with=FALSE]
    info[, L1 := NULL]
    info[, match_no := i]

    data_innings = y[(y$L1 == 'innings') & (y$L4 == 'deliveries'),]
    data_innings[, new := paste(data_innings$L7,data_innings$L8,sep="_")]
    data_innings[, c("L7","L8","L4","L1","L5") := NULL]
    # Unlike reshape(), dcast() lets duplicates be merged with fun.aggregate
    data_innings = dcast(data_innings, L2+L3+L6 ~ new, fun.aggregate = aggr_fielder,fill = NA)
    data_innings[, match_no := i]

    # File names are built from the match date(s) and the two team names
    match_id = paste(c(info[info$L2 == "dates",]$value,info[info$L2 == "teams",]$value), collapse = "-")
    write.csv(data_innings,paste0(destination,match_id,".csv"),row.names = FALSE)
    write.csv(info,paste0(destination,"info-",match_id,".csv"),row.names = FALSE)
  }
}
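
To illustrate the duplicate problem that `aggr_fielder` is meant to handle, here is a small base R sketch. It is an analogue rather than the `dcast()` call used above: duplicates are collapsed with `aggregate()` before widening with `reshape()`. The `long` table is hypothetical (the fielder names are placeholders), standing in for a delivery on which the same field occurs twice, presumably a dismissal credited to two fielders. Without the collapsing step, `reshape()` would, as far as I know, warn that multiple rows match and keep only the first one.

```r
# Hypothetical long-format rows: ball "0.1" has two values for the same key.
long <- data.frame(
  ball  = c("0.1", "0.1", "0.1", "0.2"),
  key   = c("batsman", "wicket_fielders", "wicket_fielders", "batsman"),
  value = c("ME Trescothick", "Fielder A", "Fielder B", "ME Trescothick"),
  stringsAsFactors = FALSE
)

# Collapse duplicates per (ball, key) into one "/"-separated string,
# mirroring what aggr_fielder does inside dcast().
collapsed <- aggregate(value ~ ball + key, data = long,
                       FUN = function(x) paste0(x, collapse = "/"))

# Now every (ball, key) pair is unique, so widening is unambiguous.
wide <- reshape(collapsed, idvar = "ball", timevar = "key", direction = "wide")
print(wide)
```

Cells for which a delivery has no value (here the fielders on ball "0.2") come out as `NA`, which is comparable to the `fill = NA` argument in the `dcast()` call.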