How do I output a data frame in the correct format in R?

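I have to write a function that reads a directory full of files and reports the number of completely observed cases in each data file (rows in which no value is NA). The function should return a data frame whose first column is the name of the file and whose second column is the number of complete cases. See my draft below; I hope the comments help!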

complete <- function (directory, id = 1:332){ 
    nobs = numeric() #currently blank 
    # nobs is the number of complete cases in each file 
    data = data.frame() #currently blank dataframe 
    for (i in id){ 
    #get the right filepath 
    newread = read.csv(paste(directory,"/",formatC(i,width=3,flag="0"),".csv",sep="")) 
    my_na <- is.na(newread) #let my_na be the logic vector of true and false na values 
    nobs = sum(!my_na) #sum up all the not na values (1 is not na, 0 is na, due to inversion). 
    #this returns # of true values 
    #add on to the existing dataframe 
    data = c(data, i, nobs, row.names=i) 
    } 
    data # return the updated data frame for the specified id range 
}

The output of a sample run, complete("specdata",1), is

[[1]] 
[1] 1 

[[2]] 
[1] 3161 

$row.names 
[1] 1

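I don't know why it isn't displayed in the usual data frame format. I'm also fairly sure my numbers are wrong. I'm assuming that on each iteration newread reads all of the data in that file before my_na is computed. Is that the source of the error, or is it something else? Please explain. Thanks!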

2016-08-27 shoestringfries

Looks like you're doing Coursera homework ... – Nate

In your 'for' loop, you are assigning to 'data' (overwriting it). – steveb

Is week 1 already due? :) Good luck. I learned a lot from this course. –

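You should consider other ways of appending values to a vector. The function currently overwrites its results on every iteration. You asked about id = 1, but it gets worse when you give the function multiple ids: it will only return the last one. Here's why: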

#Simple function that takes ids and adds 2 to them 
myFun <- function(id) { 

    nobs = c() 

    for(i in id) { 

    nobs = 2 + i 
    } 

    return(nobs) 
} 

myFun(c(2,3,4)) 
[1] 6

I told it to add 2 to each id and return the results, but it only gave me the last one. I should have written it like this:

myFun2 <- function(id) { 

    nobs = c() 

    for(i in 1:length(id)) { 

    nobs[i] <- 2 + id[i] 
    } 

    return(nobs) 
} 

myFun2(c(2,3,4)) 
[1] 4 5 6

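Now it gives the correct output. What's the difference? First, the nobs object is no longer overwritten; each result is stored in its own position. Note the subsetting brackets and the new counter in the for loop header.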
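The list-shaped output in the question has a related cause, which can be sketched as follows. This is a minimal illustration, not code from the thread: the values 1 and 3161 are copied from the question's sample output, and the names `combined`, `row` and `bound` are made up for the example. `c()` treats a data frame as a plain list of columns, so the result loses the data frame class.

```r
# Start from an empty data frame, as in the question's draft.
data <- data.frame()

# c() has no data frame method, so it falls back to list concatenation
# and returns a plain list with three elements.
combined <- c(data, 1, 3161, row.names = 1)
is.data.frame(combined)  # FALSE

# Building the row as a data frame and binding it keeps the class.
row <- data.frame(id = 1, nobs = 3161)
bound <- rbind(data, row)
is.data.frame(bound)     # TRUE
```

Growing a data frame with `rbind` inside a loop works, but collecting the counts in a vector and calling `data.frame()` once at the end is usually simpler.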

Also, building objects up piece by piece is not the best way to use R. It is designed to do more with less:

complete <- function(directory, id=1:332) { 
    nobs <- sapply(id, function(i) { 
    sum(complete.cases(read.csv(list.files(path=directory, full.names=TRUE)[i]))) }) 
    data.frame(id, nobs) 
}
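The version above counts rows with `complete.cases`, whereas the question's draft sums `!is.na(...)`. A small sketch of the difference, using a made-up data frame (the column names `sulfate` and `nitrate` and all values are assumptions for illustration only):

```r
# Hypothetical toy data: 3 rows, 2 columns, with NAs in two different rows.
toy <- data.frame(sulfate = c(1.2, NA, 3.4),
                  nitrate = c(0.5, 0.7, NA))

# Counts individual non-missing cells across the whole table.
non_na_cells <- sum(!is.na(toy))           # 4

# Counts rows in which every column is observed.
complete_rows <- sum(complete.cases(toy))  # 1
```

Only the second number matches "completely observed cases", which may explain why the counts in the question looked wrong.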

If you want to fix your own code, you could try:

complete <- function (directory, id = 1:332){ 
    nobs = numeric(length(id)) # preallocate one slot per requested id 
    # nobs is the number of complete cases in each file 
    for (i in seq_along(id)) { 
    # build the file path, e.g. "specdata/001.csv" 
    newread = read.csv(paste(directory,"/",formatC(id[i],width=3,flag="0"),".csv",sep="")) 
    # complete.cases() is TRUE for rows that have no NA in any column 
    nobs[i] = sum(complete.cases(newread)) # count the fully observed rows 
    } 
    data.frame(id, nobs) # one row per id in the specified range 
}

2016-08-27 04:56:42

Since I don't know what data you are referring to, and since no sample was given, this is what I could come up with as an edit to your function:

complete <- function (directory, id = 1:332){ 
    data = data.frame() # rows are appended below 
    for (i in id){ 
    newread = read.csv(paste(directory,"/",formatC(i,width=3,flag="0"),".csv",sep="")) 
    newread = newread[complete.cases(newread),] # keep only rows without any NA 
    nobs = nrow(newread) 
    # append one named row; assigning a row into a zero-column data frame would not create columns 
    data = rbind(data, data.frame(Name=i, NotNA=nobs)) 
    } 
    return(data) 
}
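Since the thread has no sample data, one way to check such a function is to run it against a couple of small files written to a temporary directory. The sketch below is only an illustration: `count_complete`, `demo_dir` and the file contents are hypothetical, and the zero-padded file naming (`001.csv`, `002.csv`) is taken from the `formatC` call in the question.

```r
# Hypothetical helper, not from the thread: counts complete rows per file.
count_complete <- function(directory, id) {
  paths <- file.path(directory, sprintf("%03d.csv", id))
  nobs <- vapply(paths,
                 function(p) sum(complete.cases(read.csv(p))),
                 integer(1))
  data.frame(id = id, nobs = unname(nobs))
}

# Write two small made-up files into a temporary directory.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 3), b = c(4, 5, 6)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(a = c(NA, NA), b = c(1, 2)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# The first file has two complete rows, the second has none.
result <- count_complete(demo_dir, 1:2)
print(result)
```

`vapply` is used here instead of `sapply` so that the result is guaranteed to be an integer vector with one count per file.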

2016-08-27 11:31:50 prateek1592
