Correlation between columns with NA


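I have to write a function that takes a directory of data files and a threshold for complete cases, and calculates the correlation between sulfate and nitrate (two columns) for each file where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no files meet the threshold requirement, the function should return a numeric vector of length 0.

My code looks like this: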

corr <- function(directory,threshold=0){ 
    a<-list.files("specdata") 
    for (i in a) { 
     data <- read.csv(paste(directory, "/", i, sep ="")) 
     x<-complete.cases(data) 
     j<-sum(as.numeric(x)) 
     sulfate<-data[,2] 
     nitrate<-data[,3] 
     b<-cor(sulfate,nitrate) 
    } 
    if (j>threshold) 
     return(b) 
    else 
     numeric() 
}

There is no error message.

If I type

z <- corr("specdata")
head(z)
[1] NA

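I don't know what the problem is. I don't know whether the NA values in the columns have something to do with it. I think something is missing from my code. I think read.csv is creating a single data frame when I need one data frame per file, but I don't understand why the return value is NA in this case (when there is no threshold).

However, if I use a larger threshold (1000):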

z<-corr("specdata",1000) 
head(z) 
numeric(0)

The expected output I need is

cr <- corr("specdata", 150) 
head(cr) 
[1] -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 -0.07588814

Source

2014-01-20 Matias Andina

'sulfate <- data[x, 2]; nitrate <- data[x, 3]' – Roland
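
To illustrate Roland's suggestion: by default cor() returns NA as soon as either vector contains a missing value, which matches the NA seen in the question. The sketch below uses a made-up data frame (the name d and its values are hypothetical, and the Date/sulfate/nitrate column layout is an assumption) to show the effect of subsetting with complete.cases first.

```r
# Toy data frame; the column layout (Date, sulfate, nitrate) is an assumption
d <- data.frame(Date    = c("d1", "d2", "d3", "d4"),
                sulfate = c(1, 2, NA, 4),
                nitrate = c(2, 4, 6, NA))

r_raw <- cor(d[, 2], d[, 3])   # missing values propagate by default

x <- complete.cases(d)         # TRUE only for rows with no NA in any column
r_subset <- cor(d[x, 2], d[x, 3])

# Alternative: let cor() drop the incomplete pairs itself
r_use <- cor(d[, 2], d[, 3], use = "complete.obs")
```

In this toy case the two approaches agree because the only missing values are in the two columns being correlated; complete.cases also looks at the other columns, so the results can differ on real data.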

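OK, now it seems to be working, but it is not giving the expected output. Could it be that the files are not being loaded correctly? My csv files are named 001, 002, 003, ..., but I didn't use sprintf(%03d) because I used list.files, and that seems to work. –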
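
On the file-name point, both approaches should yield the same names when the directory contains exactly the zero-padded files. A small sketch using a scratch directory (the directory name and file contents here are made up for illustration, not the real specdata):

```r
# Hypothetical scratch directory standing in for "specdata"
demo_dir <- file.path(tempdir(), "names_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (id in 1:3) {
    write.csv(data.frame(x = id),
              file.path(demo_dir, sprintf("%03d.csv", id)),
              row.names = FALSE)
}

from_listing <- list.files(demo_dir, pattern = "\\.csv$")  # whatever is on disk
from_ids     <- sprintf("%03d.csv", 1:3)                   # names built from ids

same_names <- identical(from_listing, from_ids)
```

list.files picks up every matching file in the directory, whereas building names from ids lets you restrict the computation to specific monitors.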

You are overwriting b in the loop. –
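
A minimal sketch of what overwriting means here, with made-up numbers standing in for the per-file correlations (the names values and collected are hypothetical):

```r
values <- c(0.1, 0.2, 0.3)   # stand-ins for per-file correlations

# Overwriting: only the value from the last iteration survives the loop
for (v in values) b <- v

# Accumulating: append each result to a vector that outlives the loop
collected <- numeric()
for (v in values) collected <- c(collected, v)
```

After the first loop b holds a single number, while collected holds one entry per iteration, which is the shape the expected output requires.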

This problem is probably best split into two steps: computing the value for each file, and collecting the results across all files.

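corr.file <- function(filename, directory, threshold = 0) {
    data <- read.csv(file.path(directory, filename))
    # Rows in which every variable is observed
    x <- complete.cases(data)
    # Too few complete cases: contribute nothing to the result
    if (sum(x) <= threshold) return(numeric())
    sulfate <- data[x, 2]
    nitrate <- data[x, 3]
    cor(sulfate, nitrate)
}

a <- list.files("specdata")
# unlist() drops the empty results, leaving one correlation per qualifying file
# (threshold = 150 is taken from the expected output in the question)
correlations <- as.numeric(unlist(lapply(a, corr.file, directory = "specdata", threshold = 150)))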
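
The two-step split can be exercised end to end without the course data. The sketch below is one possible arrangement, not necessarily what the answer's author had in mind: the helper names corr_one and corr_all are hypothetical, the two CSV files are made up, and columns 2 and 3 are assumed to hold sulfate and nitrate.

```r
# Step 1: one file -> one correlation, or an empty vector if below threshold
corr_one <- function(path, threshold = 0) {
    data <- read.csv(path)
    ok <- complete.cases(data)
    if (sum(ok) <= threshold) return(numeric())
    cor(data[ok, 2], data[ok, 3])   # assumes sulfate = column 2, nitrate = column 3
}

# Step 2: collect the per-file results into a single numeric vector
corr_all <- function(directory, threshold = 0) {
    paths <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
    as.numeric(unlist(lapply(paths, corr_one, threshold = threshold)))
}

# Two tiny made-up monitor files in a scratch directory
demo_dir <- file.path(tempdir(), "corr_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = 1:4, sulfate = c(1, 2, 3, NA),
                     nitrate = c(2, 4, 7, 1), ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = 1:2, sulfate = c(1, NA),
                     nitrate = c(NA, 3), ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

res_low  <- corr_all(demo_dir, threshold = 2)     # only 001.csv has > 2 complete rows
res_high <- corr_all(demo_dir, threshold = 1000)  # no file qualifies
```

Returning numeric() from the per-file step and flattening with unlist means files below the threshold simply vanish from the result, so the no-match case comes out as a numeric vector of length 0.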

Source

2014-01-20 19:05:11 josliber

This is a correct, working solution that you can refer to:

corr <- function(directory, threshold = 0) { 
    ## 'directory' is a character vector of length 1 indicating the location of 
    ## the CSV files 

    ## 'threshold' is a numeric vector of length 1 indicating the number of 
    ## completely observed observations (on all variables) required to compute 
    ## the correlation between nitrate and sulfate; the default is 0 

    ## Return a numeric vector of correlations 
    df = complete(directory) 
    ids = df[df["nobs"] > threshold, ]$id 
    corrr = numeric() 
    for (i in ids) { 
        newRead = read.csv(paste(directory, "/", formatC(i, width = 3, flag = "0"), 
                                 ".csv", sep = "")) 
        dff = newRead[complete.cases(newRead), ] 
        corrr = c(corrr, cor(dff$sulfate, dff$nitrate)) 
    } 
    return(corrr) 
} 
complete <- function(directory, id = 1:332) { 
    f <- function(i) { 
        data = read.csv(paste(directory, "/", formatC(i, width = 3, flag = "0"), 
                              ".csv", sep = "")) 
        sum(complete.cases(data)) 
    } 
    nobs = sapply(id, f) 
    return(data.frame(id, nobs)) 
} 
cr <- corr("specdata", 150) 
head(cr)

Source

2014-12-07 12:51:15 Hanish

I would recommend not posting answers to course assignments – alexsuslin
