2017-05-02

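shapiro.test & plyr: all 'x' values are identical

I am trying to run a Shapiro-Wilk test on the variable 'Size', using a dataset that I subset with ddply (by the variables 'Site' and 'Category'), but I keep getting an error message.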

Below is a sample of my dataset (d). I have 4237 observations across 9 categories and 13 sites:

Site Genus Size Category 
Arn01 ACR  4  ACR 
Arn01 ACR  7  ACR 
Arn02 ACR  3  ACR 

I created a Shapiro-Wilk function:

shap.w <- function(input) {  # Shapiro-Wilk test function
    if (sum(!is.na(input$Size)) > 3 & sum(!is.na(input$Size)) < 5000) {
        p <- shapiro.test(input$Size)$p.value
        return(p)
    } else {
        return(NA)
    }
}

Then I tried to use ddply to apply the function to the subsets of my data:

sw_test <- ddply(d, .(Site, Category), .fun = shap.w) 

But when I do this, I get the following error message:

Error in shapiro.test(input$Size) : all 'x' values are identical 

even though the values clearly are not identical. Any help or advice would be greatly appreciated.
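The error is raised for one subset at a time, so the values can differ across the whole dataset and still be identical within a single Site/Category group. A minimal sketch of how to check this per group, using base R and a small made-up data frame (`toy`, its values, and the helper names are hypothetical, not the real data):

```r
# Hypothetical toy data: the Arn02/ACR group has four identical sizes.
toy <- data.frame(
  Site     = c(rep("Arn01", 5), rep("Arn02", 4)),
  Category = "ACR",
  Size     = c(4, 7, 3, 5, 6, 2, 2, 2, 2)
)

# Count non-missing and distinct Size values in each Site/Category group.
n_obs      <- aggregate(Size ~ Site + Category, data = toy, FUN = length)
n_distinct <- aggregate(Size ~ Site + Category, data = toy,
                        FUN = function(x) length(unique(x)))
group_check <- merge(n_obs, n_distinct, by = c("Site", "Category"),
                     suffixes = c(".n", ".distinct"))
print(group_check)

# Groups with enough rows but only one distinct value are the ones that
# make shapiro.test() stop with "all 'x' values are identical".
constant_groups <- group_check[group_check$Size.distinct == 1, ]
print(constant_groups)
```

Running the same two `aggregate()` calls on `d` should show which Site/Category combinations pass the sample-size check but contain only one distinct Size.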

ETA: output of dput(d[1:20,]):

> dput(d[1:20,]) 
structure(list(Site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Arn01n", 
"Arn02n", "Arn03n", "Arn04n", "Arn05n", "Arn06n", "Arn07n", "Arn08n", 
"Arn09n", "Arn10n", "Arn11n", "Arn12n", "Arn13n"), class = "factor"), 
Genus = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 30L, 30L, 30L, 30L), .Label = c("ACA", 
"ACR", "AST", "COS", "CYP", "ECH", "FUN", "FVA", "FVT", "GAR", 
"GON", "HEL", "HYD", "ISO", "LEA", "LEO", "LEP", "LOB", "MER", 
"MNT", "MST", "MYC", "PAV", "PBR", "PLA", "PLAT", "POC", 
"POD", "PRE", "PRM", "PRS", "PSA", "SAR", "STY"), class = "factor"), 
Size = c(4, 2, 4, 4, 3, 5, 5, 4, 4, 4, 4, 3, 6, 3, 4, 5, 
2, 3, 3, 6), Category = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 8L, 8L, 8L, 8L), .Label = c("ACR", 
"FAV", "FUN", "HEL", "ISO", "MNT", "POC", "PRM", "PRS"), class = "factor")), 
.Names = c("Site", 
"Genus", "Size", "Category"), row.names = c(NA, 20L), class = "data.frame")

table(d$Size)

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  22  23  24  25  26  27  28  29  30  31  33  35  36  37  38  39
 14 271 525 548 521 424 201 206  50 357  23  95  36   7 171  11  14  30   4 145  11  21   5  46   4   1   5   1  95   1   2  31   3   1   2   1

 40  41  42  43  44  45  46  48  50  51  53  55  56  57  60  62  63  65  66  70  72  75  76  80  82  83  85  88  90  94  95 100 105 110 120 125
 80   1   9   3   4  22   1   4  42   1   1   4   1   3  64   3   5   9   4  13   1   2   1  20   2   2   2   1   5   1   2  17   1   2   6   2

128 130 143 150 155 160 180 200 230 300 890 920
  1   1   1   1   1   1   1   2   1   1   1   1

Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/143360/discussion-on-question-by-ecologist-shapiro-test-plyr-all-x-values-are-iden).

Answers

0

OK, thanks to the help I received in the comments, I was able to get this working by updating the function code to:

shap.w <- function(input) {  # Shapiro-Wilk test function
    x <- input$Size[!is.na(input$Size)]  # non-missing sizes only
    # require more than 3 distinct values and fewer than 5000 observations
    if (length(unique(x)) > 3 && length(x) < 5000) {
        p <- shapiro.test(x)$p.value
        return(p)
    } else {
        return(NA)
    }
}

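This returns NA for Site/Category combinations that have 3 or fewer distinct Size values, or 5,000 or more values (although I won't have any group larger than 5,000 in this dataset). Once I updated this, the next line ran without any problems. Thanks everyone for your help!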
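For comparison, here is a sketch of a slightly different guard that follows the conditions `shapiro.test()` itself enforces (3 to 5000 non-missing values, not all identical) rather than the "more than 3 distinct values" rule above. The name `shap_safe` is made up for this illustration, and the function takes the Size vector directly instead of a data frame:

```r
# Sketch: return the Shapiro-Wilk p-value, or NA_real_ when the test cannot run.
shap_safe <- function(x) {
  x <- x[!is.na(x)]
  # Skip groups that are too small, too large, or made of one repeated value.
  if (length(x) < 3 || length(x) > 5000 || length(unique(x)) < 2) {
    return(NA_real_)
  }
  shapiro.test(x)$p.value
}

# The unguarded call fails on a constant vector; capture the message
# instead of stopping the script.
msg <- tryCatch(shapiro.test(c(4, 4, 4, 4))$p.value,
                error = function(e) conditionMessage(e))
print(msg)

p_constant <- shap_safe(c(4, 4, 4, 4))        # constant group is skipped
p_varied   <- shap_safe(c(4, 2, 4, 4, 3, 5))  # test runs normally
print(p_constant)
print(p_varied)
```

With plyr, a wrapper of this kind could be applied as `ddply(d, .(Site, Category), function(g) shap_safe(g$Size))`, assuming the column names from the question.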

1

Note that if you return NA, then is.numeric will give FALSE: try is.numeric(NA) to see this.

You can return NA_real_ instead:

is.numeric(NA) 
[1] FALSE 
is.numeric(NA_real_) 
[1] TRUE 

It's still an NA, though:

is.na(NA_real_) 
[1] TRUE 

However, as.numeric should also solve this (perhaps double-check what your function is returning, given the input it receives from ddply).
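To illustrate why the type of the missing value matters when per-group results are combined, here is a small base R sketch. The `groups` list and the `p_or_na` helper are hypothetical stand-ins for the subsets that ddply would pass to the function:

```r
# Hypothetical per-group data: one testable group and one constant group.
groups <- list(a = c(4, 2, 4, 4, 3, 5), b = c(2, 2, 2, 2))

p_or_na <- function(x) {
  if (length(unique(x)) < 2) return(NA_real_)  # typed (double) missing value
  shapiro.test(x)$p.value
}

# vapply() requires every result to be a single double; NA_real_ satisfies
# that, whereas a bare logical NA would make vapply() stop with an error.
p_values <- vapply(groups, p_or_na, numeric(1))
print(is.numeric(p_values))

# A bare NA is logical; as.numeric() converts it to NA_real_.
print(class(NA))
print(class(as.numeric(NA)))
```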

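Thank you, @Glen_b! It turns out I was using the wrong function in my original code. When I updated the function, I got a new error message saying `Error in shapiro.test(input$Size) : all 'x' values are identical`. I've seen some similar questions on the site, but none of them seem to solve my problem. I'd appreciate any thoughts you have! – ecologist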
