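Grouping factors, data frames, and a tapply() problem

I am very new to R and statistics, and I cannot get tapply() to work. I have a data frame with 15 columns and several thousand rows. I built a set of logical vectors with expressions such as
y1 <- ((x > 0) & (x <= 5))
where x is a column name in the data frame. I then combined these logical vectors and converted them to a grouping factor with factor(). All of that appears to work.
The problem is that when I call
tapply(dataframe, group, sample, size=20)
where group is the grouping factor, I get the error: 'arguments must have same length'. When I try length(dataframe) I get the number of columns in the data frame (only 15), whereas length(group) returns the number of rows (several thousand). Did I make a mistake when creating the logical vectors and the grouping factor?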
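To make the length mismatch concrete, here is a minimal sketch on made-up data. The data frame `df`, its values, and the way the logical vector is turned into `group` are assumptions for illustration only, not the original data or the original grouping code. It shows that a data frame is a list of columns, so length() counts columns, while a single column has one element per row and therefore matches the factor. Note that the exact error tapply() raises for a whole data frame may vary between R versions.

```r
# Hypothetical toy data standing in for the real data frame.
set.seed(1)
df <- data.frame(Lat = rep(c(-90, 0, 90), each = 40),
                 Ann = runif(120, min = 0, max = 10))

# A logical vector built from a column, then converted to a grouping factor
# (an assumed, simplified version of the approach described above).
low <- (df$Ann > 0) & (df$Ann <= 5)
group <- factor(ifelse(low, "low", "high"))

length(df)       # counts columns, because a data frame is a list of columns
nrow(df)         # counts rows
length(group)    # one entry per row
length(df$Ann)   # a single column has the same length as the factor
```

So the factor itself is fine; what does not line up with it is the first argument, which needs to be a vector with one value per row.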
Here is the output from dput(), as Maxim.K suggested (sorry, it is not very tidy):
structure(list(Lat = c(-90L, -90L, -90L, -90L, -90L, -90L, -90L,
-90L, -90L, -90L, -90L, -90L, -90L, -90L, -90L), Lon = -180:-166,
Jan = c(2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79, 2.79,
2.79, 2.79, 2.79, 2.79, 2.79, 2.79), Feb = c(2.35, 2.35,
2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35, 2.35,
2.35, 2.35, 2.35), Mar = c(0.49, 0.49, 0.49, 0.49, 0.49,
0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49, 0.49
), Apr = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
May = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Jun = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Jul = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Aug = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Sep = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Oct = c(1.75, 1.75, 1.75,
1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75,
1.75, 1.75), Nov = c(2.77, 2.77, 2.77, 2.77, 2.77, 2.77,
2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77, 2.77), Dec = c(2.65,
2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65, 2.65,
2.65, 2.65, 2.65, 2.65), Ann = c(1.07, 1.07, 1.07, 1.07,
1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07, 1.07,
1.07)), .Names = c("Lat", "Lon", "Jan", "Feb", "Mar", "Apr",
"May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Ann"
), row.names = c(NA, 15L), class = "data.frame")
And for group, the 15 values from head() (via dput()):
structure(c(8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor")
...and from tail():
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor")
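I am trying to use tapply() to draw a random sample (of size 20) from each of the 8 categories.
Entirely unsurprisingly, the problem was not with the assignment and its requirements but with my understanding. I had misread the task; in fact, I was only supposed to sample from a single column, not from the whole data frame.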
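For anyone landing here with the same error, a minimal sketch of sampling from a single column per group might look like the following. The data frame `df`, the column `Ann`, and the three-level `group` factor are made up for illustration (the real data has 8 groups), and each group is assumed to hold at least 20 values, since sample() without replace = TRUE fails on smaller groups.

```r
# Hypothetical toy data; the real data frame and factor are not reproduced here.
set.seed(42)
df <- data.frame(Lat = rep(c(-90, 0, 90), each = 40),
                 Ann = round(runif(120, min = 0, max = 10), 2))
group <- factor(rep(1:3, each = 40))

# X is one column, so it has the same length as the grouping factor.
# The extra argument size = 20 is passed on to sample() for every group.
samples <- tapply(df$Ann, group, sample, size = 20)

# The result holds one vector of 20 sampled values per factor level.
sapply(samples, length)

# If whole rows are wanted instead, one option is to sample row indices
# per group and then subset the data frame with them.
row_ids <- tapply(seq_len(nrow(df)), group, sample, size = 20)
sampled_rows <- df[unlist(row_ids), ]
```

Sampling row indices rather than the data frame itself keeps the first argument a plain vector, which is what tapply() is designed to split by a factor.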
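The question would be easier to answer if you provided some sample data. Using 'dput(head(yourdata,15))' or something similar might help. – 2013-04-22 10:59:45
For the comparison, you probably want 'nrow(dataframe)', which gives the number of rows, rather than 'length(dataframe)', which gives the number of columns. – Roland 2013-04-22 11:04:02
Thanks, I just tried that and it returns the correct number of rows (i.e. the number of rows in the data frame matches the length of the grouping factor). – 2013-04-22 11:08:56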