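Subset each level of a vector with a function and return a new data frame (in R)

I have a simple data frame with two vectors, "Speed" and "ID". It looks like this: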
mydata
ID Speed
1 1 6.031847
2 1 7.050654
3 1 7.769475
4 1 8.838968
5 1 9.956571
6 1 11.146864
7 1 11.967616
8 1 13.078422
9 1 14.214301
10 1 14.974159
11 2 16.048627
12 2 17.070484
.. . .........
I want to make a subset of the data frame containing the top 20% of the Speed values:
subset0.20<-subset(mydata, Speed > quantile(Speed, prob = 1 - 20/100, na.rm=T))
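But I don't want it for the whole data set, because that would give me an unequal number of values for each ID.
So the top 20% of values must be computed for each ID separately, and the results for all IDs then combined into a new data frame. That data frame would then have 8 rows (20% of my original data set, which has 40 rows).
So, after some nail-biting and pulling out some hair, I tried a "for loop" like this: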
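To see the problem concretely, here is a minimal sketch that rebuilds `mydata` from the dput() output below and counts the selected rows per ID after the single overall cutoff. Because Speed rises steadily across the IDs in this sample, the overall 80th percentile (about 37.1) falls inside ID 4's range, so all 8 selected rows should come from ID 4.

```r
# Data taken from the dput() output in the question: 4 IDs with 10 rows each
mydata <- data.frame(
  ID = rep(c(1, 2, 3, 4), each = 10),
  Speed = c(6.03184705225504, 7.05065401832249, 7.76947483668907, 8.83896842017956,
            9.95657139135043, 11.1468640558647, 11.9676155772803, 13.0784218506988,
            14.2143010441769, 14.9741594881612, 16.0486271520862, 17.0704843261466,
            17.9324808839116, 19.1169673939822, 20.0528330256269, 20.9320440815571,
            22.0379467007031, 22.962355355126, 24.0764744246649, 25.1182530133201,
            26.0456043859692, 26.9528777031822, 27.9414746553538, 29.129640434174,
            29.9443040639644, 30.9226103003052, 31.9932286699133, 32.9925644101585,
            33.9930708538141, 35.0124438238874, 35.9215486087666, 36.9015465999988,
            38.1044534443389, 39.0368063088987, 40.272189714015, 40.8993100278334,
            41.9790311160737, 43.1027190745506, 43.8575622361406, 45.0499599122387)
)

# One 80th percentile computed over all 40 rows at once
subset0.20 <- subset(mydata, Speed > quantile(Speed, probs = 1 - 20/100, na.rm = TRUE))

# Count how many selected rows each ID contributes
print(table(subset0.20$ID))
```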
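The split, compute, combine idea described above maps directly onto base R's split(), lapply() and do.call(rbind, ...). A minimal sketch, assuming "top 20%" means values strictly above each ID's own 80th percentile (quantile()'s default type 7, as in the question); the full argument name `probs` is used here, although `prob` also works through R's partial argument matching.

```r
# Data taken from the dput() output in the question: 4 IDs with 10 rows each
mydata <- data.frame(
  ID = rep(c(1, 2, 3, 4), each = 10),
  Speed = c(6.03184705225504, 7.05065401832249, 7.76947483668907, 8.83896842017956,
            9.95657139135043, 11.1468640558647, 11.9676155772803, 13.0784218506988,
            14.2143010441769, 14.9741594881612, 16.0486271520862, 17.0704843261466,
            17.9324808839116, 19.1169673939822, 20.0528330256269, 20.9320440815571,
            22.0379467007031, 22.962355355126, 24.0764744246649, 25.1182530133201,
            26.0456043859692, 26.9528777031822, 27.9414746553538, 29.129640434174,
            29.9443040639644, 30.9226103003052, 31.9932286699133, 32.9925644101585,
            33.9930708538141, 35.0124438238874, 35.9215486087666, 36.9015465999988,
            38.1044534443389, 39.0368063088987, 40.272189714015, 40.8993100278334,
            41.9790311160737, 43.1027190745506, 43.8575622361406, 45.0499599122387)
)

# 1. Split the data frame into one data frame per ID
pieces <- split(mydata, mydata$ID)

# 2. Within each piece, keep the rows above that piece's own 80th percentile;
#    which() drops NA comparisons, as subset() does
top_pieces <- lapply(pieces, function(d) {
  cutoff <- quantile(d$Speed, probs = 1 - 20/100, na.rm = TRUE)
  d[which(d$Speed > cutoff), ]
})

# 3. Stack the per-ID results back into a single data frame
subset0.20 <- do.call(rbind, top_pieces)
rownames(subset0.20) <- NULL
print(subset0.20)
```

With the sample data this should give 8 rows, the two fastest rows of each ID. The count is exactly 20% only because every ID has 10 rows; with other group sizes or tied values the number kept per ID can differ.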
for(i in 1:length(ID)){
subset0.80<-subset(mydata[i], GForce > quantile(Speed, prob = 1 - 20/100, na.rm=T))
}
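For comparison, here is a sketch of how a loop along these lines could be made to run. The changes relative to the attempt above: it loops over unique(mydata$ID), since 1:length(ID) counts rows and `ID` is not a standalone variable; it selects rows with mydata[mydata$ID == id, ], since mydata[i] picks the i-th column; it tests Speed, since mydata has no GForce column; and it stores each iteration's result in a list instead of overwriting one variable. The names `results` and `d` are illustrative only.

```r
# Data taken from the dput() output in the question: 4 IDs with 10 rows each
mydata <- data.frame(
  ID = rep(c(1, 2, 3, 4), each = 10),
  Speed = c(6.03184705225504, 7.05065401832249, 7.76947483668907, 8.83896842017956,
            9.95657139135043, 11.1468640558647, 11.9676155772803, 13.0784218506988,
            14.2143010441769, 14.9741594881612, 16.0486271520862, 17.0704843261466,
            17.9324808839116, 19.1169673939822, 20.0528330256269, 20.9320440815571,
            22.0379467007031, 22.962355355126, 24.0764744246649, 25.1182530133201,
            26.0456043859692, 26.9528777031822, 27.9414746553538, 29.129640434174,
            29.9443040639644, 30.9226103003052, 31.9932286699133, 32.9925644101585,
            33.9930708538141, 35.0124438238874, 35.9215486087666, 36.9015465999988,
            38.1044534443389, 39.0368063088987, 40.272189714015, 40.8993100278334,
            41.9790311160737, 43.1027190745506, 43.8575622361406, 45.0499599122387)
)

# Collect one result per ID instead of overwriting a single variable
results <- list()
for (id in unique(mydata$ID)) {
  # Rows belonging to this ID (mydata[i] would select a column, not rows)
  d <- mydata[mydata$ID == id, ]
  cutoff <- quantile(d$Speed, probs = 1 - 20/100, na.rm = TRUE)
  results[[as.character(id)]] <- d[which(d$Speed > cutoff), ]
}

# Combine the per-ID pieces into one data frame
subset0.20 <- do.call(rbind, results)
print(subset0.20)
```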
or something with apply:
apply(mydata$Speed, 1 ,function(x) (subset(x > quantile(Speed, prob = 1 - 20/100, na.rm=T))))
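The apply() attempt cannot work as written: apply() iterates over the rows or columns of a matrix or array, and mydata$Speed is a plain vector without dimensions, so R stops with an error saying dim(X) must have a positive length. A different base R function, ave(), is designed for per-group work on a vector: it applies a function within each group and returns a vector the same length as the input. A minimal sketch that uses it to build a per-row cutoff, assuming the same per-ID 80th percentile:

```r
# Data taken from the dput() output in the question: 4 IDs with 10 rows each
mydata <- data.frame(
  ID = rep(c(1, 2, 3, 4), each = 10),
  Speed = c(6.03184705225504, 7.05065401832249, 7.76947483668907, 8.83896842017956,
            9.95657139135043, 11.1468640558647, 11.9676155772803, 13.0784218506988,
            14.2143010441769, 14.9741594881612, 16.0486271520862, 17.0704843261466,
            17.9324808839116, 19.1169673939822, 20.0528330256269, 20.9320440815571,
            22.0379467007031, 22.962355355126, 24.0764744246649, 25.1182530133201,
            26.0456043859692, 26.9528777031822, 27.9414746553538, 29.129640434174,
            29.9443040639644, 30.9226103003052, 31.9932286699133, 32.9925644101585,
            33.9930708538141, 35.0124438238874, 35.9215486087666, 36.9015465999988,
            38.1044534443389, 39.0368063088987, 40.272189714015, 40.8993100278334,
            41.9790311160737, 43.1027190745506, 43.8575622361406, 45.0499599122387)
)

# For every row, the 80th percentile of Speed within that row's ID
cutoff <- ave(mydata$Speed, mydata$ID,
              FUN = function(x) quantile(x, probs = 1 - 20/100, na.rm = TRUE))

# Keep rows above their own group's cutoff; which() drops NA comparisons
subset0.20 <- mydata[which(mydata$Speed > cutoff), ]
print(subset0.20)
```

Unlike the split()-based approaches, this keeps the original row order and row names (9, 10, 19, 20, ...), which makes it easy to trace rows back to mydata.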
But I just don't have enough experience with R to get it working. Can anyone help me and explain what I'm doing wrong?
dput(mydata)
structure(list(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4), Speed = c(6.03184705225504, 7.05065401832249,
7.76947483668907, 8.83896842017956, 9.95657139135043, 11.1468640558647,
11.9676155772803, 13.0784218506988, 14.2143010441769, 14.9741594881612,
16.0486271520862, 17.0704843261466, 17.9324808839116, 19.1169673939822,
20.0528330256269, 20.9320440815571, 22.0379467007031, 22.962355355126,
24.0764744246649, 25.1182530133201, 26.0456043859692, 26.9528777031822,
27.9414746553538, 29.129640434174, 29.9443040639644, 30.9226103003052,
31.9932286699133, 32.9925644101585, 33.9930708538141, 35.0124438238874,
35.9215486087666, 36.9015465999988, 38.1044534443389, 39.0368063088987,
40.272189714015, 40.8993100278334, 41.9790311160737, 43.1027190745506,
43.8575622361406, 45.0499599122387)), .Names = c("ID", "Speed"
), row.names = c(NA, -40L), class = "data.frame")
I've seen it mentioned somewhere that 'split' + 'lapply' can usually be condensed by using 'by'. +1 – A5C1D2H2I1M1N2O1R2T1 2013-04-11 09:49:23
+1, haha. Almost the same as mine :) – 2013-04-11 09:51:33
@agstudy: That wasn't meant chronologically. :) – 2013-04-11 09:59:24
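The first comment mentions by() as a more compact alternative to split() + lapply(). A minimal sketch of that variant, assuming the same per-ID 80th-percentile cutoff: by() calls the function once per ID with that ID's rows as a data frame, and do.call(rbind, ...) stacks the results.

```r
# Data taken from the dput() output in the question: 4 IDs with 10 rows each
mydata <- data.frame(
  ID = rep(c(1, 2, 3, 4), each = 10),
  Speed = c(6.03184705225504, 7.05065401832249, 7.76947483668907, 8.83896842017956,
            9.95657139135043, 11.1468640558647, 11.9676155772803, 13.0784218506988,
            14.2143010441769, 14.9741594881612, 16.0486271520862, 17.0704843261466,
            17.9324808839116, 19.1169673939822, 20.0528330256269, 20.9320440815571,
            22.0379467007031, 22.962355355126, 24.0764744246649, 25.1182530133201,
            26.0456043859692, 26.9528777031822, 27.9414746553538, 29.129640434174,
            29.9443040639644, 30.9226103003052, 31.9932286699133, 32.9925644101585,
            33.9930708538141, 35.0124438238874, 35.9215486087666, 36.9015465999988,
            38.1044534443389, 39.0368063088987, 40.272189714015, 40.8993100278334,
            41.9790311160737, 43.1027190745506, 43.8575622361406, 45.0499599122387)
)

# by() passes each ID's rows to the function as a data frame
pieces <- by(mydata, mydata$ID, function(d) {
  d[which(d$Speed > quantile(d$Speed, probs = 1 - 20/100, na.rm = TRUE)), ]
})

# Stack the per-ID results into one data frame
subset0.20 <- do.call(rbind, pieces)
print(subset0.20)
```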