How to process data for a cumulative percentage frequency plot in R

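I have a large data set of clusters, each with a parameter value. Several clusters can share the same value.

I want to make a cumulative percentage frequency distribution plot, with the cumulative percentage of the number of clusters on the y-axis and the parameter value (ranging from 0 to 1) on the x-axis.

I have sorted the data by value, but I am not sure how to process it from there with R (ecdf) or matplotlib to get the cumulative plot. How should I approach this? Any help would be much appreciated.

My data looks like this: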

Cluster_20637 0.020 
Cluster_20919 0.020 
Cluster_9642 0.147 
Cluster_10141 0.148 
Cluster_21451 0.148 
Cluster_30198 0.148 
Cluster_55982 0.498 
Cluster_10883 0.500 
Cluster_16641 0.500 
Cluster_20143 0.500 
Cluster_57942 0.867 
Cluster_32878 0.868 
Cluster_26249 0.870 
Cluster_46928 0.870 
Cluster_41908 0.871 
Cluster_28603 0.872 
Cluster_1419 0.873

Source

2012-06-02 psaima

Thanks to joran for the edit - I couldn't figure out how to do the formatting! – psaima

Possible duplicate: http://stackoverflow.com/questions/10030547/frequency-and-cumulative-frequency-curve-on-the-same-graph-in-r/10031056#10031056

Here is a dump of the data as a data.frame called test:

test <- structure(list(cluster = structure(c(6L, 7L, 17L, 1L, 8L, 11L, 
15L, 2L, 4L, 5L, 16L, 12L, 9L, 14L, 13L, 10L, 3L), .Label = c("Cluster_10141", 
"Cluster_10883", "Cluster_1419", "Cluster_16641", "Cluster_20143", 
"Cluster_20637", "Cluster_20919", "Cluster_21451", "Cluster_26249", 
"Cluster_28603", "Cluster_30198", "Cluster_32878", "Cluster_41908", 
"Cluster_46928", "Cluster_55982", "Cluster_57942", "Cluster_9642" 
), class = "factor"), value = c(0.02, 0.02, 0.147, 0.148, 0.148, 
0.148, 0.498, 0.5, 0.5, 0.5, 0.867, 0.868, 0.87, 0.87, 0.871, 
0.872, 0.873)), .Names = c("cluster", "value"), row.names = c(NA, 
-17L), class = "data.frame")

It looks like this:

  cluster value 
1 Cluster_20637 0.020 
2 Cluster_20919 0.020 
3 Cluster_9642 0.147 
<<snip>> 
16 Cluster_28603 0.872 
17 Cluster_1419 0.873

Generate the cumulative percentage variable (the rows are already sorted by value):

> test$cumperc <- (1:nrow(test))/nrow(test) 
> test 

     cluster value cumperc 
1 Cluster_20637 0.020 0.05882353 
2 Cluster_20919 0.020 0.11764706 
3 Cluster_9642 0.147 0.17647059 
<<snip>> 
14 Cluster_46928 0.870 0.82352941 
15 Cluster_41908 0.871 0.88235294 
16 Cluster_28603 0.872 0.94117647 
17 Cluster_1419 0.873 1.00000000

Then plot the data:

plot(test$value, test$cumperc, type="l", xlim=c(0,1))
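
The same kind of curve can also be obtained from base R's `ecdf()`, which the question mentions. The following is a minimal sketch and not part of the original answer. It rebuilds only the value column so that it runs on its own, and the names `values`, `Fn`, `x` and `cum_pct` are chosen here for illustration.

```r
# Values copied from the question's sample data (17 clusters).
values <- c(0.02, 0.02, 0.147, 0.148, 0.148, 0.148, 0.498, 0.5, 0.5, 0.5,
            0.867, 0.868, 0.87, 0.87, 0.871, 0.872, 0.873)

# ecdf() returns a step function: Fn(v) is the fraction of values <= v.
Fn <- ecdf(values)

# Cumulative percentage at each distinct value; tied clusters are counted together.
x <- sort(unique(values))
cum_pct <- 100 * Fn(x)

# Step plot of cumulative percentage of clusters against parameter value.
plot(x, cum_pct, type = "s", xlim = c(0, 1), ylim = c(0, 100),
     xlab = "Parameter value", ylab = "Cumulative % of clusters")
```

Calling `plot(Fn)` directly should draw a similar step function on a 0 to 1 scale.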


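Edit to address the comment below:

Try grouping the clusters by value first:

tabvals <- table(test$value) 
# names(tabvals) are character strings, so convert them back to numbers for the x-axis
plot(as.numeric(names(tabvals)),(1:length(tabvals))/length(tabvals),xlim=c(0,1),type="l")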

This gives a line plot with one point per distinct value.
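
If the goal is what the asker describes (count the clusters that share each value, then accumulate those counts), the counts from `table()` can be accumulated with `cumsum()`. This is a minimal sketch under that reading and not part of the original answer. The value column is rebuilt so the snippet runs on its own, and `values`, `counts`, `cum_pct` and `freq` are names chosen here.

```r
# Values copied from the question's sample data (17 clusters).
values <- c(0.02, 0.02, 0.147, 0.148, 0.148, 0.148, 0.498, 0.5, 0.5, 0.5,
            0.867, 0.868, 0.87, 0.87, 0.871, 0.872, 0.873)

# Number of clusters at each distinct value, e.g. 2 clusters at 0.02.
counts <- table(values)

# Running total of clusters, expressed as a percentage of all clusters.
cum_pct <- 100 * cumsum(counts) / sum(counts)

# One row per distinct value: value, cluster count, cumulative percentage.
freq <- data.frame(value      = as.numeric(names(counts)),
                   n_clusters = as.vector(counts),
                   cum_pct    = as.vector(cum_pct))

# Step plot: cumulative % of clusters against parameter value.
plot(freq$value, freq$cum_pct, type = "s", xlim = c(0, 1), ylim = c(0, 100),
     xlab = "Parameter value", ylab = "Cumulative % of clusters")
```

Unlike the plot above, each distinct value here is weighted by the number of clusters that have it, so the curve rises more steeply at values shared by several clusters.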

Source

2012-06-02 06:19:00 thelatemail

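Thanks, but I think this solution computes the cumulative percentage per cluster, right? I want to count the number of clusters that share the same value (for example, in the test data the number of clusters with value 0.020 is 2), then compute the cumulative percentage of those cluster frequencies and plot it against the parameter. How can I do that? – psaima

@psaima - see my edit above – thelatemail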
