R - Cutting rows from a data frame to balance it

I have a data frame with 2600 entries spread across 249 factor levels (persons). The data set is not balanced.

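I want to remove all entries of every factor level that occurs fewer than 5 times. In addition, I want to trim the levels that occur more than 5 times down to 5 entries. In the end I want a data frame that has fewer entries overall but is balanced with respect to the person factor.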

The data set is built as follows:

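library(jpeg)  # provides readJPEG()

file_list <- list.files("path/to/image/folder", full.names = TRUE)
# the folder contains 2600 images, which include information about the
# person factor in their file name
count <- length(file_list)

# split the base file name, not the full path, so the folder path does not
# end up in the extracted person prefix
file_names <- sapply(strsplit(basename(file_list), split = "_"), "[", 1)
person_list <- substr(file_names, 1, 3)
person_class <- as.factor(person_list)

imageWidth <- 320   # uniform pixel width of all images
imageHeight <- 280  # uniform pixel height of all images
# 2 label columns + one column per pixel (assumes grayscale JPEGs)
variableCount <- imageHeight * imageWidth + 2

images <- as.data.frame(matrix(seq(count), nrow = count, ncol = variableCount))
images[1] <- person_class
images[2] <- eyepos_class  # built elsewhere; not shown in the question

for (i in 1:count) {
    img <- readJPEG(file_list[i])
    image <- c(img)  # flatten the pixel matrix into one vector
    images[i, 3:variableCount] <- image
}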
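The person-ID step depends on splitting the file name rather than the full path returned by list.files(full.names = TRUE). The sketch below checks this on a few made-up paths; the naming pattern "p01_a_left.jpg" is purely an assumption, since the question only says the person information is in the file name.

```r
# Hypothetical file paths; the real naming scheme is not shown in the question
file_list <- c("path/to/image/folder/p01_a_left.jpg",
               "path/to/image/folder/p01_b_right.jpg",
               "path/to/image/folder/p02_a_left.jpg")

# Splitting the full path keeps the folder prefix in the first piece,
# so substr(..., 1, 3) would return "pat" for every file
wrong_prefix <- substr(sapply(strsplit(file_list, split = "_"), "[", 1), 1, 3)

# Splitting basename() output isolates the file-name prefix instead
file_names <- sapply(strsplit(basename(file_list), split = "_"), "[", 1)
person_class <- as.factor(substr(file_names, 1, 3))
person_class
```

If all persons come out as the same level, this path-versus-file-name mix-up is a likely cause.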

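So basically I need to get the number of samples for each factor level (as shown by summary(images[1])) and then trim the data accordingly. I don't really know how to proceed from here and would appreciate any help.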
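Getting the per-level counts is a one-liner with table(). Note that summary() on a data frame column only lists the few most frequent levels (its maxsum argument defaults to 7 for data frames), so with 249 persons table() is more convenient. A minimal sketch on a made-up factor standing in for images[[1]]:

```r
# Hypothetical stand-in for the person column, with unequal counts
person <- factor(c(rep("p01", 7), rep("p02", 3), rep("p03", 5)))

counts <- table(person)   # number of rows per factor level
counts                    # p01: 7, p02: 3, p03: 5

# levels with at least 5 rows are the ones worth keeping
keep_levels <- names(counts)[counts >= 5]
keep_levels               # "p01" "p03"
```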


2016-06-12 4ndro1d

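I know your data isn't small, but to write a good question that is reproducible (which will get you upvotes and answers), please provide a reproducible example that we can copy and paste to recreate your data and your problem. You can use a built-in data set or create your own and include the code you used. –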

Well, I did my best to make it reproducible, but you still need the data set, which is publicly available but very slow to download. – 4ndro1d
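Following the suggestion in the comment above, a small synthetic data frame with the same layout (person factor in V1, eye-position factor in V2, pixel values in the remaining columns) is usually enough to reproduce this kind of problem. The sizes, level names and "left"/"right" values below are invented for illustration.

```r
set.seed(1)     # make the random toy data reproducible
n_pixels <- 4   # stand-in for the 320 * 280 pixel columns

# hypothetical persons with deliberately unbalanced counts
persons <- factor(rep(c("p01", "p02", "p03", "p04"), times = c(8, 2, 5, 6)))
n <- length(persons)

images <- data.frame(V1 = persons,
                     V2 = factor(sample(c("left", "right"), n, replace = TRUE)),
                     matrix(runif(n * n_pixels), nrow = n))
names(images) <- paste0("V", seq_len(ncol(images)))  # V1 ... V6, as in the question

table(images$V1)   # p01: 8, p02: 2, p03: 5, p04: 6
```

Calling dput() on such a data frame prints code that others can paste directly into their session.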

An option using data.table:

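library(data.table)
# setDT() converts images to a data.table in place; then, for each person (V1),
# keep the group only if it has at least 5 rows and return its first 5 rows
res <- setDT(images)[, if (.N >= 5) head(.SD, 5), by = V1]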
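For comparison, the same filter-then-trim logic can be written in base R without data.table. This is a sketch on made-up data (three hypothetical persons with 7, 3 and 5 rows), not part of the answer above.

```r
# Toy data: hypothetical person factor with unequal group sizes
images <- data.frame(V1 = factor(rep(c("p01", "p02", "p03"), times = c(7, 3, 5))),
                     V2 = seq_len(15))

# Split rows by person, drop groups with fewer than 5 rows,
# keep the first 5 rows of each remaining group, then stack them again
groups <- split(images, images$V1)
kept <- lapply(groups[sapply(groups, nrow) >= 5], head, 5)
res <- do.call(rbind, kept)
rownames(res) <- NULL
res$V1 <- droplevels(res$V1)   # remove the now-empty level "p02"

table(res$V1)                  # p01: 5, p03: 5
```

The droplevels() call matters if you later plot or tabulate V1: without it, the removed persons still show up as levels with zero rows.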


2016-06-13 02:02:29 akrun

This seems to work (it reduces the data set from 2639 to 1090 objects). Thanks! – 4ndro1d

Maybe you can tell me why 'plot(res$V1)' works after that, but 'plot(res[1])' gives an error: 'Error in plot.new(): figure margins too large'? Shouldn't they be the same? – 4ndro1d

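@4ndro1d Subsetting works slightly differently in 'data.table': 'res$V1' is a vector, whereas 'res[1]' selects the first row. You can use 'res[[1]]' to get the first column as a vector. – akrun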
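The difference between single and double brackets can be seen on a plain data frame. Single brackets return a (one-column) data frame, and plot() on a data frame dispatches to plot.data.frame() rather than plotting the factor; with a data.table, res[1] is the first row, i.e. a one-row table with tens of thousands of columns, which is likely what triggers the margins error. A minimal base-R illustration on a made-up data frame:

```r
df <- data.frame(V1 = factor(c("p01", "p01", "p02")), V2 = 1:3)

class(df[1])     # "data.frame": single brackets keep a data frame
class(df[[1]])   # "factor": double brackets extract the column itself
class(df$V1)     # "factor": the same object as df[[1]]
```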

Using dplyr:

library(dplyr) 
group_by(images, V1) %>% # group by the V1 column 
    filter(n() >= 5) %>% # keep only groups with 5 or more rows 
    slice(1:5)   # keep only the first 5 rows in each group

You can assign the result to an ordinary object, e.g. my_desired_result = group_by(images, ...
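The dplyr chain (keep groups with n() >= 5, then slice(1:5)) can also be expressed row-wise in base R with ave(), which computes each row's group size and its position within the group. This is a sketch on hypothetical data; as with the pipeline, nothing changes until the result is assigned to a name.

```r
images <- data.frame(V1 = factor(rep(c("p01", "p02", "p03"), times = c(7, 3, 5))),
                     V2 = seq_len(15))

idx <- seq_len(nrow(images))
group_size   <- ave(idx, images$V1, FUN = length)     # rows in this row's group
pos_in_group <- ave(idx, images$V1, FUN = seq_along)  # 1, 2, ... within the group
keep <- group_size >= 5 & pos_in_group <= 5

# images itself is untouched; the subset lives only in the new object
balanced <- droplevels(images[keep, ])
table(balanced$V1)   # p01: 5, p03: 5
```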


2016-06-12 21:41:19 Gregor

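Do I have to assign the result to a variable? I tried that and got no result. The data frame doesn't change, and I can't store anything in a variable either. But it looks very much like what I'm looking for. – 4ndro1d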

You need 'obj_name <-' in front of it. –

I tried that without success. – 4ndro1d
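One common reason an assignment "does nothing" is a space inside the arrow: R reads "x < - 5" as the comparison x < (-5), prints TRUE or FALSE, and leaves x unchanged. Whether this is what happened here is only a guess; the snippet just shows the difference.

```r
x <- 10
x < - 5    # parsed as x < (-5): a comparison, prints FALSE
x          # still 10, nothing was assigned
x <- 5     # "<-" without a space is the assignment operator
x          # now 5
```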
