Select only the columns of a data frame that contain a value greater than 5

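I'm new to R, so this is a tough one for me: I have a data frame imported from a csv. The first column contains the row names (genes), and the second column contains the group assignment (whether the gene is in group 1, group 4, etc.). The next 100 columns contain gene pathway measurements (range -20 to +20). I want to select only the rows that are in group 1, and then show only the columns in which at least one of those rows has a value greater than 10.

Example data: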

NAME Group path1 path2 path3 path4 path5 
gene1 8 -19.1 -26.6 3.0 0.8 -5.1 
gene2 1 -2.8 22.8 -1.2 20.8 -9.6 
gene3 4 -5.4 -4.0 2.7 5.8 -6.8 
gene4 1 -9.9 -24.6 7.3 -2.1 -18.9 
gene5 2 -4.7 -9.4 -3.1 0.6 -10.1 
gene6 1 14.0 -5.8 -1.6 -2.5 -18.7 
gene7 5 -6.4 -3.8 2.0 -2.1 -8.6 
gene8 1 -9.9 -4.8 5.2 2.0 -17.5
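
To make the sample reproducible, it can be read into a data frame as in the sketch below. The object name `DT` is an assumption, chosen to match the first answer; with the real file the data would come from `read.csv` instead.

```r
# Sample data from the question; the name `DT` is an assumption
DT <- read.table(text = "
NAME Group path1 path2 path3 path4 path5
gene1 8 -19.1 -26.6 3.0 0.8 -5.1
gene2 1 -2.8 22.8 -1.2 20.8 -9.6
gene3 4 -5.4 -4.0 2.7 5.8 -6.8
gene4 1 -9.9 -24.6 7.3 -2.1 -18.9
gene5 2 -4.7 -9.4 -3.1 0.6 -10.1
gene6 1 14.0 -5.8 -1.6 -2.5 -18.7
gene7 5 -6.4 -3.8 2.0 -2.1 -8.6
gene8 1 -9.9 -4.8 5.2 2.0 -17.5
", header = TRUE, stringsAsFactors = FALSE)

# Inspect column types: NAME is character, the rest are numeric
str(DT)
```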

I tried the approach from this question but had trouble adapting it to my data: Subset columns in R with specific values

Any help would be greatly appreciated!

Source

2016-03-06 user27206

Reshape your data with tidyr and dplyr to simplify the operation. This puts your column names into a single column. Then filter on group and value.

library(tidyr) 
library(dplyr) 
DT %>% 
    gather("Path", "value", -NAME, -Group) %>% 
    filter(Group == 1, value > 10) 
#> NAME Group Path value 
#> 1 gene6  1 path1 14.0 
#> 2 gene2  1 path2 22.8 
#> 3 gene2  1 path4 20.8

If you want all rows with Group == 1 and all values from the selected columns, just keep the column names and subset the table.

library(tidyr) 
library(dplyr) 
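# Names of the columns with at least one value > 10 within group 1
colname <- DT %>% 
    gather("Path", "value", -NAME, -Group) %>% 
    filter(Group == 1, value > 10) %>% 
    distinct(Path)  # distinct() avoids repeating a column that has several hits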

DT[DT$Group == 1, c("NAME", "Group", colname$Path)] 
#> NAME Group path1 path2 path4 
#> 2 gene2  1 -2.8 22.8 20.8 
#> 4 gene4  1 -9.9 -24.6 -2.1 
#> 6 gene6  1 14.0 -5.8 -2.5 
#> 8 gene8  1 -9.9 -4.8 2.0
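
In current tidyr, `gather()` is superseded by `pivot_longer()`. A minimal sketch of the same idea with `pivot_longer()` follows; the small `DT` here is a cut-down copy of the question's sample, and the helper names `hits` and `keep_cols` are made up for illustration.

```r
library(tidyr)
library(dplyr)

# Cut-down copy of the sample data (assumed name `DT`)
DT <- data.frame(
  NAME  = c("gene1", "gene2", "gene4", "gene6"),
  Group = c(8, 1, 1, 1),
  path1 = c(-19.1, -2.8, -9.9, 14.0),
  path2 = c(-26.6, 22.8, -24.6, -5.8),
  path3 = c(3.0, -1.2, 7.3, -1.6)
)

# Long format: one row per gene/pathway pair, then keep group 1 values > 10
hits <- DT %>%
  pivot_longer(cols = -c(NAME, Group), names_to = "Path", values_to = "value") %>%
  filter(Group == 1, value > 10)

# intersect() drops duplicates and keeps the original column order of DT
keep_cols <- intersect(names(DT), hits$Path)

result <- DT[DT$Group == 1, c("NAME", "Group", keep_cols)]
result
```

Using `intersect()` against `names(DT)` matters because `pivot_longer()` orders the long table row by row, so the pathway names in `hits` do not necessarily appear in their original column order.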

Source

2016-03-06 20:59:11 cderv

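This worked - I used all the values in the selected columns (the second part of the solution). Thanks! – user27206

Staying within base R, and building on the question you linked to, we can do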

## Data 
df <- data.frame(NAME = c("gene1","gene2","gene3","gene4"), 
          Group = c(8,1,4,1), 
          path1 = c(-19.1, -2.8, -5.4, -9.9), 
          path2 = c(-26.6, 22.8, -4, -24.6)) 

drops <- c("NAME", "Group") 
keeps <- names(df)[!names(df) %in% drops] 

## Subset the data by the groups of interest first 
df_1 <- df[df$Group == 1,] 

## This next step is similar to your linked question, 
## it just uses `any` in place of `all`, and only on a subset of the columns 

cbind(df_1[, drops], do.call(cbind, lapply(df_1[, keeps], function(x){ if(any(x >= 5)) return(x) }))) 

## Or alternatively (testing the Group 1 subset df_1, not the full df), 
df_1[, c(drops, do.call(c, sapply(keeps, function(x) if(any(df_1[, x] >= 5)) return(x)))) ]

This gives

NAME Group path2 
2 gene2  1 22.8 
4 gene4  1 -24.6
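
The same base R selection can also be written with a logical mask over the measurement columns, which behaves the same whether none, some, or all of the columns qualify. This is only a sketch: the names `id_cols`, `value_cols`, `has_hit` and `result` are made up for illustration, and the `>= 5` threshold is copied from the code above (the question's text asks for values greater than 10).

```r
## Data (same small subset as above)
df <- data.frame(NAME = c("gene1", "gene2", "gene3", "gene4"),
                 Group = c(8, 1, 4, 1),
                 path1 = c(-19.1, -2.8, -5.4, -9.9),
                 path2 = c(-26.6, 22.8, -4, -24.6))

id_cols    <- c("NAME", "Group")            # columns that are always kept
value_cols <- setdiff(names(df), id_cols)   # measurement columns to test

## Subset to the group of interest first
df_1 <- df[df$Group == 1, ]

## One TRUE/FALSE per measurement column: does any Group 1 row reach the threshold?
has_hit <- vapply(df_1[value_cols], function(x) any(x >= 5, na.rm = TRUE), logical(1))

## drop = FALSE keeps a data frame even if only one column is selected
result <- df_1[, c(id_cols, value_cols[has_hit]), drop = FALSE]
result
```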

Source

2016-03-06 21:15:52 SymbolixAU

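I went with Titolondon's second solution because it keeps all the information in the columns. Thank you for taking the time to reply. – user27206

@user27206 I'm not sure I understand your comment - doesn't my solution also keep all the information in the column? (Note that I used a small subset of the data in my example) – SymbolixAU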
