2017-10-18 103 views
0

R data frame - extract unique rows from columns

I have a data frame:

source= c("A", "A", "B") 
target = c("B", "C", "C") 
source_A = c(5, 5, 6) 
target_A = c(6, 7, 7) 
source_B = c(10, 10, 11) 
target_B = c(11, 12, 12) 
c = c(0.5, 0.6, 0.7) 
df = data.frame(source, target, source_A, target_A, source_B, target_B, c) 

> df
  source target source_A target_A source_B target_B   c
1      A      B        5        6       10       11 0.5
2      A      C        5        7       10       12 0.6
3      B      C        6        7       11       12 0.7

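How can I reduce this data frame to the unique source and target values only, and return their associated values (ignoring column c)?

That is, for the values [A B C]: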

  id A  B
1  A 5 10
2  B 6 11
3  C 7 12

At the moment, I do something like this:

df1 <- df[,c("source","source_A", "source_B")] 
df2 <- df[,c("target","target_A", "target_B")] 

names(df1)[names(df1) == 'source'] <- 'id' 
names(df1)[names(df1) == 'source_A'] <- 'A' 
names(df1)[names(df1) == 'source_B'] <- 'B' 
names(df2)[names(df2) == 'target'] <- 'id' 
names(df2)[names(df2) == 'target_A'] <- 'A' 
names(df2)[names(df2) == 'target_B'] <- 'B' 

df3 <- rbind(df1,df2) 
df3[!duplicated(df3$id),] 

  id A  B
1  A 5 10
3  B 6 11
5  C 7 12

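In reality I have several more columns, so this is not feasible in the long run.

How can I do this more succinctly (ideally in a way that generalises to more columns)?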

+0

Are the `source` and `target` values for the same `id` always identical? – LAP

+0

@LAP Yes (or I've messed up...). The values for A in the other columns are always specific to A (even though they differ from column to column). – Chuck

Answers

0
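library(dplyr)
library(magrittr)

# Pick the columns by matching against names(df); ls(pattern = ...) lists
# objects in an environment, not the columns of df
df1 <- df[, grep("^source", names(df)), drop = FALSE]
df2 <- df[, grep("^target", names(df)), drop = FALSE]

# Align the column names so the two halves can be stacked
names(df1) <- names(df2)
df <- bind_rows(df1, df2)
# Keep one row per unique combination of values
df %<>% group_by(target, target_A, target_B) %>% slice(1)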

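This should do it, but I'm not quite sure how you want to generalise it. I don't think it's the most elegant solution in the world, but it serves the purpose. Hopefully the columns you intend to use can be targeted by string patterns in their column names!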
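If the column names really do follow the `source_X` / `target_X` pattern, the same idea can be sketched in base R without typing any column names by hand. This is only a rough sketch: `extract_side` is a made-up helper name, and it assumes every id always carries the same values, as confirmed in the comments on the question.

```r
# Rebuild the example data from the question
df <- data.frame(
  source = c("A", "A", "B"), target = c("B", "C", "C"),
  source_A = c(5, 5, 6),     target_A = c(6, 7, 7),
  source_B = c(10, 10, 11),  target_B = c(11, 12, 12),
  c = c(0.5, 0.6, 0.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: take every column whose name starts with `prefix`,
# then strip the prefix, so "source" -> "id" and "source_A" -> "A"
extract_side <- function(data, prefix) {
  cols <- grep(paste0("^", prefix), names(data), value = TRUE)
  out <- data[, cols, drop = FALSE]
  new_names <- sub(paste0("^", prefix, "_?"), "", cols)
  new_names[new_names == ""] <- "id"
  names(out) <- new_names
  out
}

# Stack both sides and keep the first row seen for each id
stacked <- rbind(extract_side(df, "source"), extract_side(df, "target"))
result <- stacked[!duplicated(stacked$id), ]
rownames(result) <- NULL

result
#   id A  B
# 1  A 5 10
# 2  B 6 11
# 3  C 7 12
```

Adding a `source_C` / `target_C` pair to the data should then carry through without further changes, since the columns are found by prefix.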

0

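Here is a more general approach using dplyr and tidyr functions. You basically need to gather everything into a long format, where you can rename the variables accordingly, and then spread them back out into id, A, B.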

library(dplyr) 
library(tidyr) 

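df %>% 
    select(-c) %>%                                        # drop the column to ignore
    mutate(index = row_number()) %>%                      # remember the original row
    gather(key, value, -index) %>%                        # long format: one row per cell
    separate(key, c("type", "name"), fill = "right") %>%  # "source_A" -> type "source", name "A"
    mutate(name = ifelse(is.na(name), "id", name)) %>%    # bare "source"/"target" hold the id
    spread(key = name, value = value) %>%                 # back to wide: columns A, B, id
    select(id, matches("[A-Z]", ignore.case = FALSE)) %>% # keep id plus the upper-case columns
    distinct()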
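For what it's worth, `gather()` and `spread()` have since been superseded in tidyr, so here is a rough equivalent using `pivot_longer()` instead (a different function from the one used above, available in tidyr 1.0.0 or later). It is a sketch under two assumptions: the columns are named `source_X` / `target_X`, and the two id columns are first renamed to `source_id` / `target_id` (names I made up) so that every column splits cleanly on the underscore.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
df <- data.frame(
  source = c("A", "A", "B"), target = c("B", "C", "C"),
  source_A = c(5, 5, 6),     target_A = c(6, 7, 7),
  source_B = c(10, 10, 11),  target_B = c(11, 12, 12),
  c = c(0.5, 0.6, 0.7),
  stringsAsFactors = FALSE
)

result <- df %>%
  select(-c) %>%                                       # drop the column to ignore
  rename(source_id = source, target_id = target) %>%   # assumed names, so all columns share one pattern
  pivot_longer(
    everything(),
    names_to = c("type", ".value"),                    # part after "_" becomes the column name
    names_sep = "_"
  ) %>%
  select(-type) %>%                                    # source/target label is no longer needed
  distinct(id, .keep_all = TRUE)                       # first row seen for each id

result
```

Because `.value` keeps each output column in its original type, `A` and `B` should stay numeric here rather than being turned into character strings on the way through the long format.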