Compare the intersection between groups specified in the first column

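Suppose I have a data frame with three columns: the first specifies a feature (for example a color), the second specifies a group, and the third specifies whether the feature is present in that group (1) or absent from it (0):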

> d<-data.frame(feature=c("red","blue","green","yellow","red","blue","green","yellow"), group=c(rep("a",4),rep("b",4)),is_there=c(0,1,1,0,1,1,1,0)) 
> d 
  feature group is_there
1     red     a        0
2    blue     a        1
3   green     a        1
4  yellow     a        0
5     red     b        1
6    blue     b        1
7   green     b        1
8  yellow     b        0

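Now I would like a summary of how many features are present only in group a, only in group b, and how many are present in both groups. In addition, I need to extract the names of the features that are present in both groups. How can I do this? I suspect a function like crossprod might help, but I could not figure it out.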

The output would look something like this:

feature
red     1
blue    2
green   2
yellow  0

Or:

feature a b
red     0 1
blue    1 1
green   1 1
yellow  0 0

Either way, I need a better overview of a fairly large data file (the original has hundreds of features in about 10 groups).


2014-08-28 aldorado

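It sounds like a table is what you want. First we subset the rows so that the is_there column equals 1 and drop the third column. Then we call table on that subset.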

> (tab <- table(d[d$is_there == 1, -3])) 
#         group
# feature  a b
#   blue   1 1
#   green  1 1
#   red    0 1
#   yellow 0 0

A table is a matrix-like object. We can operate on it the same way we would operate on a matrix.

Looking at group a:

> tab[,"a"]                          ## vector of group "a"
#   blue  green    red yellow
#      1      1      0      0
> tab[,"a"][ tab[,"a"] > 0 ]         ## present in group "a"
#  blue green
#     1     1
> names(tab[,"a"][ tab[,"a"] > 0 ])  ## "feature" present in group "a"
# [1] "blue"  "green"

The same goes for group b.
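Since the question also asks for the names of the features present in both groups, here is a minimal sketch that builds on the same table. The names `in_all`, `only_a` and `only_b` are my own, not part of the answer above. I set `stringsAsFactors = TRUE` explicitly on the assumption that you are running R 4.0 or later, where strings are no longer converted to factors by default; without it, `yellow` would drop out of the table because it never has `is_there == 1`.

```r
# Same data as in the question. Factors keep "yellow" as a level even though
# it is filtered out of the subset below.
d <- data.frame(
  feature  = c("red", "blue", "green", "yellow", "red", "blue", "green", "yellow"),
  group    = c(rep("a", 4), rep("b", 4)),
  is_there = c(0, 1, 1, 0, 1, 1, 1, 0),
  stringsAsFactors = TRUE
)

tab <- table(d[d$is_there == 1, -3])

# A feature is in every group when its row contains no zero counts.
in_all <- rownames(tab)[rowSums(tab > 0) == ncol(tab)]

# Features found in exactly one of the two groups.
only_a <- rownames(tab)[tab[, "a"] > 0 & tab[, "b"] == 0]
only_b <- rownames(tab)[tab[, "b"] > 0 & tab[, "a"] == 0]

in_all
only_a
only_b
```

The `rowSums(tab > 0) == ncol(tab)` test does not hard-code the group names, so it should carry over unchanged to the roughly 10 groups mentioned in the question.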


2014-08-28 08:56:19

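Doesn't this fail to show that yellow is absent from both groups? Perhaps my question was poorly worded. – aldorado 2014-08-28 09:04:13

@aldorado - I've edited it – 2014-08-28 09:09:59

@Richard Scriven +1 your table is cleaner than mine – akrun 2014-08-28 09:30:04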

tbl <- table(d$feature[!!d$is_there], d$group[!!d$is_there]) 
rowSums(tbl) 
#  blue  green    red yellow
#     2      2      1      0

tbl

#       a b
#blue   1 1
#green  1 1
#red    0 1
#yellow 0 0

If you want the grouping labelled as shown below:

d1 <- as.data.frame(matrix(rep(c("none", "only", "both")[rowSums(tbl)+1],
                               each=2), ncol=2, byrow=TRUE, dimnames=dimnames(tbl)),
                    stringsAsFactors=FALSE)

d1[!tbl & rowSums(tbl)==1] <- ""
d1
#          a    b
#blue   both both
#green  both both
#red         only
#yellow none none


2014-08-28 08:55:50 akrun

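Great minds think alike. – 2014-08-28 08:57:21

@Richard Scriven I'm not sure this is what the OP wants. Your solution may well be it. – akrun 2014-08-28 08:57:51

Is the double "!!" different from a single "!"? – aldorado 2014-08-28 09:11:28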
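On the `!!` question, a small illustration may help. As far as I understand it, a single `!` coerces a number to logical and negates it, so applying it twice simply turns 0/1 into FALSE/TRUE, which can then be used to index rows. The vector below is made up for the demonstration.

```r
is_there <- c(0, 1, 1, 0)

!is_there    # negation: TRUE where the value is 0
!!is_there   # double negation: TRUE where the value is non-zero

# More explicit spellings that give the same logical vector as !!is_there
identical(!!is_there, is_there != 0)
identical(!!is_there, as.logical(is_there))
```

So `d$feature[!!d$is_there]` selects the same rows as `d$feature[d$is_there == 1]` when the column only contains 0 and 1.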

Would this do?

> tapply(d$feature[d$is_there==1],d$group[d$is_there==1], table) 

$a
  blue  green    red yellow
     1      1      0      0

$b
  blue  green    red yellow
     1      1      1      0


2014-08-28 08:57:37 Benoit

Try the following code:

with(d, tapply(is_there, list(feature, group), sum)) 
#       a b
#blue   1 1
#green  1 1
#red    0 1
#yellow 0 0
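To get from this matrix to the kind of overview the question asks for (how many features occur in none, one, or all of the groups), something along these lines might work. It is only a sketch: `m` and `n_groups` are names I made up, and it assumes every feature/group combination appears in the data, since `tapply` returns NA for combinations that are missing.

```r
d <- data.frame(
  feature  = c("red", "blue", "green", "yellow", "red", "blue", "green", "yellow"),
  group    = c(rep("a", 4), rep("b", 4)),
  is_there = c(0, 1, 1, 0, 1, 1, 1, 0)
)

# Feature-by-group matrix, as in the answer above.
m <- with(d, tapply(is_there, list(feature, group), sum))

# Number of groups in which each feature is present.
n_groups <- rowSums(m > 0)
n_groups

# How many features occur in 0, 1, 2, ... groups.
table(n_groups)

# Names of the features present in every group.
names(n_groups)[n_groups == ncol(m)]
```

Because nothing here refers to the group names directly, the same lines should apply to a matrix with about 10 group columns.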


2014-08-28 09:06:13 rnso

Take the following data frame:

myd <- data.frame(
    feature=c("red","blue","green","yellow","red","blue","green","yellow"), 
    group=c(rep("a",4),rep("b",4)), 
    is_there=c(0,1,1,0,1,0,1,0))

To get a factor that tells you where each feature occurs, you can try this code:

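require(reshape2)

# Sum is_there per feature/group pair: rows are features, columns are groups
res <- acast(myd, feature ~ group, fun.aggregate=sum, value.var="is_there")
# Code per feature: -1 = only in b, 0 = nowhere, 2 = in both, 3 = only in a
where <- factor(
    rowSums(res) - 2*diff(t(res))[1, ],
    levels=c(-1,0,2,3),
    labels=c("group2","nowhere","both","group1")
)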

This gives:

> res
       a b
blue   1 0
green  1 1
red    0 1
yellow 0 0
> where
   blue   green     red  yellow
 group1    both  group2 nowhere
Levels: group2 nowhere both group1

Extracting the features that are present everywhere is trivial from here.
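For instance, a sketch of that extraction could look like this. To keep it runnable without reshape2, I rebuild `where` by hand from the values printed above, so the numeric codes in the first argument are an assumption taken from that output rather than computed from `myd`.

```r
# Hand-built copy of the factor printed above (names are the features).
where <- factor(
  c(blue = 3, green = 2, red = -1, yellow = 0),
  levels = c(-1, 0, 2, 3),
  labels = c("group2", "nowhere", "both", "group1")
)

# Features present in both groups.
names(where)[where == "both"]

# Number of features in each category.
table(where)

# Feature names listed per category.
split(names(where), where)
```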

Note that any other solution that gives you the matrix res is equally valid (the tapply solution will be faster).


2014-08-28 09:12:38

Is 'tapply' faster than 'table'? – 2014-08-28 09:27:24

@RichardScriven 'tapply' is faster than 'acast'; 'table' gives a different result (i.e. not the one you need) – 2014-08-28 11:23:54
