unique() for more than one variable

I have the following data frame in R:

> str(df) 
'data.frame': 545227 obs. of 15 variables: 
$ ykod : int 93 93 93 93 93 93 93 93 93 93 ... 
$ yad : Factor w/ 42 levels "BAKUGAN","BARBIE",..: 30 30 30 30 30 30 30 30 30 30 ... 
$ per : Factor w/ 3 levels "2 AYLIK","3 AYLIK",..: 3 3 3 3 3 3 3 3 3 3 ... 
$ donem: int 201101 201101 201101 201101 201101 201101 201101 201101 201101 201101 ... 
$ sayi : int 201101 201101 201101 201101 201101 201101 201101 201101 201101 201101 ... 
$ mkod : int 4 5 9 11 12 18 20 22 25 26 ... 
$ mad : Factor w/ 10464 levels " Defne Market   ",..: 405 8075 9710 10145 9297 7973 2542 3892 2759 5769 ... 
$ mtip : Factor w/ 29 levels "Abone Bürosu          ",..: 2 20 20 2 2 2 2 2 2 2 ... 
$ kanal: Factor w/ 2 levels "OB","SS": 2 2 2 2 2 2 2 2 2 2 ... 
$ bkod : int 110565 110565 110565 110565 110565 110565 110565 110565 110565 110565 ... 
$ bad : Factor w/ 212 levels "4. Levent","500 Evler",..: 167 167 167 167 167 167 167 167 167 167 ... 
$ bolge: Factor w/ 12 levels "Adana Şehiriçi",..: 7 7 7 7 7 7 7 7 7 7 ... 
$ sevk : int 2 3 3 3 2 2 2 6 2 2 ... 
$ iade : int 2 1 0 2 0 2 1 0 0 2 ... 
$ satis: int 0 2 3 1 2 0 1 6 2 0 ...

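I want to list the distinct values (like SQL's DISTINCT) for a selection of several variables. For example, unique(yad) gives me the names of the 42 distinct elements, but I need to extract two columns together (yad and per, all unique combinations):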

yad           per
---           ---
BARBIE        AYLIK
BAKUGAN       2 AYLIK
MICKEY MOUSE  2 AYLIK
TINKERBELL    3 AYLIK
...           ...

How can I do this?

Source

2011-10-17 Mehper C. Palavuzlar

How about using unique() itself?

df <- data.frame(yad = c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN"), 
       per = c("AYLIK", "AYLIK", "2 AYLIK", "2 AYLIK"), 
       hmm = 1:4) 

df 
#       yad     per hmm
# 1  BARBIE   AYLIK   1
# 2  BARBIE   AYLIK   2
# 3 BAKUGAN 2 AYLIK   3
# 4 BAKUGAN 2 AYLIK   4

unique(df[c("yad", "per")]) 
#       yad     per
# 1  BARBIE   AYLIK
# 3 BAKUGAN 2 AYLIK
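
A small sketch building on the example above, in case it is useful. `unique()` on a data frame compares whole rows, so selecting the two columns first is what makes it behave like `SELECT DISTINCT yad, per`. The names `combos`, `n_combos` and `combos_sorted` are only illustrative and do not come from the question.

```r
# Same toy data as in the answer above
df <- data.frame(yad = c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN"),
                 per = c("AYLIK", "AYLIK", "2 AYLIK", "2 AYLIK"),
                 hmm = 1:4)

# Distinct (yad, per) pairs; 'combos' is just an illustrative name
combos <- unique(df[c("yad", "per")])

# unique() keeps the original row names (1 and 3 here); reset them if unwanted
rownames(combos) <- NULL

# Number of distinct combinations
n_combos <- nrow(combos)

# Optional: sort the result by both columns
combos_sorted <- combos[order(combos$yad, combos$per), ]
```

On the real data frame the same three steps should apply unchanged, since only the column names are assumed.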

Source

2011-10-17 08:07:45

+1. I would also recommend normalizing the strings first (tolower, gsub to strip special characters, etc.).
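
A rough sketch of what such normalization could look like before calling `unique()`. The toy values and the helper name `normalize` are made up for illustration, and the exact cleaning rules would depend on the data. With Turkish text it may also be worth checking how the current locale handles case conversion of I/ı.

```r
# Toy data with inconsistent case and padding (made up for illustration)
df <- data.frame(yad   = c("Bakugan ", "BAKUGAN", " bakugan", "Bakugan"),
                 kanal = c("ss", "SS", "Ss ", "OB"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: trim, lower-case, and collapse repeated whitespace
normalize <- function(x) {
  x <- trimws(as.character(x))
  x <- tolower(x)
  gsub("\\s+", " ", x)
}

# Without cleaning, every row counts as a different combination
raw_combos <- unique(df[c("yad", "kanal")])

# After cleaning, only the genuinely different pairs remain
clean <- data.frame(yad   = normalize(df$yad),
                    kanal = normalize(df$kanal),
                    stringsAsFactors = FALSE)
clean_combos <- unique(clean)
```

Here `raw_combos` has four rows while `clean_combos` has two (bakugan/ss and bakugan/ob).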

What if 'df' is a matrix? Should I convert it to a 'data.frame', or is there a function that does this? – sop

Actually, I found that 'unique.matrix()' does the job, thanks – sop
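
For the matrix case, a minimal sketch with a made-up matrix `m`: calling `unique()` on a matrix dispatches to `unique.matrix()`, so no conversion to a data frame should be needed.

```r
# Toy character matrix with a duplicated row
m <- matrix(c("BARBIE",  "AYLIK",
              "BARBIE",  "AYLIK",
              "BAKUGAN", "2 AYLIK"),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("yad", "per")))

# unique() dispatches to unique.matrix(), which drops duplicated rows by default
u_rows <- unique(m)

# MARGIN = 2 would drop duplicated columns instead
u_cols <- unique(m, MARGIN = 2)
```

The result is still a matrix, with one row per distinct combination.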

There are a few ways to get all the unique combinations of a set of factors.

with(df, interaction(yad, per, drop=TRUE)) # gives labels 
with(df, yad:per)       # ditto 

aggregate(numeric(nrow(df)), df[c("yad", "per")], length) # gives a data frame
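
To make the difference between these calls concrete, here is a small sketch on toy data (the names `f`, `combo_labels` and `counts` are illustrative). Note that `yad:per` only works when both columns are factors, while `interaction()` returns one value per row, so the distinct combinations are its levels rather than its values.

```r
# Toy data; both columns are explicitly factors
df <- data.frame(yad = factor(c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN")),
                 per = factor(c("AYLIK", "AYLIK", "2 AYLIK", "2 AYLIK")))

# interaction() returns one factor value per row; with drop = TRUE its levels
# are exactly the combinations that occur in the data
f <- with(df, interaction(yad, per, drop = TRUE))
combo_labels <- levels(f)

# aggregate() returns one row per combination, plus a count column named 'x'
counts <- aggregate(numeric(nrow(df)), df[c("yad", "per")], length)
```

`combo_labels` holds strings such as "BARBIE.AYLIK", whereas `counts` keeps yad and per as separate columns, which is usually closer to a DISTINCT result.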

Source

2011-10-17 07:51:45

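This is an addition to Josh's answer.

You can also keep the values of the other variables while filtering out the duplicated rows, using data.table.

Example: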

library(data.table) 

#create data table 
dt <- data.table(
    V1=LETTERS[c(1,1,1,1,2,3,3,5,7,1)], 
    V2=LETTERS[c(2,3,4,2,1,4,4,6,7,2)], 
    V3=c(1), 
    V4=c(2)) 

dt 
# V1 V2 V3 V4 
# A B 1 2 
# A C 1 2 
# A D 1 2 
# A B 1 2 
# B A 1 2 
# C D 1 2 
# C D 1 2 
# E F 1 2 
# G G 1 2 
# A B 1 2 

# set the key to all columns 
setkey(dt) 

# Get Unique lines in the data table 
unique(dt[list(V1, V2), nomatch = 0]) 

# V1 V2 V3 V4 
# A B 1 2 
# A C 1 2 
# A D 1 2 
# B A 1 2 
# C D 1 2 
# E F 1 2 
# G G 1 2

Caveat: if the other variables take different combinations of values, the result will still be the unique combinations of V1 and V2.

Source
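
One way to read this caveat, sketched in base R rather than data.table so that it runs without extra packages (the data frame `d` and the result names are made up): comparing whole rows and comparing only the key columns can give different numbers of rows once the other columns vary.

```r
# Toy data: (A, B) appears twice, but with different V3 values
d <- data.frame(V1 = c("A", "A", "C"),
                V2 = c("B", "B", "D"),
                V3 = c(1, 2, 1))

# Comparing whole rows: the two (A, B) rows differ in V3, so both survive
all_cols <- unique(d)

# Comparing only V1 and V2: one row per combination, other columns dropped
key_cols <- unique(d[c("V1", "V2")])
```

Here `all_cols` has three rows and `key_cols` has two, so which call is appropriate depends on whether the other columns are meant to take part in the comparison.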

2015-08-07 10:10:19

Oddly, the unique operation works, but all the other columns of the resulting dt are set to NA. Do you know why?

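Thanks for spotting that. This operation performs a merge, so it can produce some 'NA' values. The fix is to replace 'allow.cartesian = TRUE' with 'nomatch = 0', which leaves the NA values out of the result. I have updated the answer. Thanks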


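df$new_var = paste(df$yad, df$per, sep = "_")  # helper column combining both variables 
length(unique(df$new_var))                     # for checking 
df = df[!duplicated(df$new_var), ]             # keep the first row of each combination 
nrow(df)                                       # for checking; should equal the output of the 2nd line 
df$new_var = NULL                              # drop the helper column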

Source

2016-04-08 07:20:04 ashok

This doesn't just give you the distinct values; it also overwrites the original data frame. Not what the OP asked for. – BenBarnes

If you don't want to overwrite it, that's simple. Just put df2 instead of df as the first term on the third line. Done – ashok
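
A possible non-destructive variant of this approach, as a sketch: `duplicated()` accepts a data frame directly, so the pasted helper column can be skipped, and assigning to `df2` leaves `df` untouched. The names `keep` and `collision` are illustrative only.

```r
# Same toy data as in the first answer
df <- data.frame(yad = c("BARBIE", "BARBIE", "BAKUGAN", "BAKUGAN"),
                 per = c("AYLIK", "AYLIK", "2 AYLIK", "2 AYLIK"),
                 hmm = 1:4)

# duplicated() works row-wise on a data frame, so no pasted key is needed
keep <- !duplicated(df[c("yad", "per")])

# Assign to a new object so the original df is not overwritten
df2 <- df[keep, ]

# Why a pasted key can be risky: different pairs may collapse to the same string
collision <- paste("a_b", "c", sep = "_") == paste("a", "b_c", sep = "_")
```

`df2` keeps the first row of each (yad, per) combination along with its other columns, and `collision` evaluates to TRUE, which is the edge case the pasted-key version would need a safer separator for.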
