2016-08-03

Convert row entries to columns in R

I'm new here; I have only just started learning R.

I have the following question:

Suppose I have a data frame:

name = c("John", "John","John","John","Mark","Mark","Mark","Mark","Dave", "Dave","Dave","Dave") 
color = c("red", "blue", "green", "yellow","red", "blue", "green", "yellow","red", "blue", "green", "yellow") 
value = c(1,2,1,3,5,5,3,2,4,6,7,8) 
df = data.frame(name, color, value) 
#View(df) 
df 
#    name  color value
# 1  John    red     1
# 2  John   blue     2
# 3  John  green     1
# 4  John yellow     3
# 5  Mark    red     5
# 6  Mark   blue     5
# 7  Mark  green     3
# 8  Mark yellow     2
# 9  Dave    red     4
# 10 Dave   blue     6
# 11 Dave  green     7
# 12 Dave yellow     8

and I would like it to look like this:

#   name red blue green yellow
# 1 John   1    2     1      3
# 2 Mark   5    5     3      2
# 3 Dave   4    6     7      8

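That is, the entries in the first column (name) become unique, the levels of the second column (color) become new columns, and the entries in these new columns come from the corresponding rows of the third column (value) in the original data frame.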
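The transformation described above can be spelled out almost word for word in base R. This is a minimal sketch, assuming each name/color pair occurs at most once: the unique names become the rows, and each unique color becomes a column whose values are looked up with `match()`. The names `wide`, `lev` and `rows` are illustrative only.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# One row per unique name, in order of first appearance
wide <- data.frame(name = unique(df$name), stringsAsFactors = FALSE)

# One new column per unique color; match() finds, for each name,
# the row holding that color (NA if the pair is missing)
for (lev in unique(df$color)) {
  rows <- df[df$color == lev, ]
  wide[[lev]] <- rows$value[match(wide$name, rows$name)]
}

wide
```

Because the loop runs over `unique(df$color)`, nothing is hard-coded, so it scales to any number of color levels.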

I can do this with the following:

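library(dplyr) 
# Build one column per color (0 where the row has a different color),
# then collapse to one row per name by summing each column
df_wide = df %>% 
    mutate(red    = ifelse(color == "red",    value, 0), 
           blue   = ifelse(color == "blue",   value, 0), 
           green  = ifelse(color == "green",  value, 0), 
           yellow = ifelse(color == "yellow", value, 0)) %>% 
    group_by(name) %>% 
    # summarise_each(funs(sum), ...) is deprecated in current dplyr; across() replaces it
    summarise(across(c(red, blue, green, yellow), sum)) 
df_wide 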
    name red blue green yellow 
1 Dave  4  6  7  8 
2 John  1  2  1  3 
3 Mark  5  5  3  2 

However, this is not practical when the color column has many levels. How should I go about it?

Thanks!

Answers


Since the OP is using packages from the dplyr family, a nice option is spread from tidyr:

library(tidyr) 
spread(df, color, value) 
# name blue green red yellow 
#1 Dave 6  7 4  8 
#2 John 2  1 1  3 
#3 Mark 5  3 5  2 

If we want to use %>%:

library(dplyr) 
df %>% 
    spread(color, value) 

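To keep the original column order, we can convert 'color' to a factor whose levels are the unique values of 'color' (i.e. in order of first appearance), and then do the spread: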

df %>% 
    mutate(color = factor(color, levels = unique(color))) %>% 
    spread(color, value) 
# name red blue green yellow 
#1 Dave 4 6  7  8 
#2 John 1 2  1  3 
#3 Mark 5 5  3  2 
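In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch, assuming a current tidyr is installed: with the default `names_sort = FALSE`, pivot_wider() orders the new columns by first appearance, so the factor conversion above should not be needed to keep red, blue, green, yellow in that order.

```r
library(tidyr)

# Rebuild the example data frame from the question
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# names_from: column whose values become the new column names
# values_from: column that fills those new columns
wide <- pivot_wider(df, names_from = color, values_from = value)
wide
```

The result is a tibble rather than a plain data frame.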

Alternatively, we can use the faster dcast from data.table. Converting to a data.table and using data.table's dcast has an advantage: it is much faster than dcast from reshape2.

library(data.table) 
# setDT() converts df to a data.table in place (by reference)
dcast(setDT(df), name~color, value.var="value") 
# name blue green red yellow 
#1: Dave 6  7 4  8 
#2: John 2  1 1  3 
#3: Mark 5  3 5  2 

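Note: with both of these solutions, we get the column names as in the expected output, without any ugly prefix or suffix attached to them (this can be changed, by the way, but it takes another line of code).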


If we need a base R solution, one option is tapply:

with(df, tapply(value, list(name, color), FUN = I)) 
#  blue green red yellow 
#Dave 6  7 4  8 
#John 2  1 1  3 
#Mark 5  3 5  2 
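tapply() returns a matrix, with the names as row names and the colors as column names. A minimal sketch of turning it back into a data frame with an ordinary `name` column, assuming (as `FUN = I` does) exactly one value per name/color pair; `m` and `out` are illustrative names.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# Matrix: rows = names (sorted), columns = colors (sorted)
m <- with(df, tapply(value, list(name, color), FUN = I))

# Move the row names into a "name" column; row.names = NULL resets them to 1, 2, 3
out <- data.frame(name = rownames(m), m, row.names = NULL,
                  stringsAsFactors = FALSE)
out
```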

This is fast. Thanks! – chowching


So you want a cross-tabulation?

> xtabs(value~name+color, df) 
     color 
name blue green red yellow 
    Dave 6  7 4  8 
    John 2  1 1  3 
    Mark 5  3 5  2 
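xtabs() returns a contingency table (an "xtabs"/"table" object), not a data frame. A minimal sketch, assuming a plain data frame is wanted: as.data.frame.matrix() keeps the wide layout. Note that xtabs() sums `value` within each cell, so a missing name/color combination shows up as 0 rather than NA. The names `tab` and `wide` are illustrative.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# Cross-tabulate: sum of value for each name x color cell
tab <- xtabs(value ~ name + color, data = df)

# Convert the two-way table to a wide data frame (names become row names)
wide <- as.data.frame.matrix(tab)

# Move the row names into a regular "name" column
wide <- data.frame(name = rownames(wide), wide, row.names = NULL,
                   stringsAsFactors = FALSE)
wide
```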

You can use dcast from reshape2:

library(reshape2) 
dcast(df, name~color) 


# name blue green red yellow 
#1 Dave 6  7 4  8 
#2 John 2  1 1  3 
#3 Mark 5  3 5  2 
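The call above works because every name/color pair occurs once (dcast guesses `value` as the value column). If a pair occurs more than once, reshape2's dcast() has to aggregate, and without `fun.aggregate` it falls back to length(), i.e. it counts rows. A minimal sketch, using a hypothetical extra "red" row for John, that sums the duplicates instead:

```r
library(reshape2)

# Rebuild the example data frame from the question
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

# Hypothetical duplicate: a second "red" observation for John
df2 <- rbind(df, data.frame(name = "John", color = "red", value = 10,
                            stringsAsFactors = FALSE))

# Name the value column explicitly and add up duplicated cells
wide <- dcast(df2, name ~ color, value.var = "value", fun.aggregate = sum)
wide
```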

Or you can use reshape from base R:

reshape(df, idvar="name", timevar="color", direction="wide") 


# name value.red value.blue value.green value.yellow 
#1 John   1   2   1   3 
#5 Mark   5   5   3   2 
#9 Dave   4   6   7   8
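reshape() names the new columns `value.red`, `value.blue`, etc. (the value column name, the default `sep = "."`, then the color) and keeps the original row names 1, 5, 9. A minimal sketch of the extra line or two needed to match the expected output, assuming the prefix is always `value.`:

```r
# Rebuild the example data frame from the question
df <- data.frame(
  name  = rep(c("John", "Mark", "Dave"), each = 4),
  color = rep(c("red", "blue", "green", "yellow"), times = 3),
  value = c(1, 2, 1, 3, 5, 5, 3, 2, 4, 6, 7, 8),
  stringsAsFactors = FALSE
)

wide <- reshape(df, idvar = "name", timevar = "color", direction = "wide")

# Strip the "value." prefix that reshape() adds to each new column
names(wide) <- sub("^value\\.", "", names(wide))

# Reset the leftover row names (1, 5, 9) to 1, 2, 3
rownames(wide) <- NULL
wide
```

Columns keep the order in which the colors first appear (red, blue, green, yellow), as in the output above.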