2016-01-21 61 views
1

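Convert matrix values into a number of cells with the given value

I have a data.frame containing 5 type columns, each holding one part of a whole. Here is what it looks like: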

Sample Type_A Type_B Type_C Type_D Type_E Sum 
00001  54  13   24  3   6  100 
00002  5   2   15  54  24  100 
00003  10  10   23  37  20  100 

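I want to create a 100-column matrix and fill its cells in proportion to the values in my data.frame. Row 00001 would have A in its first 54 cells, followed by 13 cells of B, then 24 cells of C, and so on.

The desired matrix would look like this: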

00001 A A A A A A A A A A A A A A ..... 
00002 A A A A A B B C C C C C C C ..... 
00003 A A A A A A A A A A B B B B ..... 
+0

What code have you tried so far?

+0

Your first row does not add up to 100.

Answers

3

Here is another option using data.table. It assumes the values in the "Type" columns sum to 100 for every row; otherwise it will not work.

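library(data.table)
# Strip the "Type_" prefix from the column names to get the letters A..E
nm1 <- sub(".*_", "", grep("_", names(df1), value = TRUE))
# For each sample, repeat every letter by its count and spread the result into columns
setDT(df1)[, transpose(list(rep(nm1, unlist(.SD)))),
           by = Sample, .SDcols = Type_A:Type_E]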
# Sample V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 
#1: 00001 A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A 
#2: 00002 A A A A A B B C C C C C C C C C C C C C C C D D D D D D D D D D D D D D D D D D D D D D D D D D 
#3: 00003 A A A A A A A A A A B B B B B B B B B B C C C C C C C C C C C C C C C C C C C C C C C D D D D D 
# V49 V50 V51 V52 V53 V54 V55 V56 V57 V58 V59 V60 V61 V62 V63 V64 V65 V66 V67 V68 V69 V70 V71 V72 V73 V74 V75 V76 V77 V78 V79 V80 V81 V82 V83 V84 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94 V95 
#1: A A A A A A B B B B B B B B B B B B B C C C C C C C C C C C C C C C C C C C C C C C C D D D E 
#2: D D D D D D D D D D D D D D D D D D D D D D D D D D D D E E E E E E E E E E E E E E E E E E E 
#3: D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D D E E E E E E E E E E E E E E E 
# V96 V97 V98 V99 V100 
#1: E E E E E 
#2: E E E E E 
#3: E E E E E 
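
Because this approach depends on every row summing to 100, it may be worth checking that first. The following is a minimal base R sketch: the data frame is rebuilt by hand from the table in the question, and `type_cols` and `row_totals` are illustrative names, not part of the answer above.

```r
# Rebuild the sample data from the question (the Sum column is left out)
df1 <- data.frame(
  Sample = c("00001", "00002", "00003"),
  Type_A = c(54, 5, 10),
  Type_B = c(13, 2, 10),
  Type_C = c(24, 15, 23),
  Type_D = c(3, 54, 37),
  Type_E = c(6, 24, 20),
  stringsAsFactors = FALSE
)

# Names of the columns that hold the counts (illustrative helper)
type_cols <- grep("^Type_", names(df1), value = TRUE)

# Every row must have the same total for the output to be rectangular
row_totals <- rowSums(df1[, type_cols])
all(row_totals == 100)
```

If the last line returns `FALSE`, the rows would expand to different lengths and could not be bound into a single 100-column table.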
2

Note that your first sample does not add up to 100 but to 96. For the sake of demonstration, I will use 54.

Try rep:

rep(c("A","B","C","D","E"),c(54,13,24,3,6)) 

# "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" 
# "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "B" "B" "B" "B" "B" "B" "B" "B" 
# "B" "B" "B" "B" "B" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "D" "D" 
# "D" "E" "E" "E" "E" "E" "E" 
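
As a quick sanity check on the call above, the run lengths of the result can be compared with the counts that were passed in. This sketch uses only base R; `cells` and `runs` are illustrative names.

```r
# Same call as above, stored so it can be inspected
cells <- rep(c("A", "B", "C", "D", "E"), c(54, 13, 24, 3, 6))

# Total number of cells produced
length(cells)

# rle() recovers the letters and how many times each was repeated
runs <- rle(cells)
runs$values
runs$lengths
```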

For a data frame, I would do something like this (though it can probably be done with less code):

# Some preparation 
df2 <- df[,2:(ncol(df)-1)] # selecting just the types 
names(df2) <- gsub("Type_", "", names(df2)) # Removing "Type_" from the variable names 

# Apply rep to all rows 
lis <- apply(df2,1,function(x) rep(names(df2),x)) 
t(as.matrix(lis)) 
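
When some rows do not sum to 100, `apply` returns vectors of different lengths and the result is no longer a clean matrix. One possible workaround, sketched here under the assumption that padding short rows with `NA` is acceptable, is to force every row to the same length. The `counts` matrix and the names `n_cols` and `padded` are made up for illustration.

```r
# Counts for two samples; the first deliberately sums to 96, not 100
counts <- rbind("00001" = c(A = 50, B = 13, C = 24, D = 3, E = 6),
                "00002" = c(A = 5,  B = 2,  C = 15, D = 54, E = 24))

n_cols <- 100  # target width of the matrix

# Repeat the letters per row, then pad (or truncate) each row to n_cols
padded <- t(apply(counts, 1, function(x) {
  cells <- rep(names(x), x)
  length(cells) <- n_cols  # extends with NA when the row is too short
  cells
}))

dim(padded)
```

Note that setting `length<-` also truncates rows that are longer than `n_cols`, so this is only a sketch; whether padding or truncating is appropriate depends on the data.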
1

I have a quick, hacky solution, if that works for you. First, I make some fake data that roughly matches the partial data you provided.

library(plyr) 
dat <- matrix(c(50,14,24,12, 50,50,0,0), ncol=4, byrow=TRUE) 
colnames(dat) <- paste('Type_', LETTERS[1:4], sep='') 

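Then I use a rather clunky strsplit command to pull the letters out of the colnames, and an apply-style statement to rep each letter according to the cell values. Note that this will not work if your rows do not sum to 100.

# For each row, take the letter after "Type_" and repeat it by the cell value
adply(dat, 1, function(x) {
    nms <- unlist(lapply(strsplit(colnames(dat), '_'), function(x) x[2]))
    rep(nms, x)})[,-1]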
1

Here is a dplyr/tidyr solution. There may be simpler ways to handle this.

library(dplyr)
library(tidyr)

### Vectorize "rep"
vec_rep <- function(x, y) {
    unlist(lapply(seq_along(x), function(z) { paste(rep(x[z], y[z]), collapse = '') }))
}

df2 <- 
    df %>% 
    select(-Sum)         %>% # Col not needed 
    gather(Type, TypeVal, -Sample)     %>% # Reshape data to long format 
    mutate(tstr = vec_rep(gsub('^[^_]+_','', Type), TypeVal)) %>% # create strings of desired lengths 
    arrange(Sample, Type)       %>% # Sort 
    group_by(Sample)        %>% # One group per sample
    summarise(NewVal = paste(tstr, collapse=''))  # Create desired string based on grouping 

df2 is a data frame that can be converted into a matrix.
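
The conversion step is not shown above. One way to do it, as a base R sketch, is to split each string into single characters and bind the pieces row-wise. The `df2` below is a small hand-built stand-in with the same two columns (`Sample`, `NewVal`) as the pipeline result, and `mat` is an illustrative name.

```r
# Stand-in for the summarised result: one string of letters per sample
df2 <- data.frame(
  Sample = c("00001", "00002"),
  NewVal = c("AAABB", "ABBBC"),
  stringsAsFactors = FALSE
)

# Split every string into single characters and stack the rows
mat <- do.call(rbind, strsplit(df2$NewVal, ""))
rownames(mat) <- df2$Sample
mat
```

This only yields a proper matrix when all strings have the same length, which again means every sample must sum to the same total.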
