Using the reshape() function in R - from wide to long

I want to rearrange data in R from something like:

Patient ID,Episode Number,Admission Date (A),Admission Date (H),Admission Time (A),Admission Time (H) 
1,5,20/08/2011,21/08/2011,1200,1300 
2,6,21/08/2011,22/08/2011,1300,1400 
3,7,22/08/2011,23/08/2011,1400,1500 
4,8,23/08/2011,24/08/2011,1500,1600

to something like:

Record Type,Patient ID,Episode Number,Admission Date,Admission Time 
H,1,5,20/08/2011,1200 
A,1,5,21/08/2011,1300 
H,2,6,21/08/2011,1300 
A,2,6,22/08/2011,1400 
H,3,7,22/08/2011,1400 
A,3,7,23/08/2011,1500 
H,4,8,23/08/2011,1500 
A,4,8,24/08/2011,1600

(I am using CSV format so that it is easier to use these as test data.)
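For anyone who wants to reproduce this, here is a minimal sketch of loading the wide CSV above into a data frame. It assumes the data frame is called foo, as in the calls further down, and that it was read with read.csv() and default settings. With those defaults read.csv() passes the headers through make.names(), which is why "Admission Date (A)" shows up later as Admission.Date..A.

```r
# The wide sample data from the question, pasted in as text.
csv_text <- "Patient ID,Episode Number,Admission Date (A),Admission Date (H),Admission Time (A),Admission Time (H)
1,5,20/08/2011,21/08/2011,1200,1300
2,6,21/08/2011,22/08/2011,1300,1400
3,7,22/08/2011,23/08/2011,1400,1500
4,8,23/08/2011,24/08/2011,1500,1600"

# check.names = TRUE is the default, so spaces and parentheses become dots.
foo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

names(foo)
# "Patient.ID" "Episode.Number" "Admission.Date..A." "Admission.Date..H."
# "Admission.Time..A." "Admission.Time..H."
```

The doubled dot before the A or H (from the space and the opening parenthesis) is what makes sep = ".." usable as the separator in reshape().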

I tried the reshape() function and it sort of works:

> reshape(foo, direction = "long", idvar = 1, varying = 3:dim(foo)[2],
+         sep = "..", timevar = "dataset")
    Patient.ID Episode.Number dataset Admission.Date Admission.Time 
1.A.   1    5  A.  20/08/2011   1200 
2.A.   2    6  A.  21/08/2011   1300 
3.A.   3    7  A.  22/08/2011   1400 
4.A.   4    8  A.  23/08/2011   1500 
1.H.   1    5  H.  21/08/2011   1300 
2.H.   2    6  H.  22/08/2011   1400 
3.H.   3    7  H.  23/08/2011   1500 
4.H.   4    8  H.  24/08/2011   1600

But it is not in the exact format I want (for each "Patient ID", I want the first row to be "H" and the second row to be "A").

In addition, extending this to the real data (which has 250+ columns) fails:

> reshape(realdata, direction = "long", idvar = 1,
+         varying = 6:dim(foo)[2], sep = "..", timevar = "dataset")
Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, : 
    'varying' arguments must be the same length

I think part of the reason is that the colnames look like this:

> colnames(foo) 
    [1] "Unique.Key"          
    [2] "Campus.Code"         
    [3] "UR"            
    [4] "Terminal.digit"         
    [5] "Admission.date..A."      
    [6] "Admission.date..H."      
    [7] "Admission.time..A."      
    [8] "Admission.time..H."  
    . 
    . 
    . 
[31] "Medicare.Number"        
[32] "Payor"           
[33] "Doctor.specialty"        
[34] "Clinic"  
    . 
    . 
    . 
[202] "Admission.Source..A."      
[203] "Admission.Source..H."

That is, there are "common columns" (without a suffix) in between the columns that have a suffix (I hope this makes sense).

Source

2011-09-01 Kevin Wang

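I don't see why the 'reshape' call is incorrect. It looks like it gives you the data in the format you want, albeit with the rows in a different order. (You can easily change the row order with the 'order' function.)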
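A minimal sketch of that idea on the sample data, under a few assumptions: the data frame is built by hand here with the dotted column names that read.csv() would produce, the id column is passed by name rather than by position, and the Record.Type column is a name chosen for illustration.

```r
# Sample data with the column names R would generate from the CSV header.
foo <- data.frame(
  Patient.ID = 1:4, Episode.Number = 5:8,
  Admission.Date..A. = c("20/08/2011", "21/08/2011", "22/08/2011", "23/08/2011"),
  Admission.Date..H. = c("21/08/2011", "22/08/2011", "23/08/2011", "24/08/2011"),
  Admission.Time..A. = c(1200, 1300, 1400, 1500),
  Admission.Time..H. = c(1300, 1400, 1500, 1600),
  stringsAsFactors = FALSE
)

long <- reshape(foo, direction = "long", idvar = "Patient.ID",
                varying = 3:ncol(foo), sep = "..", timevar = "dataset")

# The suffix comes through as "A." / "H."; drop the trailing dot.
long$Record.Type <- sub("\\.$", "", long$dataset)

# Sort by patient, with the H row before the A row within each patient.
long <- long[order(long$Patient.ID,
                   factor(long$Record.Type, levels = c("H", "A"))), ]

long$Record.Type
# "H" "A" "H" "A" "H" "A" "H" "A"
```

Using a factor with explicit levels keeps the H-before-A rule visible in the code instead of relying on alphabetical order.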

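You can probably get what you want using melt and cast, or reshape, but you are looking for something quite specific, so it may be simpler to do the reshaping directly. You can split the original data into two separate data frames (one for A, one for H) and then glue them back together.

The code below works on your sample data, but I have also tried to write it flexibly enough that it will hopefully work on your large data set, as long as the column names consistently use the ..A. and ..H. suffixes.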

#grab the common columns and the "A" columns 
#(by using grepl to find any column that doesn't end in ".H.") 
foo.a <- foo[,!grepl(x=colnames(foo),pattern = "\\.H\\.$")] 

#strip the "..A." from the end of the ".A." column names 
colnames(foo.a) <- sub(x=colnames(foo.a), 
        pattern="(.*)\\.\\.A\\.$", 
        replacement = "\\1") 
foo.a$Record.Type <- "A" 

#grab the common columns and the "H" columns 
#(by using grepl to find any column that doesn't end in ".A.") 
foo.h <- foo[,!grepl(x=colnames(foo),pattern = "\\.A\\.$")] 

#strip the "..H." from the end of the "..H." column names 
colnames(foo.h) <- sub(x=colnames(foo.h), 
        pattern="(.*)\\.\\.H\\.$", 
        replacement = "\\1") 
foo.h$Record.Type <- "H" 

#stick them back together 
new.foo <- rbind(foo.a,foo.h) 

#order by Patient.ID 
new.foo <- new.foo[with(new.foo,order(Patient.ID)),] 

#re-order the columns as you like 
new.foo <- new.foo[,c(1,2,5,3,4)]

This gives me:

> new.foo 
    Patient.ID Episode.Number Record.Type Admission.Date Admission.Time 
1   1    5   A  20/08/2011   1200 
5   1    5   H  21/08/2011   1300 
2   2    6   A  21/08/2011   1300 
6   2    6   H  22/08/2011   1400 
3   3    7   A  22/08/2011   1400 
7   3    7   H  23/08/2011   1500 
4   4    8   A  23/08/2011   1500 
8   4    8   H  24/08/2011   1600

Source

2011-11-11 16:18:52 mac

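The suggestions to use melt and cast (now dcast and family) from the "reshape" (now "reshape2") package won't get your data into the form you are looking for. In particular, if your end goal is the "semi-long" format you describe, you'll need to do some additional processing.

There are two problems you raise in your question:

The first is the ordering of the results. As @RichieCotton points out in his comment and @mac in his answer, a call to order() is enough to solve that.

The second is the error: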

Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, : 
    'varying' arguments must be the same length

This is because, as you guessed, your varying = 6:dim(foo)[2] selection includes columns that do not vary.
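To make the cause concrete, here is a small sketch on a made-up data frame (the values are placeholders, not your real data) that has un-suffixed columns sitting inside the positional range. On the R versions I am aware of, the reshape() call fails in the same way; the sketch only captures whatever error message is raised rather than assuming its exact wording.

```r
# Hypothetical stand-in for the real data: two un-suffixed columns
# (Medicare.Number, Payor) sit between the suffixed ones.
foo <- data.frame(Unique.Key = 1:4, Campus.Code = LETTERS[1:4],
                  Admission.Date..A = 11:14, Admission.Date..H = 21:24,
                  Medicare.Number = letters[1:4], Payor = letters[1:4],
                  Admission.Source..A = 1:4, Admission.Source..H = 5:8)

# Columns in the positional range that do not contain the ".." separator.
in_range <- names(foo)[3:ncol(foo)]
unsuffixed <- in_range[!grepl("..", in_range, fixed = TRUE)]
unsuffixed
# "Medicare.Number" "Payor"

# Passing the whole range as 'varying' is expected to fail; keep the message.
msg <- tryCatch({
  reshape(foo, direction = "long", idvar = "Unique.Key",
          varying = 3:ncol(foo), sep = "..")
  NA_character_
}, error = function(e) conditionMessage(e))
msg
```

Each un-suffixed name ends up as its own one-column "variable" when reshape() splits the names on the separator, so the groups no longer have equal lengths.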

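A simple way to fix this is to use grep to identify which columns are varying, and use that to specify your columns instead of the (incorrect) catch-all you used. Here is an example: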

set.seed(1) 
foo <- data.frame(Unique.Key = 1:4, Campus.Code = LETTERS[1:4], 
        Admission.Date..A = 11:14, Admission.Date..H = 21:24, 
        Medicare.Number = letters[1:4], Payor = letters[1:4], 
        Admission.Source..A = rnorm(4), 
        Admission.Source..H = rnorm(4)) 
foo 
# Unique.Key Campus.Code Admission.Date..A Admission.Date..H Medicare.Number 
# 1   1   A    11    21    a 
# 2   2   B    12    22    b 
# 3   3   C    13    23    c 
# 4   4   D    14    24    d 
# Payor Admission.Source..A Admission.Source..H 
# 1  a   -0.6264538   0.3295078 
# 2  b   0.1836433   -0.8204684 
# 3  c   -0.8356286   0.4874291 
# 4  d   1.5952808   0.7383247

Find out which columns vary, and use that as your varying argument:

varyingCols <- grep("\\.\\.A$|\\.\\.H$", names(foo)) 

out <- reshape(foo, direction = "long", idvar = "Unique.Key", 
       varying = varyingCols, sep = "..") 
out[order(out$Unique.Key, rev(out$time)), ] 
#  Unique.Key Campus.Code Medicare.Number Payor time Admission.Date Admission.Source 
# 1.H   1   A    a  a H    21  0.3295078 
# 1.A   1   A    a  a A    11  -0.6264538 
# 2.H   2   B    b  b H    22  -0.8204684 
# 2.A   2   B    b  b A    12  0.1836433 
# 3.H   3   C    c  c H    23  0.4874291 
# 3.A   3   C    c  c A    13  -0.8356286 
# 4.H   4   D    d  d H    24  0.7383247 
# 4.A   4   D    d  d A    14  1.5952808
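A side note on the ordering line: rev(out$time) reverses the vector by position rather than asking for a descending sort, so as far as I can tell it relies on the rows still being in the block layout that reshape() produces (all A rows, then all H rows). If that is a concern, one alternative sketch is to request a descending sort on the time key explicitly. This assumes R 3.3.0 or later, where order() accepts a per-key decreasing vector with method = "radix"; the data frame below is a cut-down, made-up version of the example.

```r
# Cut-down version of the example data (placeholder values).
foo <- data.frame(Unique.Key = 1:4, Campus.Code = LETTERS[1:4],
                  Admission.Date..A = 11:14, Admission.Date..H = 21:24,
                  Payor = letters[1:4], stringsAsFactors = FALSE)

out <- reshape(foo, direction = "long", idvar = "Unique.Key",
               varying = grep("\\.\\.[AH]$", names(foo)), sep = "..")

# Ascending by Unique.Key, descending by time (so H comes before A).
sorted <- out[order(out$Unique.Key, out$time,
                    decreasing = c(FALSE, TRUE), method = "radix"), ]

sorted$Admission.Date
# 21 11 22 12 23 13 24 14
```

This gives the same H-then-A order per key, and it keeps working if the rows have been shuffled or filtered beforehand.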

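If your data are small (not many columns), you can count the positions of the varying columns by hand and specify them as a vector. As you have already noticed, any columns not specified in idvar or varying are recycled appropriately.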

out <- reshape(foo, direction = "long", idvar = "Unique.Key", 
       varying = c(3, 4, 7, 8), sep = "..")

Source

2013-07-27 09:41:56 A5C1D2H2I1M1N2O1R2T1
