2010-10-05 83 views
1

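I have a file in which a 6-column data structure is stored side by side. That is, the 6 columns are repeated n times across one flat file.
Basically, I want to rearrange the data so that I end up with a single data.frame containing only 6 columns, with all the remaining data from the file appended below the first 6 columns. How can I rearrange the data in a data frame with R (combining similar, repeated columns)?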

Row 1V1 1V2 1V3 1V4 1V5 1V6 2V1 2V2 2V3 2V4 2V5 2V6 3V1... 
1 
2 

The result should look like this, with the data from 2V1-2V6 moved below 1V1-1V6:

Row V1 V2 V3 V4 V5 V6 
1-1 
1-2 
2-1 
2-2 

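In the end I looked up a few code snippets and was able to load all of the data into a data frame. Then I tried to create n data frames, each holding one copy of the repeated data structure. Finally I tried to merge the individual data frames into one, but that does not work.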

df<-read.table("test.txt",header = FALSE, sep = ";", skip = 2) 
columnmax=as.integer(ncol(df)/6) 
dfnew <- vector(mode="list",length=columnmax) 
for (i in 1:columnmax) { 
    start<-((i-1)*6+1)
    end<-(i*6)
    dfnew[[i]]<-df[,start:end]
} 
y <- do.call(rbind, dfnew) 

Result:

Error in match.names(clabs, names(xi)) : 
    names do not match previous names 

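I used list mode because otherwise I could not get the split into separate data frames to work. But now it seems to cause a problem, because the "column names" are not identical. I have no idea yet how to change the column names, since in the R terminal the object is not a matrix but a list. I am sure there must be a simpler way to do what I want, but I have only just started with R and am not familiar with the many different data type concepts.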
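A minimal sketch of what the error above means, using made-up data rather than the real file. `rbind` on data frames matches columns by name, so blocks named `V1..V6` and `V7..V12` cannot be stacked until they share names. Each element of the list is an ordinary data frame, so `names()` can be set on it. The object names `df_demo`, `pieces` and `stacked` are hypothetical.

```r
# Hypothetical stand-in for the imported file: 2 rows, 12 columns (V1..V12)
df_demo <- as.data.frame(matrix(1:24, nrow = 2))

# Split into blocks of 6 columns, as in the loop above
pieces <- list(df_demo[, 1:6], df_demo[, 7:12])

# Stacking fails here because the second block is named V7..V12
res <- try(do.call(rbind, pieces), silent = TRUE)
inherits(res, "try-error")

# Give every block the names of the first one, then stack
pieces <- lapply(pieces, function(p) {
    names(p) <- names(pieces[[1]])
    p
})
stacked <- do.call(rbind, pieces)
dim(stacked)
```

With identical names in every block, `do.call(rbind, ...)` appends the second block below the first.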

Edit: DATA

structure(list(V1 = NA, V2 = NA, V3 = NA, V4 = NA, V5 = NA, V6 = NA, 
    V7 = NA, V8 = NA, V9 = NA, V10 = NA, V11 = NA, V12 = NA, 
    V13 = structure(1L, .Label = "1,20101E+27", class = "factor"), 
    V14 = structure(1L, .Label = "05.07.2010 14:50", class = "factor"), 
    V15 = structure(1L, .Label = "ADMINISTRATOR", class = "factor"), 
    V16 = 1L, V17 = NA, V18 = NA, V19 = structure(1L, .Label = "1,20101E+27", class = "factor"), 
    V20 = structure(1L, .Label = "05.07.2010 14:50", class = "factor"), 
    V21 = structure(1L, .Label = "ADMINISTRATOR", class = "factor"), 
    V22 = 1L, V23 = NA, V24 = NA, V25 = structure(1L, .Label = "1,20101E+27", class = "factor"), 
    V26 = structure(1L, .Label = "05.07.2010 14:50", class = "factor"), 
    V27 = structure(1L, .Label = "ADMINISTRATOR", class = "factor"), 
    V28 = 1L, V29 = NA, V30 = NA, V31 = structure(1L, .Label = "1,20101E+27", class = "factor"), 
    V32 = structure(1L, .Label = "05.07.2010 14:50", class = "factor"), 
    V33 = structure(1L, .Label = "ADMINISTRATOR", class = "factor"), 
    V34 = 1L, V35 = NA, V36 = NA, V37 = NA, V38 = NA, V39 = NA, 
    V40 = NA, V41 = NA, V42 = NA, V43 = NA, V44 = NA, V45 = NA, 
    V46 = NA, V47 = NA, V48 = NA, V49 = NA, V50 = NA, V51 = NA, 
    V52 = NA, V53 = NA, V54 = NA, V55 = NA, V56 = NA), .Names = c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", 
"V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19", "V20", 
"V21", "V22", "V23", "V24", "V25", "V26", "V27", "V28", "V29", 
"V30", "V31", "V32", "V33", "V34", "V35", "V36", "V37", "V38", 
"V39", "V40", "V41", "V42", "V43", "V44", "V45", "V46", "V47", 
"V48", "V49", "V50", "V51", "V52", "V53", "V54", "V55", "V56" 
), row.names = 1L, class = "data.frame") 
+1

SebM, could you update your post with loadable data? Try posting the result of this: dput(head(df,5)) – 2010-10-05 15:42:41

+0

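I suppose it is no longer necessary, but I will try to do it tomorrow, just to make the post complete and to get used to the forum system here. Thanks for your help. – Sebastian 2010-10-05 16:33:07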

+0

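I literally had your exact problem yesterday. 'dput()' gets you faster answers; alternatively, generate example data for the people solving your problem. :) – 2010-10-05 16:36:25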

Answers

4

Try:

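# start and end column index of each block of 6 columns
x1 <- seq(from=1, to=ncol(df)-1, by=6)
x2 <- seq(from=6, to=ncol(df), by=6)

# placeholder row that fixes the target column names
dfnew <- data.frame("V1"=0,"V2"=0,"V3"=0,"V4"=0,"V5"=0,"V6"=0)

for(x in 1:(ncol(df)/6)) {
    tmpdf <- df[x1[x]:x2[x]]
    colnames(tmpdf) <- colnames(dfnew)  # rbind needs matching names
    dfnew <- rbind(dfnew,tmpdf)
}
dfnew <- dfnew[-1, ]  # drop the placeholder row of zeros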
+0

Yes, this works well. I had been trying for the whole day. Great help, thanks. – Sebastian 2010-10-05 16:32:04

2

Here is a simple loop that will do the job for you:

First, some dummy data:

> set.seed(123) 
> DF <- data.frame(matrix(rnorm(5*6*6), ncol = 36)) 
> names(DF) <- paste(rep(1:6, each = 6), "V", rep(1:6, times = 6), sep = "") 
> names(DF) 
[1] "1V1" "1V2" "1V3" "1V4" "1V5" "1V6" "2V1" "2V2" "2V3" "2V4" "2V5" "2V6" 
[13] "3V1" "3V2" "3V3" "3V4" "3V5" "3V6" "4V1" "4V2" "4V3" "4V4" "4V5" "4V6" 
[25] "5V1" "5V2" "5V3" "5V4" "5V5" "5V6" "6V1" "6V2" "6V3" "6V4" "6V5" "6V6" 

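Now set up the loop so that at each stage it takes columns i, i + 6, i + (2 * 6), ... of the data frame and stacks them into a single vector in the new data frame DF2: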

> n <- 6 ## number of groups of 6 
> DF2 <- data.frame(matrix(NA, ncol = 6, nrow = 6 * nrow(DF))) 
> for(i in seq_len(n)) { 
+  DF2[[i]] <- unlist(DF[, seq(i, n*6, by = 6)]) 
+ } 
> names(DF2) <- paste("V", seq_len(n), sep = "") 
> head(DF2) 
           V1         V2         V3         V4         V5         V6
1 -0.56047565  1.7150650  1.2240818  1.7869131 -1.0678237 -1.6866933
2 -0.23017749  0.4609162  0.3598138  0.4978505 -0.2179749  0.8377870
3  1.55870831 -1.2650612  0.4007715 -1.9666172 -1.0260044  0.1533731
4  0.07050839 -0.6868529  0.1106827  0.7013559 -0.7288912 -1.1381369
5  0.12928774 -0.4456620 -0.5558411 -0.4727914 -0.6250393  1.2538149
6  0.42646422  0.6886403 -0.6947070 -1.1231086  0.2533185  1.5164706

This assumes there are only 6 variables, but n controls how many groups of 6 you have.
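In the loop above, `n` serves both as the number of groups and as the number of variables, which happen to coincide at 6 in the dummy data. A minimal sketch that keeps the two apart, assuming the columns are ordered group by group (1V1..1V6, 2V1..2V6, ...) and are all numeric. The names `n_vars`, `n_groups`, `DF_demo` and `stacked` are introduced here for illustration only.

```r
n_vars   <- 6   # variables per group (V1..V6)
n_groups <- 3   # assumed number of side-by-side groups in this demo

# Hypothetical numeric data: 2 rows, n_groups * n_vars columns
DF_demo <- as.data.frame(matrix(seq_len(2 * n_vars * n_groups), nrow = 2))

# One row per original row and group, one column per variable
stacked <- data.frame(matrix(NA, ncol = n_vars,
                             nrow = n_groups * nrow(DF_demo)))
for (i in seq_len(n_vars)) {
    # columns i, i + n_vars, i + 2 * n_vars, ... all hold variable i
    cols <- seq(i, n_vars * n_groups, by = n_vars)
    stacked[[i]] <- unlist(DF_demo[, cols], use.names = FALSE)
}
names(stacked) <- paste("V", seq_len(n_vars), sep = "")
```

The rows of `stacked` come out group by group: all rows of the first group, then all rows of the second, and so on.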

+0

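Not sure whether I am right, but I think it has to be ncol(DF) instead of nrow(DF) in DF2 <- data.frame(matrix(NA, ncol = 6, nrow = 6 * ncol(DF))). With the example matrix provided it works, but since there is also non-numeric content, it does not work with my data. – Sebastian 2010-10-06 13:11:03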

+0

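@SebM: No, it needs to be '6 * nrow(DF)'. In your example, if I have understood it correctly, if you have 5 rows in the original structure and 'n == 6' is the number of data sets (or number of variables), then you have 6 * 5 rows in the data structure you want. I suspect it fails because of the non-numeric content, but since you did not mention that in your post and did not give us example data, it is a bit hard to second-guess your requirements. – 2010-10-06 13:25:48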

+0

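@SebM: If the non-numeric stuff is not needed (i.e. is not part of V1, V2, etc.), why not exclude it? 'oldDF <- DF' followed by 'DF <- DF[, -cols]', where 'cols' contains the indices of the non-numeric columns. Then run the loop. – 2010-10-06 13:26:35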
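The suggestion in the comment above could look like the following sketch. The data are made up, and `DF_demo` and `is_num` are hypothetical names. Logical indexing is used here instead of `-cols`, because a negative index built from an empty `cols` would select no columns at all.

```r
# Hypothetical mixed data: two numeric columns and one factor column
DF_demo <- data.frame(a = c(1.5, 2.5), b = c("x", "y"), c = c(3, 4),
                      stringsAsFactors = TRUE)

oldDF  <- DF_demo                             # keep the original, as suggested
is_num <- vapply(DF_demo, is.numeric, logical(1))
DF_demo <- DF_demo[, is_num, drop = FALSE]    # keep the numeric columns only
names(DF_demo)
```

This only helps when the non-numeric columns are not part of the repeated 6-column structure; otherwise the block boundaries shift after the columns are dropped.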