2014-03-12

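R code to align column lengths

I have a data.frame with columns of different lengths, and I am trying to align the columns based on the last value in each column. The first 5 rows contain specific identifying information that I cannot discard.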

I have been using some code in Excel that does exactly what I want, but I am hoping I can use similar code to do the same thing in R.

Sample data.frame (the actual data set is much larger):

Series1 <- c("Lync", "23017323003", "2011", "sp1", "45.6", "2.4", "3.1", "1.9", "6.6", "1.4")
Series2 <- c("Lync", "23017323003", "2010", "sp2", "52.8", "3.8", "2.5", "4.3", "NA", "NA")
Series3 <- c("Faye", "23011195006", "2011", "sp1", "63.1", "1.3", "5.2", "0.7", "3.1", "NA")
df <- data.frame(Series1, Series2, Series3)

Expected output data.frame:

Row_Names <- c("Town", "SiteID", "EndYear", "Subplot", "PathLength", "2007", "2008","2009", "2010", "2011")
Series1fix <- c("Lync", "23017323003", "2011", "sp1", "45.6", "2.4", "3.1", "1.9", "6.6", "1.4")
Series2fix <- c("Lync", "23017323003", "2010", "sp2", "52.8", "NA", "3.8", "2.5", "4.3", "NA")
Series3fix <- c("Faye", "23011195006", "2011", "sp1", "63.1", "NA", "1.3", "5.2", "0.7", "3.1")
FixedDF <- data.frame(Row_Names, Series1fix, Series2fix, Series3fix)

The Excel code that someone helped me with is as follows:

Sub shift_to_last_row()

    Dim LastRowOnSheet As Long
    Dim LastRowInColumn As Long
    Dim LastColumn As Long
    Dim col As Long
    Dim arr As Variant

    With Cells
        LastRowOnSheet = .Find("*", .Cells(1, 1), xlFormulas, xlPart, xlByRows, xlPrevious, False, False).Row
        LastColumn = .Find("*", .Cells(1, 1), xlFormulas, xlPart, xlByColumns, xlPrevious, False, False).Column
    End With

    For col = 1 To LastColumn
        LastRowInColumn = Cells(Rows.Count, col).End(xlUp).Row
        If LastRowInColumn <> LastRowOnSheet Then
            ' Rows 1-5 are headers; the data starts in row 6
            arr = Range(Cells(6, col), Cells(LastRowInColumn, col))
            Range(Cells(6, col), Cells(LastRowOnSheet, col)).ClearContents
            Range(Cells(6 + LastRowOnSheet - LastRowInColumn, col), Cells(LastRowOnSheet, col)) = arr
        End If
    Next col

End Sub
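For comparison, here is a rough R sketch of what the Excel macro above does, assuming the data is a character matrix or data.frame whose first 5 rows are headers. The function name shift_to_last_row is borrowed from the macro; the first_data_row parameter is an assumption standing in for the macro's hard-coded row 6, and treating "", NA and "NA" as empty is also an assumption. Like the macro, it only pushes each column down to the last filled row of the whole sheet and ignores EndYear, so on the sample data Series3 comes out as in Series3fix, but Series2 ends in the 2011 row rather than the 2010 row.

```r
# Rough R analogue of the VBA macro shift_to_last_row (a sketch, not a port).
# Assumptions: rows before first_data_row are header rows, and a cell counts
# as empty if it is a real NA, the string "NA", or "".
shift_to_last_row <- function(x, first_data_row = 6) {
  m <- as.matrix(x)
  filled <- !is.na(m) & m != "" & m != "NA"
  # Like LastRowOnSheet: the deepest filled row in any column
  last_row_sheet <- max(which(rowSums(filled) > 0))
  for (col in seq_len(ncol(m))) {
    rows <- which(filled[, col])
    # Like LastRowInColumn: the deepest filled row in this column
    last_row_col <- if (length(rows) > 0) max(rows) else 0
    if (last_row_col >= first_data_row && last_row_col < last_row_sheet) {
      vals <- m[first_data_row:last_row_col, col]
      m[first_data_row:last_row_sheet, col] <- NA   # like ClearContents
      shift <- last_row_sheet - last_row_col
      m[(first_data_row + shift):last_row_sheet, col] <- vals
    }
  }
  m
}

# Sample data from the question
Series1 <- c("Lync", "23017323003", "2011", "sp1", "45.6", "2.4", "3.1", "1.9", "6.6", "1.4")
Series2 <- c("Lync", "23017323003", "2010", "sp2", "52.8", "3.8", "2.5", "4.3", "NA", "NA")
Series3 <- c("Faye", "23011195006", "2011", "sp1", "63.1", "1.3", "5.2", "0.7", "3.1", "NA")
df <- data.frame(Series1, Series2, Series3, stringsAsFactors = FALSE)
out <- shift_to_last_row(df)
```

Because it only looks at where each column ends, this is a bottom-alignment, not a year-alignment; the answer below uses the EndYear row instead.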

Any ideas on how to do this in R would be great. I have about 150 files to do this for, each containing about 50 columns and 150 rows.

EDIT: A sample subset of the real data.frame I am working with.

structure(c("23017323003sp4", "2011", "40", "2/18/2014", "13:40:54", "67.9709", "2.516", "2.510", "1.095", "1.721", "0.574", "0.730", "0.924", "0.585", "1.565", "1.208", "1.104", "0.842", "0.671", "1.399", "1.136", "2.005", "0.946", "1.114", "1.191", "1.192", "2.217", "2.528", "3.706", "2.899", "2.646", "1.698", "1.815", "3.647", "2.141", "2.080", "1.022", "1.610", "2.25", "2.844", "2.651", "1.554", "1.538", "0.958", "1.290", "1.253", "23017323003sp4", "2011", "40", "2/18/2014", "13:40:54", "51.4189", "0.894", "0.977", "0.308", "0.670", "0.357", "0.151", "0.208", "0.256", "0.418", "0.591", "1.119", "0.758", "1.616", "1.698", "1.003", "1.774", "1.348", "1.088", "0.979", "0.992", "1.408", "1.312", "1.828", "1.429", "1.243", "1.093", "2.027", "2.205", "1.637", "1.456", "1.311", "1.531", "1.97", "2.182", "2.217", "2.128", "2.402", "1.471", "1.561", "1.449", "23017323003sp4", "2011", "19", "2/18/2014", "13:40:54", "36.6195", "1.631", "2.290", "1.652", "1.348", "1.335", "1.936", "3.442", "2.258", "1.883", "1.463", "1.282", "1.557", "2.282", "2.737", "2.736", "2.388", "1.346", "1.388", "1.240", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), .Dim = c(46L, 3L), .Dimnames = list(c("V2", "V3", "V9", "V13", "V14", "V112", "V113", "V114", "V115", "V116", "V117", "V118", "V119", "V120", "V121", "V122", "V123", "V124", "V125", "V126", "V127", "V128", "V129", "V130", "V131", "V132", "V133", "V134", "V135", "V136", "V137", "V138", "V139", "V140", "V141", "V142", "V143", "V144", "V145", "V146", "V147", "V148", "V149", "V150", "V151", "V152"), c("LY3A003B", "LY3A004A", "LY3A004B" )))

Using the code jlhoward suggested, I have tried the following (the sample data.frame above is named "Lync3rwlTrans"):

series <- as.vector(Lync3rwlTrans[,3])
result <- do.call(cbind,lapply(series,function(s){
    data <- s[7:46]
    data <- data[data!="NA"]
    end <- 40-(2011-as.numeric(s[2]))
    start <- end-length(data)+1
    ret <- rep("NA",40)
    ret[start:end] <- data
    return(c(s[1:6],ret))
}))
rownames(result) <- c("SiteID", "EndYear", "#Rings", "EditDate", "EditTime", "PathLength", 1972:2011)
result <- data.frame(result, stringsAsFactors=F)
result

However, I keep getting the following error: Error in start:end : NA/NaN argument


Sorry, but I don't understand the *rules* of your *shifting*. When does an "NA" (or a value) move up? – sgibb


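The NAs are there as fillers... the original data set has columns of different lengths. The program that writes them out to a text file aligns them from the top, and I am trying to align them from the bottom. So basically the last entry in each column should be aligned according to the year entered in row 3. I added a column on the left of the fixed data set, which I hope explains how the columns shift (by year). – KKL234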

Answer


This seems to work:

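series <- list(Series1,Series2,Series3)    # one character vector per series
result <- do.call(cbind,lapply(series,function(s){
    data <- s[6:10]                          # the 5 yearly values (rows 6-10)
    data <- data[data!="NA"]                 # drop the "NA" filler strings
    end <- 5-(2011-as.numeric(s[3]))         # slot of this series' end year
    start <- end-length(data)+1
    ret <- rep("NA",5)
    ret[start:end] <- data                   # values now finish at 'end'
    return(c(s[1:5],ret))                    # re-attach the 5 header rows
}))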
rownames(result) <- c("Town", "SiteID", "EndYear", "Subplot", "PathLength", "2007", "2008","2009", "2010", "2011") 
result <- data.frame(result, stringsAsFactors=F) 
result 
#                     X1          X2          X3
# Town              Lync        Lync        Faye
# SiteID     23017323003 23017323003 23011195006
# EndYear           2011        2010        2011
# Subplot            sp1         sp2         sp1
# PathLength        45.6        52.8        63.1
# 2007               2.4          NA          NA
# 2008               3.1         3.8         1.3
# 2009               1.9         2.5         5.2
# 2010               6.6         4.3         0.7
# 2011               1.4          NA         3.1

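Note the following:

  1. I combined the Series<n> vectors into a list, because that would be the best way to handle the imported files.
  2. In your example everything is character (char), so that is how this code works.
  3. Your NAs are also character, i.e. "NA" rather than NA, so tests like is.na(...) will not work.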
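To make note 3 concrete, here is a small sketch with a made-up vector x (hypothetical values) showing how the string "NA" and R's missing value NA behave differently:

```r
# The string "NA" is ordinary text; NA (no quotes) is R's missing value.
x <- c("1.4", "NA", NA)

is.na(x)      # FALSE FALSE TRUE: only the real NA is missing
x != "NA"     # TRUE FALSE NA: any comparison with NA gives NA
x[x != "NA"]  # "1.4" NA: an NA in the index keeps an NA element
```

So a filter written for "NA" strings silently leaves real NAs in place.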

EDIT (response to the OP's follow-up question)

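So there are two problems. First, there is a difference between "NA" and NA. The first is a character string, which you test with, e.g., data=="NA". The second is R's missing value NA, which you test with is.na(data). I explained this in the notes above. Your "sample data" had "NA", and so did my code. Your "real data" has NA, so the filter data!="NA" does not remove them: comparing NA with "NA" gives NA, and indexing with NA keeps an NA element. Replace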

data <- data[data!="NA"]

with

data <- data[!is.na(data)]
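If some files might contain "NA" strings and others real NAs, one option is a filter that drops both. A minimal sketch; drop_missing is a hypothetical helper name, not part of the original code:

```r
# Hypothetical helper: remove real NA values and "NA" strings in one step.
# !is.na(v) is FALSE for real NAs, which masks the NA coming from v != "NA".
drop_missing <- function(v) {
  v[!is.na(v) & v != "NA"]
}

drop_missing(c("2.4", "NA", "3.1"))  # "2.4" "3.1" (sample-data style)
drop_missing(c("2.4", NA, "3.1"))    # "2.4" "3.1" (real-data style)
```

Inside the realignment function this would stand in for the filtering line, i.e. data <- drop_missing(data).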

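Second, as.vector(Lync3rwlTrans[,3]) pulls out a single column as a character vector, so lapply() loops over individual strings instead of over columns. Inside the function s[2] is then NA, so end becomes NA and start:end fails with the "NA/NaN argument" error. If your "real data" is a character matrix Lync3rwlTrans, use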

df <- data.frame(Lync3rwlTrans,stringsAsFactors=F) 
result <- do.call(cbind,lapply(df, function(s)...) 

This converts Lync3rwlTrans to a data frame and passes it, column by column, to the realignment function.
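A small sketch of why iterating by column matters, using a made-up 3 x 2 character matrix m as a stand-in for Lync3rwlTrans (hypothetical values, with the end year in row 2 as in the real data). Looping over as.vector(m[, 2]) hands the function single strings, so s[2] is NA and start:end fails with the same message seen in the question:

```r
# Made-up stand-in for Lync3rwlTrans: 3 rows (ID, end year, one value) x 2 series
m <- matrix(c("id1", "2011", "1.5",
              "id2", "2010", "0.7"),
            nrow = 3, dimnames = list(NULL, c("A", "B")))

# As in the question: lapply() over one column visits single strings,
# so inside the function s[2] is NA
per_element <- lapply(as.vector(m[, 2]), function(s) s[2])

# As in the answer: lapply() over a data frame visits whole columns,
# so s[2] is the end year of each series
per_column <- lapply(data.frame(m, stringsAsFactors = FALSE), function(s) s[2])

# With s[2] missing, end and start are NA and start:end fails
s2 <- per_element[[1]]
end <- 1 - (2011 - as.numeric(s2))
start <- end
msg <- tryCatch(start:end, error = function(e) conditionMessage(e))
msg  # "NA/NaN argument"
```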

The complete code is:

df <- data.frame(Lync3rwlTrans,stringsAsFactors=F) 
result <- do.call(cbind,lapply(df,function(s){ 
    data <- s[7:46] 
    data <- data[!is.na(data)] 
    end <- 40-(2011-as.numeric(s[2])) 
    start <- end-length(data)+1 
    ret <- rep(NA,40) 
    ret[start:end] <- data 
    return(c(s[1:6],ret)) 
})) 
rownames(result) <- c("SiteID", "EndYear", "#Rings", "EditDate", "EditTime", "PathLength", 1972:2011) 
result <- data.frame(result, stringsAsFactors=F) 

Finally, this would have been so much easier if you had shown your "real data" at the start!


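This is a great solution! Thanks for helping me change the code! Just a quick question: I want to run the code in batch, and each data.frame has a different number of columns and rows. Is there an easy way to run "series <- list(x1, x2, x3)" over all columns without specifying the column names, e.g. 1:n? The same applies to rows: if the number of rows (5 in this example) keeps changing between data.frames, is it possible to reuse "end <- 5-(2011-as.numeric(s[3]))" each time? – KKL234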
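One way to avoid hard-coding the 5 (or 40) and the series names is to derive the number of year rows from nrow() and let lapply() walk over every column of a data frame. A sketch under stated assumptions: align_by_year and its arguments are hypothetical names, the rows after the header are assumed to be consecutive years ending in last_year, and missing values may be either NA or "NA":

```r
# Hypothetical generalization of the realignment code in the answer.
# Assumptions: one series per column; the first length(header_names) rows
# are identifiers; row year_row holds each series' end year; the remaining
# rows are consecutive years ending in last_year.
align_by_year <- function(x, header_names, year_row, last_year) {
  df <- data.frame(x, stringsAsFactors = FALSE)
  n_header <- length(header_names)
  n_years <- nrow(df) - n_header              # replaces the hard-coded 5 or 40
  first_year <- last_year - n_years + 1
  out <- do.call(cbind, lapply(df, function(s) {
    vals <- s[(n_header + 1):length(s)]
    vals <- vals[!is.na(vals) & vals != "NA"]  # drop NA and "NA" fillers
    end <- n_years - (last_year - as.numeric(s[year_row]))
    ret <- rep(NA_character_, n_years)
    if (length(vals) > 0) ret[(end - length(vals) + 1):end] <- vals
    c(s[seq_len(n_header)], ret)
  }))
  rownames(out) <- c(header_names, first_year:last_year)
  data.frame(out, stringsAsFactors = FALSE)
}

# Usage on the sample series from the question (any number of columns works)
Series1 <- c("Lync", "23017323003", "2011", "sp1", "45.6", "2.4", "3.1", "1.9", "6.6", "1.4")
Series2 <- c("Lync", "23017323003", "2010", "sp2", "52.8", "3.8", "2.5", "4.3", "NA", "NA")
Series3 <- c("Faye", "23011195006", "2011", "sp1", "63.1", "1.3", "5.2", "0.7", "3.1", "NA")
res <- align_by_year(cbind(Series1, Series2, Series3),
                     header_names = c("Town", "SiteID", "EndYear", "Subplot", "PathLength"),
                     year_row = 3, last_year = 2011)
```

For the real data in the question the call would be align_by_year(Lync3rwlTrans, c("SiteID", "EndYear", "#Rings", "EditDate", "EditTime", "PathLength"), year_row = 2, last_year = 2011), which gives year rows 1972 to 2011; a batch run could apply the same call to each imported file with lapply().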


I'd need to see an example. – jlhoward


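I just edited my original question to include a small subset of the real data, along with the version of your code that I have been trying to use. – KKL234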