2013-10-08 48 views
-2

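Creating variables from elements in different rows of a data frame

I have been trying to create several variables in my data frame, which looks like this: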

df.1 <- data.frame(unit = c('A','B','C','A','B','C','D'),location = c(1,1,1,2,2,2,2), value.X = c('5','6', '4', '3','10', '7','3'),value.Y = c('1','4','7','9','4','6','4'),team = c('A/B', 'A/B', 'C' , 'A', 'B/C', 'B/C','D'),team.B = c('A/C ', 'A/C', 'B', 'A/B/D', 'A/B/D', 'C', 'A/B/D'),supra = c('A', 'B', 'C', 'A/C/D', 'B', 'A/C/D' , 'A/C/D'),pos.supra = c(1,2,3,1,2,1,1)) 

  unit location value.X value.Y team team.B supra pos.supra
1    A        1       5       1  A/B   A/C      A         1
2    B        1       6       4  A/B    A/C     B         2
3    C        1       4       7    C      B     C         3
4    A        2       3       9    A  A/B/D A/C/D         1
5    B        2      10       4  B/C  A/B/D     B         2
6    C        2       7       6  B/C      C A/C/D         1
7    D        2       3       4    D  A/B/D A/C/D         1

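I need to create a variable that sums the difference between value.X and value.Y over the units that are in team.B, are not in team, and belong to a particular supra team. That supra team is the one with pos.supra equal to 1 or, if the unit in question itself has pos.supra equal to 1, the one immediately below it. I need this for every unit within every location. I know that is a lot of steps, so here is a more detailed description. Perhaps some steps can be skipped or done in a different order; that is fine.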

(1) Find the supra team one position below or, if the unit itself has pos.supra equal to 1, the next one down:

supra.I.need = c('B','A','A','B','A/C/D', 'B','B') 
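
For readers who want to see step (1) in isolation, here is a minimal base R sketch. It assumes the rule implied by the vector above: a row with pos.supra equal to 1 takes the supra at position 2 of its location, and every other row takes the supra at position 1. The helper name `needed_supra` is made up for this illustration.

```r
# Toy copy of the relevant columns of df.1, kept as character (not factor).
df <- data.frame(
  unit      = c("A", "B", "C", "A", "B", "C", "D"),
  location  = c(1, 1, 1, 2, 2, 2, 2),
  supra     = c("A", "B", "C", "A/C/D", "B", "A/C/D", "A/C/D"),
  pos.supra = c(1, 2, 3, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for the rows of ONE location, return the supra each row needs.
needed_supra <- function(supra, pos) {
  first  <- supra[pos == 1][1]  # supra at position 1 in this location
  second <- supra[pos == 2][1]  # supra at position 2 in this location
  ifelse(pos == 1, second, first)
}

# Apply the helper location by location.
df$supra.I.need <- NA_character_
for (loc in unique(df$location)) {
  rows <- df$location == loc
  df$supra.I.need[rows] <- needed_supra(df$supra[rows], df$pos.supra[rows])
}

df$supra.I.need
```

On this toy data the result matches the supra.I.need vector listed above.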

(2) Check which members of supra.I.need are not in team but are in team.B:

that.is.not.in.team.but.are.in.team.B = c('NA','NA','NA','B', 'A,D','NA','B') 
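
Step (2) is a pair of set operations on the split strings. The sketch below works on rows 4 and 5 of the data only; the helper `pick_units` is hypothetical, and the `trimws` call is an added precaution against stray spaces such as the one in 'A/C '.

```r
# Rows 4 and 5 of the data, as plain character vectors.
team         <- c("A",     "B/C")
team.B       <- c("A/B/D", "A/B/D")
supra.I.need <- c("B",     "A/C/D")

# Hypothetical helper: members of `needed` that are in `team_b` but not in `team`.
pick_units <- function(team, team_b, needed) {
  split1 <- function(s) trimws(strsplit(s, "/", fixed = TRUE)[[1]])
  candidates <- setdiff(split1(team_b), split1(team))  # in team.B, not in team
  intersect(split1(needed), candidates)                # ...and also in the needed supra
}

# One character vector of units per row.
result <- mapply(pick_units, team, team.B, supra.I.need,
                 SIMPLIFY = FALSE, USE.NAMES = FALSE)

# Collapse each row's units into a single comma-separated string.
picked <- vapply(result, paste, character(1), collapse = ",")
picked
```

For these two rows this gives "B" and "A,D", the same entries as in the vector above.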

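(3) Finally, compute the difference between value.Y and value.X for all units listed in the variable above and sum the results (note that for A and D I sum the deltas):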

delta = c('NA','NA','NA','8','2','NA','8') 

So the final data frame should look like this:

df.2 <- data.frame(unit = c('A','B','C','A','B','C','D'),location = c(1,1,1,2,2,2,2), value.X = c('5','6', '4', '3','10', '7','3'),value.Y = c('1','4','7','9','4','6','4'),team = c('A/B', 'A/B', 'C' , 'A', 'B/C', 'B/C','D'),team.B = c('A/C ', 'A/C', 'B', 'A/B/D', 'A/B/D', 'C', 'A/B/D'),supra = c('A', 'B', 'C', 'A/C/D', 'B', 'A/C/D' , 'A/C/D'),pos.supra = c(1,2,3,1,2,1,1),supra.I.need = c('B','A','A','B','A/C/D', 'B','B'),that.is.not.in.team.but.are.in.team.B = c('NA','NA','NA','B', 'A,D','NA','B'),delta = c('NA','NA','NA','8','2','NA','8')) 

  unit location value.X value.Y team team.B supra pos.supra supra.I.need that.is.not.in.team.but.are.in.team.B delta
1    A        1       5       1  A/B   A/C      A         1            B                                    NA    NA
2    B        1       6       4  A/B    A/C     B         2            A                                    NA    NA
3    C        1       4       7    C      B     C         3            A                                    NA    NA
4    A        2       3       9    A  A/B/D A/C/D         1            B                                     B     8
5    B        2      10       4  B/C  A/B/D     B         2        A/C/D                                   A,D     2
6    C        2       7       6  B/C      C A/C/D         1            B                                    NA    NA
7    D        2       3       4    D  A/B/D A/C/D         1            B                                     B     8

Any help would be much appreciated.


Try searching under the [r] tag for 'subset' and 'data.table'. –


Wait, you can subset data with R??? –


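It's good that you tried to break the process down, but I suggest you post a more general question or explain your data better. Your current terminology and wording are very confusing. – TheComeOnMan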

Answers

2

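Here's a go at it. Most of this is creating variables with %in%, or matching multiple results and subsetting. I got stuck on the last step, so I fell back on a loop, which was easy. I commented the code a bit to show what I was doing.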

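Note that all of this was done with stringsAsFactors = FALSE in data.frame, so that the columns are character vectors. I'm not sure why your numeric vectors are all entered as character vectors, but if that isn't the case in your actual dataset you can avoid the need for as.numeric.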
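
To make the point about types concrete, here is a small sketch on a three-row toy frame (not the asker's full data). It converts the character columns once, up front, and shows why factors need an extra as.character step.

```r
# Toy frame with numbers stored as character strings, as in the question.
df <- data.frame(
  unit    = c("A", "B", "C"),
  value.X = c("5", "6", "4"),
  value.Y = c("1", "4", "7"),
  stringsAsFactors = FALSE
)

# Convert once so later arithmetic needs no as.numeric().
df$value.X <- as.numeric(df$value.X)
df$value.Y <- as.numeric(df$value.Y)
diff_xy <- df$value.X - df$value.Y

# If a column had been read as a factor, as.numeric() alone returns the
# internal level codes, not the printed numbers.
f <- factor(c("5", "6", "4"))
codes  <- as.numeric(f)                # level codes
values <- as.numeric(as.character(f))  # the actual numbers

diff_xy
codes
values
```

With the columns converted like this, the as.numeric calls in the code below become unnecessary.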

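library(plyr)
# create the supra needed, location by location:
# rows with pos.supra == 1 take the supra at position 2,
# every other row takes the supra at position 1
# ([1] keeps a single value even when several rows share a position)
df1 = ddply(df.1, .(location), transform,
            needed = ifelse(pos.supra == 1, supra[pos.supra == 2][1], supra[pos.supra == 1][1]))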

# break apart the teams into lists for team, team.B, needed
# (each result is a list with one character vector per row)
# strsplit needs character vectors, not factors
team = strsplit(df1$team, "/") 
teamb = strsplit(df1$team.B, "/") 
needs = strsplit(as.character(df1$needed), "/") 

# pull out everything in team b that's not in team 
b.not.team = mapply(function(x, y) x[!x %in% y], teamb, team) 

# now match the needed supra against everything in team b but not team,
# paste the matches together with a comma between them and put them in df1
df1$bneeded = mapply(function(x, y) paste0(x[x %in% y], collapse = ","), needs, b.not.team) 


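# sum value.X - value.Y over every row whose unit is listed in bneeded
df1$delta = NA_real_  # create the column before filling it row by row
for (i in seq_len(nrow(df1))) {
    matchto = unlist(strsplit(df1$bneeded[i], ","))
    diffs = as.numeric(df1$value.X[df1$unit %in% matchto]) -
        as.numeric(df1$value.Y[df1$unit %in% matchto])
    df1$delta[i] = sum(diffs)
}

# rows without any matching unit get NA instead of "" (bneeded) and 0 (delta);
# keying on bneeded keeps a genuine sum of 0 from being turned into NA
df1$bneeded[df1$bneeded == ""] = NA
df1$delta[is.na(df1$bneeded)] = NA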
df1 
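
The last step can also be written without a loop. The sketch below is one possible variant, not the code above: it uses vapply on plain vectors copied from the question (with the values already numeric) and returns NA whenever there is nothing to match.

```r
# Columns copied from the question; bneeded is the result of the previous step.
unit    <- c("A", "B", "C", "A", "B", "C", "D")
value.X <- c(5, 6, 4, 3, 10, 7, 3)
value.Y <- c(1, 4, 7, 9, 4, 6, 4)
bneeded <- c(NA, NA, NA, "B", "A,D", NA, "B")

# For each row: sum value.X - value.Y over all rows whose unit is listed
# in that row's bneeded; rows with nothing listed stay NA.
delta <- vapply(bneeded, function(b) {
  if (is.na(b)) return(NA_real_)
  hit <- unit %in% strsplit(b, ",", fixed = TRUE)[[1]]
  sum(value.X[hit] - value.Y[hit])
}, numeric(1), USE.NAMES = FALSE)

delta
```

On these inputs rows 4 and 7 give 8, as in the question. Row 5 gives -3 rather than the 2 listed there; the question does not say how 2 was obtained, so the sketch simply sums value.X - value.Y over every A and D row, as the loop does.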

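**Edit: for loop alternative** Here's an alternative to the loop for creating the difference between x and y. Sometimes all you need is a fresh morning to realize what was wrong with your code. ;) I like loops in many cases because they make it easy to read what is going on in the code. In this case I used mapply in the rest of the code, so here is the mapply option.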

df1$diffxy = mapply(function(x, y) sum(as.numeric(df1$value.X[x %in% y])) - 
        sum(as.numeric(df1$value.Y[x %in% y])), 
     df1["unit"], strsplit(df1$bneeded, ",")) 
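
A note on why passing df1["unit"] works: a one-column data frame is a list of length 1, so mapply recycles it and hands the whole unit column to every call. The toy sketch below (made-up vectors, not the asker's data) shows the same recycling with an explicit list, next to the equivalent MoreArgs form.

```r
unit   <- c("A", "B", "C")
value  <- c(10, 20, 30)
wanted <- list("A", c("B", "C"), character(0))

# list(unit) has length 1, so mapply reuses the full vector for each element of wanted.
via_recycling <- mapply(function(x, y) sum(value[x %in% y]),
                        list(unit), wanted)

# Same computation, with the constant argument passed through MoreArgs.
via_more_args <- mapply(function(y, x) sum(value[x %in% y]),
                        wanted, MoreArgs = list(x = unit))

via_recycling
via_more_args
```

Both calls return the same vector; MoreArgs just makes it explicit that unit is not iterated over.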