Generate a new variable in R where the nth observation depends on the (n-1)th observation of another column

Suppose I have a data frame that looks like this:

> df
 city year ceep
    1    1    1
    1    2    1
    1    3    0
    1    4    1
    1    5    0
    2    1    0
    2    2    1
    2    3    1
    2    4    0
    2    5    1
    3    1    1
    3    2    0
    3    3    1
    3    4    0
    3    5    1
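
For anyone who wants to run the answers, the printed data frame can be rebuilt as follows. This is only a sketch that reproduces the values shown above; the use of `rep()` to generate `city` and `year` is a convenience assumed here, not something taken from the question.

```r
# Rebuild the example data frame shown in the question
df <- data.frame(
  city = rep(1:3, each = 5),   # three cities, five rows each
  year = rep(1:5, times = 3),  # years 1..5 within each city
  ceep = c(1, 1, 0, 1, 0,
           0, 1, 1, 0, 1,
           1, 0, 1, 0, 1)
)

str(df)
```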

Now I want to create a new variable 'veep' that depends on the values of 'city' and 'ceep' from different rows. For example,

veep=1 if ceep[_n-1]=1 & city=city[_n-1] 
veep=1 if ceep[_n+2]=1 & ceep[_n+3]=1 & city=city[_n+3]

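where n is the row of the observation. I'm not sure how to translate these conditions into R. I think what I'm having trouble with is selecting the observation row. I'm thinking of code along these lines: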

df$veep[df$ceep(of the n-1th observation)==1 & city==city(n-1th observ.)] <- 1 
df$veep[df$ceep(of the n+2th observation)==1 & df$ceep(of the n+3th observation)==1 & 
city==city(n+3th observ.)] <- 1 

#note: what's in parentheses is just to demonstrate where I'm having trouble

Can anyone help?

2012-11-29 econlearner

You can use a for loop like this:

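df$veep <- 0

for (i in seq_len(nrow(df))) {
  # Rule 1 looks back one row, so it applies to every row except the first
  if (i > 1) {
    # && is the scalar (short-circuit) form, appropriate inside if()
    if (df[i-1,"ceep"] == 1 && df[i-1,"city"] == df[i,"city"])
      df[i,"veep"] <- 1
  }
}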
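
The loop above covers only the first rule from the question. As a rough sketch, it could be extended to both rules like this; the names `rule1` and `rule2` are introduced here for illustration, and the bounds checks (`i > 1`, `i + 3 <= n`) are an assumption about how rows without a valid neighbour should be treated, namely as not satisfying the rule.

```r
df <- data.frame(
  city = rep(1:3, each = 5),
  year = rep(1:5, times = 3),
  ceep = c(1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1)
)

n <- nrow(df)
df$veep <- 0

for (i in seq_len(n)) {
  # Rule 1: the previous row has ceep == 1 and belongs to the same city
  rule1 <- i > 1 &&
    df$ceep[i - 1] == 1 &&
    df$city[i - 1] == df$city[i]

  # Rule 2: rows i + 2 and i + 3 both have ceep == 1,
  # and row i + 3 belongs to the same city as row i
  rule2 <- i + 3 <= n &&
    df$ceep[i + 2] == 1 &&
    df$ceep[i + 3] == 1 &&
    df$city[i + 3] == df$city[i]

  if (rule1 || rule2) df$veep[i] <- 1
}

df$veep
```

Because `&&` short-circuits, the out-of-range indices `i - 1` (when `i` is 1) and `i + 3` (near the end) are never evaluated. For the example data, rule 2 never fires, so every 1 in `veep` comes from rule 1.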

2012-11-29 13:24:10

What about row 'nrow(df) - 1'? – BenBarnes

Here is one way to write out your logical steps. Note the use of idx to index the vectors; this is needed to avoid out-of-range indexing.

idx <- seq_len(nrow(df)) 

# Set a default value for the new variable 
df$veep <- NA

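Your first set of logical criteria cannot be applied to the first row of df, because the index n - 1 would be 0, which is not a valid row index. So use tail(*, -1) to select all but the first entry of veep and city (the values at row n), and head(*, -1) to select all but the last entry of ceep and city (the values at row n - 1).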

df[tail(idx, -1), "veep"] <- ifelse(
    head(df$ceep, -1) == 1 & 
    tail(df$city, -1) == head(df$city, -1), 
    1, tail(df$veep, -1))
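
To see why this pairing lines up row n with row n - 1, here is a small sketch on a toy vector. The names `x`, `current` and `previous` are made up for illustration and are not part of the answer.

```r
x <- c(10, 20, 30, 40, 50)

current  <- tail(x, -1)  # drops the first element: the value at row n
previous <- head(x, -1)  # drops the last element: the value at row n - 1

# Each column pairs a row's own value with the value from the row before it
rbind(current, previous)
```

Both vectors have length `length(x) - 1`, so they can be compared element by element without any index running off the end.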

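Your next set of conditions cannot be applied to the last three rows of df, because n + 3 would be an invalid index. So again use the head and tail functions. One tricky part is that the first ceep condition is based on n + 2 rather than n + 3, so a combination of head and tail is needed.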

df[head(idx, -3), "veep"] <- ifelse(
    head(tail(df$ceep, -2), -1) == 1 & 
    tail(df$ceep, -3) == 1 & 
    head(df$city, -3) == tail(df$city, -3), 
    1, head(df$veep, -3)) 

> df$veep 
[1] NA 1 1 NA 1 NA NA 1 1 NA NA 1 NA 1 NA
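
An alternative way to express the same offsets, offered only as a sketch, is a small shifting helper that pads with NA so every shifted vector keeps the full length. `shift_vec` is a hypothetical helper written for this example, not a base R function, and it assumes the absolute value of `k` is smaller than the length of `x`.

```r
# Hypothetical helper (not part of base R): shift a vector by k positions,
# padding with NA. k > 0 looks back (lag), k < 0 looks ahead (lead).
shift_vec <- function(x, k) {
  n <- length(x)
  if (k > 0) {
    c(rep(NA, k), head(x, n - k))
  } else if (k < 0) {
    c(tail(x, n + k), rep(NA, -k))
  } else {
    x
  }
}

df <- data.frame(
  city = rep(1:3, each = 5),
  ceep = c(1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1)
)

# Rule 1: row n - 1 has ceep == 1 and the same city
rule1 <- shift_vec(df$ceep, 1) == 1 &
  shift_vec(df$city, 1) == df$city

# Rule 2: rows n + 2 and n + 3 have ceep == 1, and row n + 3 has the same city
rule2 <- shift_vec(df$ceep, -2) == 1 &
  shift_vec(df$ceep, -3) == 1 &
  shift_vec(df$city, -3) == df$city

# `%in% TRUE` turns NA (no valid neighbouring row) into FALSE
df$veep <- ifelse(rule1 %in% TRUE | rule2 %in% TRUE, 1, NA)

df$veep
```

With this approach both rules read much like the original pseudocode, at the cost of one extra helper function.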

2012-11-29 13:45:37 BenBarnes

Nice, but isn't this hard for a beginner? – agstudy

@agstudy, you may well be right. Perhaps some 'splaining is in order. – BenBarnes

@econlearner, I've added commentary to the code above to explain the 'head' and 'tail' bits. – BenBarnes
