2017-04-09

R nested for loop iterating over rows and column names

I'm new to R, so please forgive the basic question.

Here's a Dropbox link to a .csv of my data.

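I have country-level data from 1990 to 2010. My data is in wide format: each country is a row, and each year has two columns, one for each of two data sources. However, the data for some countries is incomplete. For example, a country row may have NA values in the columns for 1990-1995.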
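
To make the layout concrete, here is a small illustrative sketch of such a wide table. Everything in it is an assumption made for illustration: the name `toy`, the column names (`NY.X1990` and so on) and the values are invented and are not taken from the real file.

```r
# Hypothetical toy data: one row per country, one column per source and year.
# Column names and values are invented for illustration only.
toy <- data.frame(
  country  = c("A", "B", "C"),
  NY.X1990 = c(12, NA, NA),
  NY.X1991 = c(13, 7, NA),
  NY.X1992 = c(15, 8, 22),
  SP.X1990 = c(87, NA, NA),
  SP.X1991 = c(90, 132, NA),
  SP.X1992 = c(95, 140, 173)
)

# Number of missing cells per country across all year columns
missing_per_row <- rowSums(is.na(toy[, -1]))
```

Here country A is complete, while B and C only start reporting in later years, which is the situation described above.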

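I would like to create two columns and, for each country row, I want the values in those columns to be the earliest year with a non-NA value for each of the two data types.

I would also like to create two other columns and, for each country row, I want the values in those columns to be the earliest non-NA value for each of the two data types.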

So the last four columns would look like this:

1990, 12, 1990, 87 
1990, 7, 1990, 132 
1996, 22, 1996, 173 
1994, 14, 1994, 124 

Here is my rough, half-pseudocode attempt at the nested loop I have in mind:

for i in (number of rows){ 
    for j in names(df){ 
    if(is.na(df$j) == FALSE) df$earliest_year = j 
    } 
} 

How can I generate these four desired columns? Thank you!

Answer

You mentioned loops, so I tried writing a for loop. Later on you may want to try other R functions, such as apply. This code is a bit lengthy, but I hope it helps:

# read data; I'm assuming the first column holds row names and is not important
df <- read.csv("wb_wide.csv", row.names = 1)

# get the names of the columns for the two data sources
# here I used grep to find column names matching the NY and SP patterns;
# but if the format is consistently alternating,
# you can use a sequence of numbers instead
dataSourceA <- grep("NY", names(df), value = TRUE)
dataSourceB <- grep("SP", names(df), value = TRUE)

# create new columns for the data set
# if I understand it correctly: the first non-NA value from source A
# and source B, and the year of each of these non-NAs
# (pre-filled with NA so the columns stay plain vectors, not lists)
df$sourceA <- rep(NA_real_, nrow(df))
df$yearA <- rep(NA_character_, nrow(df))
df$sourceB <- rep(NA_real_, nrow(df))
df$yearB <- rep(NA_character_, nrow(df))

# start the for loop that iterates over rows
for (i in seq_len(nrow(df))) {

    # take row i of the source A columns as a plain vector
    # (this assumes those columns are numeric), then find the first non-NA position
    rowA <- unlist(df[i, dataSourceA])
    firstA <- which(!is.na(rowA))[1]

    # store the first non-NA value in the sourceA column
    df$sourceA[i] <- rowA[firstA]

    # use gsub to clean the column name so that only the year is left
    # you can also skip this and just clean afterward
    df$yearA[i] <- gsub(pattern = "^.*X", replacement = "", x = dataSourceA[firstA])

    # same as the first bit of code, but selecting from source B
    rowB <- unlist(df[i, dataSourceB])
    firstB <- which(!is.na(rowB))[1]
    df$sourceB[i] <- rowB[firstB]

    # same as the second bit, for source B
    df$yearB[i] <- gsub(pattern = "^.*X", replacement = "", x = dataSourceB[firstB])

}

I tried to make the code specific to your example and desired output.
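
As for the apply route mentioned above, a minimal sketch might look like the following. It has not been run against the real file. The toy data, its column names and the helper name `first_non_na` are all assumptions made for illustration, and the sketch assumes the year follows an `X` in each column name and that the source columns are numeric.

```r
# Toy data with hypothetical column names (the real file's names may differ)
toy <- data.frame(
  NY.X1990 = c(12, NA, NA),
  NY.X1991 = c(13, 7, NA),
  NY.X1992 = c(15, 8, 22),
  SP.X1990 = c(87, NA, NA),
  SP.X1991 = c(90, 132, NA),
  SP.X1992 = c(95, 140, 173)
)

# Hypothetical helper: for each row, return the year and the value of the
# first non-NA entry among the given columns (NA if the whole row is NA).
first_non_na <- function(data, cols) {
  m <- as.matrix(data[, cols, drop = FALSE])
  # Column position of the first non-NA entry in each row
  idx <- apply(m, 1, function(r) which(!is.na(r))[1])
  data.frame(
    year  = sub("^.*X", "", cols[idx]),        # strip everything up to the X
    value = m[cbind(seq_len(nrow(m)), idx)],   # pick one cell per row
    stringsAsFactors = FALSE
  )
}

a <- first_non_na(toy, grep("NY", names(toy), value = TRUE))
b <- first_non_na(toy, grep("SP", names(toy), value = TRUE))

toy$yearA   <- a$year
toy$sourceA <- a$value
toy$yearB   <- b$year
toy$sourceB <- b$value
```

The helper is called once per data source, so the row-wise search is written only once instead of being repeated for A and B inside a loop.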

This is great! Thank you so much!! Very helpful explanation. – Jim