R nested for loop iterating over rows and column names

Here's a Dropbox link to a .csv of my data.

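I have data on countries from 1990 to 2010. My data is wide: each country is a row, and each year has two columns corresponding to two data sources. However, the data for some countries is incomplete. For example, a country row may have NA values in the columns for 1990-1995.

I want to create two columns and, for each country row, fill them with the earliest non-NA value for each of the two data types.

I also want to create two other columns and, for each country row, fill them with the year of that earliest non-NA value for each of the two data types.

So the last four columns would look like this: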

1990, 12, 1990, 87 
1990, 7, 1990, 132 
1996, 22, 1996, 173 
1994, 14, 1994, 124

Here is my rough, half-pseudocode attempt at the nested loop I have in mind:

for i in (number of rows){
    for j in names(df){
        if(is.na(df$j) == FALSE) df$earliest_year = j
    }
}

How can I generate these four columns? Thanks!


2017-04-09 Jim

You mentioned loops, so I tried writing a for loop. You may want to try other R functions, such as apply, later on. This code is a bit lengthy, but I hope it helps:

# Read the data; I'm assuming the first column holds row names and is not needed as data
df <- read.csv("wb_wide.csv", row.names = 1)

# Get the column names for the two data sources.
# Here I use grep to find column names matching the NY and SP patterns;
# if the columns strictly alternate, you could use a sequence of indices instead.
dataSourceA <- grep("NY", names(df), value = TRUE)
dataSourceB <- grep("SP", names(df), value = TRUE)

# Create the new columns: if I understand correctly, the first non-NA value
# from source A and source B, plus the year of each of those values.
# Pre-filling with NA means a row with no data at all simply stays NA.
df$sourceA <- rep(NA_real_, nrow(df))
df$yearA <- rep(NA_character_, nrow(df))
df$sourceB <- rep(NA_real_, nrow(df))
df$yearB <- rep(NA_character_, nrow(df))

# Loop over the rows
for (i in seq_len(nrow(df))) {

    # Source A: take this row's values as a plain vector,
    # then find the position of the first non-NA value
    rowA <- unlist(df[i, dataSourceA])
    firstA <- which(!is.na(rowA))[1]
    if (!is.na(firstA)) {
        df$sourceA[i] <- rowA[[firstA]]
        # Strip everything up to the "X" in the column name so only the year is left;
        # you can also skip this and clean the names afterward
        df$yearA[i] <- sub("^.*X", "", dataSourceA[firstA])
    }

    # Same steps for source B
    rowB <- unlist(df[i, dataSourceB])
    firstB <- which(!is.na(rowB))[1]
    if (!is.na(firstB)) {
        df$sourceB[i] <- rowB[[firstB]]
        df$yearB[i] <- sub("^.*X", "", dataSourceB[firstB])
    }

}

I tried to make the code specific to your example and desired output.
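
As for the apply route mentioned above, here is a minimal sketch of how that might look, not a drop-in replacement. The toy data frame stands in for wb_wide.csv, and its column names (NY_X1990 and so on) and values are made up for illustration, as is the helper name first_non_na. The real file's names may differ, in which case the grep and sub patterns need adjusting.

```r
# Toy data standing in for wb_wide.csv; names and values are hypothetical
df <- data.frame(
  NY_X1990 = c(10, NA, NA),
  NY_X1991 = c(11, 20, NA),
  NY_X1992 = c(12, 21, 30),
  SP_X1990 = c(100, 200, NA),
  SP_X1991 = c(101, 201, NA),
  SP_X1992 = c(102, 202, NA),
  row.names = c("A", "B", "C")
)

# Hypothetical helper: for the given columns, return each row's first
# non-NA value and the year parsed from that value's column name
first_non_na <- function(data, cols) {
  m <- as.matrix(data[, cols, drop = FALSE])
  # Position of the first non-NA entry in each row (NA if the row is all NA)
  idx <- apply(m, 1, function(row) which(!is.na(row))[1])
  data.frame(
    # Matrix indexing by (row, column) pairs; an NA position yields NA
    value = m[cbind(seq_len(nrow(m)), idx)],
    # Strip everything up to the "X" so only the year remains
    year = sub("^.*X", "", cols[idx]),
    stringsAsFactors = FALSE
  )
}

resA <- first_non_na(df, grep("NY", names(df), value = TRUE))
resB <- first_non_na(df, grep("SP", names(df), value = TRUE))

df$yearA <- resA$year
df$sourceA <- resA$value
df$yearB <- resB$year
df$sourceB <- resB$value
```

In this toy data, row C has no SP values at all, so its yearB and sourceB come out as NA rather than raising an error. The years are left as character strings; as.integer can convert them if the stripped names are purely numeric.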


2017-04-09 02:17:32 din

This is fantastic! Thank you so much!! Very helpful explanation. – Jim
