Loop through unique values in R

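Following up on a loop question I posted earlier, I am trying another loop, but without success. Any help solving this would be greatly appreciated. At the moment, to get my work done, I subset the data by year and run my original function as-is, but one of the datasets I am working with is a long time series. My original function calculates the number of fish at age for a given year of data, and it works correctly. What I want is to add a for loop that lets the function run over all years and produce the same information.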

Data:

x <- structure(list(Year = c(2007, 2012, 2012, 2007, 2012, 2007, 2012, 
2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 
2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 
2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 
2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 
2007, 2012, 2012, 2007, 2012, 2007, 2012, 2012, 2007, 2012, 2012, 
2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2012, 2007, 2012, 
2012, 2012, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012 
), Season = c("Fall", "Fall", "Fall", "Fall", "Fall", "Fall", 
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", 
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", 
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", 
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", 
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", 
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", 
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", 
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", 
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", 
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall"), Length = c(6, 
9, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 
18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25, 
26, 26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 32, 32, 33, 33, 
34, 34, 35, 35, 36, 37, 37, 38, 38, 39, 40, 40, 41, 42, 42, 43, 
43, 44, 44, 45, 45, 46, 47, 47, 48, 49, 50, 51, 51, 52, 52, 53, 
54, 55, 58), Exp_number = c(2, 1, 3, 2, 2, 6, 4, 11, 6, 24, 13, 
38, 41.208, 26, 77.096, 37, 227.704, 41, 276.064, 20, 276.536, 
23, 277.008, 23, 72.832, 11, 66.096, 8, 43.888, 12, 13.472, 14, 
2, 14, 4, 8, 4, 10, 5, 12, 2, 13, 5, 9, 2, 7, 1, 4, 3, 2, 2, 
8, 2, 3, 2, 1, 3, 2, 5, 1, 8, 2, 2, 2, 1, 6, 1, 2, 1, 1, 4, 1, 
3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1)), .Names = c("Year", "Season", 
"Length", "Exp_number"), row.names = c(8L, 43L, 55L, 64L, 68L, 
75L, 78L, 86L, 91L, 98L, 103L, 110L, 115L, 120L, 125L, 131L, 
136L, 143L, 148L, 157L, 162L, 169L, 174L, 181L, 186L, 193L, 197L, 
206L, 211L, 220L, 225L, 234L, 238L, 247L, 252L, 260L, 265L, 274L, 
279L, 288L, 293L, 302L, 307L, 316L, 320L, 329L, 334L, 343L, 346L, 
355L, 360L, 368L, 371L, 383L, 392L, 395L, 404L, 409L, 422L, 430L, 
435L, 447L, 456L, 461L, 468L, 472L, 480L, 483L, 491L, 495L, 505L, 
512L, 516L, 527L, 537L, 545L, 550L, 553L, 558L, 562L, 565L, 568L, 
571L, 583L), class = "data.frame") 


y <- structure(c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 4, 5, 30, 29, 28, 
17, 8, 8, 6, 16, 26, 59, 46, 77, 89, 78, 64, 51, 34, 31, 27, 
19, 30, 21, 26, 13, 26, 18, 12, 8, 9, 12, 9, 6, 13, 12, 20, 10, 
14, 14, 11, 8, 10, 13, 7, 6, 4, 4, 8, 2, 2, 4, 3, 0, 2, 0, 1, 
1, 1, 1, 1, 1, 0.941176470588235, 0.875, 0.625, 0.666666666666667, 
0.375, 0.423076923076923, 0.423728813559322, 0.391304347826087, 
0.246753246753247, 0.235955056179775, 0.153846153846154, 0.203125, 
0.0980392156862745, 0.0882352941176471, 0, 0.037037037037037, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0588235294117647, 
0.125, 0.375, 0.333333333333333, 0.5625, 0.5, 0.457627118644068, 
0.478260869565217, 0.545454545454545, 0.561797752808989, 0.564102564102564, 
0.59375, 0.647058823529412, 0.411764705882353, 0.483870967741935, 
0.222222222222222, 0.157894736842105, 0.0666666666666667, 0.0476190476190476, 
0.0384615384615385, 0.0769230769230769, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0625, 0.0769230769230769, 
0.11864406779661, 0.130434782608696, 0.207792207792208, 0.191011235955056, 
0.269230769230769, 0.171875, 0.176470588235294, 0.5, 0.483870967741935, 
0.481481481481481, 0.736842105263158, 0.8, 0.619047619047619, 
0.576923076923077, 0.615384615384615, 0.423076923076923, 0.277777777777778, 
0.333333333333333, 0.25, 0.111111111111111, 0.166666666666667, 
0.111111111111111, 0, 0, 0, 0, 0, 0.142857142857143, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0.0112359550561798, 0.0128205128205128, 
0.03125, 0.0784313725490196, 0, 0.032258064516129, 0.185185185185185, 
0.105263157894737, 0.1, 0.285714285714286, 0.269230769230769, 
0.307692307692308, 0.5, 0.5, 0.583333333333333, 0.625, 0.555555555555556, 
0.666666666666667, 0.444444444444444, 0.5, 0.538461538461538, 
0.333333333333333, 0.25, 0.1, 0, 0.214285714285714, 0, 0.125, 
0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.037037037037037, 
0, 0.0333333333333333, 0.0476190476190476, 0.115384615384615, 
0, 0.0769230769230769, 0.166666666666667, 0.0833333333333333, 
0.125, 0.222222222222222, 0.166666666666667, 0.444444444444444, 
0.333333333333333, 0.230769230769231, 0.333333333333333, 0.4, 
0.9, 0.571428571428571, 0.357142857142857, 0.545454545454545, 
0.5, 0.4, 0.230769230769231, 0.142857142857143, 0.333333333333333, 
0.25, 0, 0.375, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.037037037037037, 0, 
0, 0, 0, 0, 0, 0.0555555555555556, 0, 0, 0.111111111111111, 0, 
0, 0.166666666666667, 0.230769230769231, 0.25, 0.25, 0, 0.214285714285714, 
0.214285714285714, 0.272727272727273, 0, 0.4, 0.461538461538462, 
0.285714285714286, 0, 0.75, 0.5, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0833333333333333, 
0.1, 0, 0.0714285714285714, 0.142857142857143, 0.0909090909090909, 
0.125, 0.1, 0.307692307692308, 0.285714285714286, 0.333333333333333, 
0, 0.25, 0.25, 0, 1, 0.5, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0714285714285714, 0.0909090909090909, 
0, 0, 0, 0.285714285714286, 0.333333333333333, 0, 0, 0, 0, 0, 
0.25, 0.333333333333333, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.25, 0, 0, 0, 0, 0, 0.25, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.5, 0, 0.25, 
0.666666666666667, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0, 
0, 0, 0, 0.5, 0, 1), .Dim = c(57L, 13L), .Dimnames = list(NULL, 
    c("len", "nl", "A0", "A1", "A2", "A3", "A4", "A5", "A6", 
    "A7", "A8", "A9", "A10"))) 

z <- structure(list(Length = c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60)), .Names = "Length", row.names = c(NA, 
-57L), class = "data.frame")

Original function: x is subset to a single year before running it.

subx <- subset(x,Year==2007) 

myfunction <- function (x,y,z) { 
    #x - expanded length data by year and season 
    #y - alk as a percent by season 
    #z - length bins from alk percent data 

    #merge z and y to make sure both data sets have the same number of length bins 
    comb<-merge(z,x,by="Length",all=T) 

    #replace NA by column 
    comb$Year[is.na(comb$Year)]<-unique(x$Year) 
    comb$Exp_number[is.na(comb$Exp_number)] <- 0 

    #catch at age 
    catchatage<-comb[,4]*y[,3:13] 
    columntotal<-colSums(catchatage[1:dim(catchatage)[1],1:dim(catchatage)[2]]) 

    #assign and combine age, number of fish at age and year 
    Number=as.data.frame(columntotal) 
    Age=rownames(Number);Age<-as.numeric(gsub("A","",Age)) #get rid of the letter A in age 
    Year=rep(unique(x$Year),length(columntotal)) 

    age<-as.data.frame(cbind(Year,Age,Number=Number[,1])) 

    #reorder age 
    age<-age[order(age[,2]),] 
    return(age) 
} 
myfunction(subx,y,z)

My function with a loop over the unique year values, using the whole x dataset:

myfunction_2 <- function (x,y,z) { 
    #x - expanded length data by year and season 
    #y - alk as a percent by season 
    #z - length bins from alk percent data 

    #loop through years in survey 
    #get unique year values from combined year season dataset 
    y_levels<-unique(x$Year) 

    for (i in length(y_levels)){ 

    #subset the data 
    subset_data<-x$Year==y_levels[i] 

    #merge z and y to make sure both data sets have the same number of length bins 
    comb<-merge(z,subset_data,by="Length",all=T) 

    #replace NA by column 
    comb$Year[is.na(comb$Year)]<-unique(x$Year[i]) 
    comb$Exp_number[is.na(comb$Exp_number)] <- 0 

    #catch at age 
    catchatage<-comb[,4]*y[,3:13] 
    columntotal<-colSums(catchatage[1:dim(catchatage)[1],1:dim(catchatage)[2]]) 

    #assign and combine age, number of fish at age and year 
    Number=as.data.frame(columntotal) 
    Age=rownames(Number);Age<-as.numeric(gsub("A","",Age)) #get rid of the letter A in age 
    Year=rep(unique(subset_data$Year[i]),length(columntotal)) 

    age<-as.data.frame(cbind(Year,Age,Number=Number[,1])) 

    #reorder age 
    age<-age[order(age[,2]),] 

    return(age) 

    } 
} 

myfunction_2(x,y,z)

The error message I get is:

Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column

So I think my loop is not subsetting the data by unique year.

Thanks.

2014-07-16 user41509

Is it an option to use 'merge' outside the function and pass the 'comb' object in? Then you could subset 'comb' inside the function. –

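No, I did try that, but when I merge outside the function, not every length bin gets added for every year. Because each length bin occurs in at least one year, merge treats it as already present. When I subset the data by year, merge works: not all length bins occur in a given year, so all=T adds the missing bins. – user41509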
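The behaviour described in this comment can be sketched with hypothetical toy data in which Length 5 was recorded only in 2007 and Length 6 only in 2012. A single merge outside the function leaves each year short of some bins, while first building every Length/Year pair with expand.grid (the idea developed in a later answer) gives each year the full set. The data values here are invented for illustration.

```r
# Hypothetical toy data: Length 5 was caught only in 2007, Length 6 only in 2012
x <- data.frame(Year = c(2007, 2012), Season = "Fall",
                Length = c(5, 6), Exp_number = c(3, 2))
z <- data.frame(Length = c(4, 5, 6))

# One merge for all years: every bin appears once, but not once per year
outside <- merge(z, x, by = "Length", all = TRUE)
print(outside)   # bin 4 has Year NA, bin 5 only 2007, bin 6 only 2012

# Building every Length x Year pair first gives each year the full set of bins
grid <- expand.grid(Length = z$Length, Year = unique(x$Year))
full <- merge(grid, x, by = c("Length", "Year"), all = TRUE)
full$Exp_number[is.na(full$Exp_number)] <- 0
table(full$Year)  # three bins for each year
```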

'catchatage[1:dim(catchatage)[1], 1:dim(catchatage)[2]]' is the same as 'catchatage' - why not drop the subscripts and just use 'colSums(catchatage)'? – konvas
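A quick check on a small hypothetical matrix standing in for catchatage. The fully indexed version gives the same column sums. It can even backfire: with a single row, the default drop = TRUE of `[` turns the result into a plain vector, which colSums() rejects.

```r
# Hypothetical 3 x 2 matrix standing in for catchatage
catchatage <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
                     dimnames = list(NULL, c("A0", "A1")))

# Indexing every row and every column gives the same column sums (A0 = 6, A1 = 15)
full_index <- catchatage[1:dim(catchatage)[1], 1:dim(catchatage)[2]]
identical(colSums(full_index), colSums(catchatage))

# With a one-row matrix the same indexing drops the dimensions,
# so colSums() would stop with an error on the indexed version
one_row <- catchatage[1, , drop = FALSE]
one_row_indexed <- one_row[1:dim(one_row)[1], 1:dim(one_row)[2]]
is.null(dim(one_row_indexed))
colSums(one_row)   # the un-indexed matrix still works
```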

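There are several things that should be changed for this code to work:

1. subset_data<-x$Year==y_levels[i] does not actually define a subset (it is a logical vector); it should be subset_data <- subset(x, Year==y_levels[i])
2. for (i in length(y_levels)) should be for (i in 1:length(y_levels)), otherwise the loop only runs for 2012
3. return(age) is inside the loop, when it should be outside
4. the results of each iteration are not combined

Once point 1 is fixed, the rest should be easier to correct.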
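Point 1 is what produces the error in the question. A minimal reproduction on hypothetical toy data (not the question's x and z): the comparison yields a logical vector, merge() converts it into a one-column data frame that has no Length column, and fix.by() then stops; a real subset keeps the columns and merges cleanly.

```r
# Hypothetical toy data with the same columns as the question's x and z
x <- data.frame(Year = c(2007, 2012), Season = "Fall",
                Length = c(10, 11), Exp_number = c(3, 5))
z <- data.frame(Length = c(9, 10, 11, 12))

# What the loop builds: a logical vector, one TRUE/FALSE per row of x
is_2007 <- x$Year == 2007
print(is_2007)

# merge() turns the vector into a one-column data frame without a
# "Length" column, so it stops in fix.by() with the 'by' error
msg <- tryCatch(merge(z, is_2007, by = "Length", all = TRUE),
                error = function(e) conditionMessage(e))
print(msg)

# An actual subset keeps all columns, so the same merge works
subset_data <- subset(x, Year == 2007)
comb <- merge(z, subset_data, by = "Length", all = TRUE)
print(comb)   # four length bins, Exp_number present only for Length 10
```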

Another suggestion: your second function should call the first one; the code will be easier to read.
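Putting the four fixes together and reusing the first function, a corrected loop could look like the sketch below. The toy x, y and z are hypothetical stand-ins with the same columns as the question's data (only ages 0 and 1 are non-zero), and myfunction is the question's original function with its logic unchanged apart from the redundant indexing inside colSums(). Reusing myfunction also sidesteps the unique(x$Year[i]) and subset_data$Year[i] calls in myfunction_2, which index the wrong vectors. seq_along() is used in place of 1:length(); the two agree whenever there is at least one year.

```r
# Hypothetical toy inputs with the same columns as the question's x, y and z
z <- data.frame(Length = c(4, 5, 6))
y <- cbind(len = c(4, 5, 6), nl = c(2, 3, 1),
           matrix(0, nrow = 3, ncol = 11, dimnames = list(NULL, paste0("A", 0:10))))
y[, "A0"] <- c(1, 0.5, 0)   # share of each length bin at age 0
y[, "A1"] <- c(0, 0.5, 1)   # share of each length bin at age 1
x <- data.frame(Year = c(2007, 2007, 2012), Season = "Fall",
                Length = c(4, 6, 5), Exp_number = c(10, 4, 8))

# The question's single-year function (same logic)
myfunction <- function(x, y, z) {
  comb <- merge(z, x, by = "Length", all = TRUE)
  comb$Year[is.na(comb$Year)] <- unique(x$Year)
  comb$Exp_number[is.na(comb$Exp_number)] <- 0
  catchatage <- comb[, 4] * y[, 3:13]          # Exp_number times each age column
  columntotal <- colSums(catchatage)
  Number <- as.data.frame(columntotal)
  Age <- as.numeric(gsub("A", "", rownames(Number)))
  Year <- rep(unique(x$Year), length(columntotal))
  age <- as.data.frame(cbind(Year, Age, Number = Number[, 1]))
  age[order(age[, 2]), ]
}

# Loop version: a real subset per year, results collected, combined once
myfunction_2 <- function(x, y, z) {
  y_levels <- unique(x$Year)
  results <- vector("list", length(y_levels))
  for (i in seq_along(y_levels)) {                 # every year, not just the last
    subset_data <- subset(x, Year == y_levels[i])  # a data frame, not a logical vector
    results[[i]] <- myfunction(subset_data, y, z)
  }
  do.call(rbind, results)                          # combine after the loop, return once
}

myfunction_2(x, y, z)
```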

Finally, another way to loop without a 'for' statement would be to use lapply (combined with do.call rather than Reduce; see @konvas' comment):

do.call(rbind, lapply(unique(x$Year), function(yy) myfunction(subset(x,Year==yy),y,z)))

2014-07-16 15:36:24

Using 'Reduce' with 'rbind' is not a good idea (it is very slow), but it can be replaced by 'do.call(rbind, ...)' for better performance. – konvas

That's a very useful tip. Why is that? (I edited my code based on this comment) –

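This is because 'Reduce' applies 'rbind' repeatedly, whereas 'do.call' calls it only once. For example, if you have three data frames in a list, 'Reduce' is like calling 'rbind(rbind(d1, d2), d3)', but 'do.call' is like 'rbind(d1, d2, d3)'. – konvas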
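The difference can be made visible by counting calls. In this sketch, counting_rbind() is a hypothetical wrapper that increments a counter each time it runs. With five data frames, Reduce calls it four times (and each intermediate call copies every row bound so far, which is where the slowdown on long lists comes from), while do.call calls it once; the combined rows are the same.

```r
calls <- 0
# Hypothetical wrapper around rbind() that counts how often it is invoked
counting_rbind <- function(...) {
  calls <<- calls + 1
  rbind(...)
}

dfs <- lapply(1:5, function(i) data.frame(id = i))   # five one-row data frames

calls <- 0
r_reduce <- Reduce(counting_rbind, dfs)   # rbind(rbind(rbind(rbind(d1, d2), d3), d4), d5)
n_reduce <- calls

calls <- 0
r_docall <- do.call(counting_rbind, dfs)  # rbind(d1, d2, d3, d4, d5)
n_docall <- calls

c(reduce = n_reduce, do.call = n_docall)  # 4 calls versus 1
identical(r_reduce$id, r_docall$id)
```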

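While your code should work (after fixing the errors @VincentGuillemot pointed out), you can make myfunction more readable if you want. I think it's worth modifying your starting data frame at the outset so that it includes every combination of Length and Year (rather than doing this inside myfunction, one year at a time).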

So, since what you're after is every combination of Length and Year, how about something like

# create a data frame consisting of all length-year combinations 
data <- expand.grid(Length = z$Length, Year = unique(x$Year)) 
data <- merge(data, x, all = TRUE) # merge with x 
data <- merge(data, as.data.frame(y), by.x = "Length", by.y = "len") # merge with y 
data$Exp_number[is.na(data$Exp_number)] <- 0 # set missing Exp_number values to 0

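At this stage your data is in one data frame (instead of 3) and missing values have been taken care of (except in the column Season, which you don't seem to care about). I find it easier to run the analysis on a single data frame rather than on x, y and z: you can focus on the actual calculation instead of merging data frames and replacing NAs.

Now your function would look something like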

myfunction <- function(d) { 
    # d is a subset of data for a given year 
    catchatage <- d$Exp_number * d[grep("^A[0-9]*$", names(d))] 
    Number <- colSums(catchatage) 
    Age <- as.numeric(gsub("A", "", names(Number))) 
    result <- data.frame(Year = d$Year[1], 
     Age = Age, Number = Number) 
    result[order(result$Age), ] 
}

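I prefer to use column names and grep rather than column indices, because indices can lead to errors that go unnoticed if the order or structure of the data changes (variable names, on the other hand, rarely change).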
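To illustrate the risk on a hypothetical two-row table: selecting the age columns by position (d[3:4]) and by name pattern gives the same result until a column is added in front, for example by a later merge. Then the positional version silently multiplies the wrong columns, while the grep version is unaffected. The helper names by_position and by_name are made up for this sketch.

```r
# Hypothetical two-row table with age columns A0 and A1
d <- data.frame(Length = c(4, 5), Exp_number = c(10, 8),
                A0 = c(1, 0.5), A1 = c(0, 0.5))

by_position <- function(d) colSums(d$Exp_number * d[3:4])
by_name     <- function(d) colSums(d$Exp_number * d[grep("^A[0-9]*$", names(d))])

# Same answer while the columns are where the positional code expects them
by_position(d)
by_name(d)

# A new leading column shifts the positions
d2 <- cbind(nl = c(2, 3), d)
by_position(d2)   # now uses Exp_number and A0: wrong, with no error
by_name(d2)       # still A0 and A1
```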

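Apply myfunction to the subsets of data by Year and combine the results. This can be done in many ways: with lapply, as @VincentGuillemot suggested in his answer, or with by (or with non-base packages such as plyr, dplyr or data.table, if you're interested in looking into them).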

do.call(rbind, by(data, list(data$Year), myfunction))
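To check that the pieces fit together, here is the whole approach run end to end on hypothetical toy inputs (three length bins, ages A0 and A1 only, two years). Apart from the toy data, the code is taken from this answer; because myfunction selects the age columns with grep, the smaller number of ages needs no code change.

```r
# Hypothetical toy inputs shaped like the question's x, y and z
z <- data.frame(Length = c(4, 5, 6))
y <- cbind(len = c(4, 5, 6), nl = c(2, 3, 1),
           A0 = c(1, 0.5, 0), A1 = c(0, 0.5, 1))
x <- data.frame(Year = c(2007, 2007, 2012), Season = "Fall",
                Length = c(4, 6, 5), Exp_number = c(10, 4, 8))

# Every Length/Year combination, then the observed counts, then the key
data <- expand.grid(Length = z$Length, Year = unique(x$Year))
data <- merge(data, x, all = TRUE)
data <- merge(data, as.data.frame(y), by.x = "Length", by.y = "len")
data$Exp_number[is.na(data$Exp_number)] <- 0

# Per-year calculation on one subset of data
myfunction <- function(d) {
  catchatage <- d$Exp_number * d[grep("^A[0-9]*$", names(d))]
  Number <- colSums(catchatage)
  Age <- as.numeric(gsub("A", "", names(Number)))
  result <- data.frame(Year = d$Year[1], Age = Age, Number = Number)
  result[order(result$Age), ]
}

# Split by Year, apply, and bind the pieces in a single rbind call
result <- do.call(rbind, by(data, list(data$Year), myfunction))
result
```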

2014-07-16 16:42:46 konvas

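Thank you Vincent Guillemot and konvas for your suggestions. I tried both approaches, but I'm still having problems with the original function and with saving the iterations, so I may post about that later. Konvas, your code works very well and is much simpler than mine. Could you explain what 'd[grep("^A[0-9]*$", names(d))]' means? I'm not familiar with it. I ask because my other datasets will have a different number of ages, and I think I would need to change the range between the brackets to handle that. Thanks – user41509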

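'grep' searches for a pattern in a character vector and returns the indices of the elements that match ('grepl' returns TRUE or FALSE for each element instead). The pattern is called a regular expression; in this case it is '^A[0-9]*$'. ^ marks the beginning of the string and $ the end. [0-9] stands for a digit between 0 and 9, and * means the preceding element is matched 0 or more times. So this regular expression matches every string that consists of an A followed only by digits (or by nothing at all). – konvas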

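(continued) If that's confusing, in this case you could use 'd[paste0("A", 0:10)]', but 'grep' allows other data frames to have more or fewer age columns (the pattern stays the same). You can read '?grep' for more information and search the internet for "regular expressions". – konvas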
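A short demonstration on a hypothetical vector of column names, showing what grep() returns for this pattern, the value = TRUE and grepl() variants, and why * also lets a bare "A" through; using + instead would require at least one digit.

```r
cols <- c("Length", "Year", "Exp_number", "nl", "A0", "A1", "A10", "A", "Age")

grep("^A[0-9]*$", cols)                # positions of the matching names
grep("^A[0-9]*$", cols, value = TRUE)  # the matching names themselves
grepl("^A[0-9]*$", cols)               # TRUE/FALSE for every name

# "*" accepts zero digits, so a bare "A" matches; "+" needs at least one digit
grep("^A[0-9]+$", cols, value = TRUE)
```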