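Classifying groups in a panel based on the value of a variable in a specific year in R
I have a panel of different countries in R, and I want to create categories based on the value of a specific variable (here 'var3') in a specific year (here 3).
Here is an example of what I have so far: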
# create data
test.data = as.data.frame(matrix(rexp(200, rate=.1), ncol=5))
colnames(test.data) = c("year", "country", "var1", "var2", "var3")
test.data$year = rep.int(1:5, 8)
test.data$country = rep(1:8, each=5)
# calculate median, minimum and maximum of 'var3'
median = quantile(x = test.data[test.data$year == 3, 5], probs = c(0.5))
min = min(test.data[test.data$year == 3, 5])
max = max(test.data[test.data$year == 3, 5])
# create category variable based on values of 'var3'
test.data$cat.1 = cut(test.data$var3, c(min, median, max))
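Here the value of 'cat.1' depends on the 'var3' value of each individual observation, but I want it to depend on the country's value in one specific year (that is, a given country should have the same category in all years). Is there a simple way to do this, or do I have to do it by hand (pick the countries in each group and assign the category to them)? Doing it by hand is fine if the number of groups stays fixed, but it gets cumbersome if you want to try different group sizes.
The current result looks like this: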
   year country       var1        var2       var3       cat.1
1     1       1  4.4206363  9.32628504  4.0988089  (1.2,6.71]
2     2       1  7.6072491  6.30949828 39.5694414        <NA>
3     3       1  3.3774183  7.94397550  8.8419793 (6.71,22.2]
4     4       1  1.0300372  9.93858310  0.4908481        <NA>
5     5       1  6.4514008  2.10367840 29.6052797        <NA>
6     1       2  8.7609877  5.76332181 17.4117561 (6.71,22.2]
7     2       2  6.1253021  0.17258071 23.9096280        <NA>
8     3       2 48.3335241  1.19255084  3.3644827  (1.2,6.71]
9     4       2 34.1683821 10.98216846 29.0255100        <NA>
10    5       2 15.5824154  2.53484781 16.3466249 (6.71,22.2]
But I would like this instead:
   year country       var1        var2       var3       cat.1
1     1       1  4.4206363  9.32628504  4.0988089 (6.71,22.2]
2     2       1  7.6072491  6.30949828 39.5694414 (6.71,22.2]
3     3       1  3.3774183  7.94397550  8.8419793 (6.71,22.2]
4     4       1  1.0300372  9.93858310  0.4908481 (6.71,22.2]
5     5       1  6.4514008  2.10367840 29.6052797 (6.71,22.2]
6     1       2  8.7609877  5.76332181 17.4117561  (1.2,6.71]
7     2       2  6.1253021  0.17258071 23.9096280  (1.2,6.71]
8     3       2 48.3335241  1.19255084  3.3644827  (1.2,6.71]
9     4       2 34.1683821 10.98216846 29.0255100  (1.2,6.71]
10    5       2 15.5824154  2.53484781 16.3466249  (1.2,6.71]
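One way this might be done in base R, as a rough sketch rather than a definitive answer: cut only the reference-year rows (one per country), then copy each country's category back to all of its rows with match(). The names ref.year, n.groups, ref and breaks are made up for illustration, set.seed() is added only for reproducibility, and include.lowest = TRUE is an assumption that differs from the cut() call in the question (without it, the country holding the minimum would get NA).

```r
# Rebuild the example panel: 8 countries observed over 5 years.
set.seed(1)  # assumption: seed added only so the sketch is reproducible
test.data <- data.frame(
  year    = rep.int(1:5, 8),
  country = rep(1:8, each = 5),
  var1    = rexp(40, rate = 0.1),
  var2    = rexp(40, rate = 0.1),
  var3    = rexp(40, rate = 0.1)
)

ref.year <- 3  # hypothetical name: year whose 'var3' value defines the category
n.groups <- 2  # hypothetical name: number of categories to try

# One row per country: its 'var3' value in the reference year.
ref <- test.data[test.data$year == ref.year, c("country", "var3")]

# Quantile-based break points over the reference-year values
# (min, median, max when n.groups is 2).
breaks <- quantile(ref$var3, probs = seq(0, 1, length.out = n.groups + 1))

# Categorise each country once, based on its reference-year value.
ref$cat.1 <- cut(ref$var3, breaks = breaks, include.lowest = TRUE)

# Copy each country's category to all of that country's rows.
test.data$cat.1 <- ref$cat.1[match(test.data$country, ref$country)]
```

Changing n.groups should be enough to try a different number of groups, provided the reference-year values are distinct enough for the quantile breaks to be unique.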
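You might want to look at `dplyr::group_by` combined with `dplyr::mutate`. – coffeinjunky
Thanks for the tip! –
Could you show the expected or desired output as a table, rather than describing it in words? – user5249203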
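The group_by/mutate idea from the comments might look roughly like the following. This is only a sketch under assumptions: it requires the dplyr package, it assumes each country has exactly one row in the reference year, and the names ref.year, year3 and breaks are invented for illustration. As an added assumption, include.lowest = TRUE is passed to cut() so the country with the smallest value is not left as NA.

```r
library(dplyr)

# Rebuild the example panel: 8 countries observed over 5 years.
set.seed(1)  # assumption: seed added only so the sketch is reproducible
test.data <- data.frame(
  year    = rep.int(1:5, 8),
  country = rep(1:8, each = 5),
  var1    = rexp(40, rate = 0.1),
  var2    = rexp(40, rate = 0.1),
  var3    = rexp(40, rate = 0.1)
)

ref.year <- 3  # hypothetical name: year whose 'var3' value defines the category

# Break points computed from the reference-year values only.
year3  <- test.data$var3[test.data$year == ref.year]
breaks <- c(min(year3), median(year3), max(year3))

# Within each country, cut that country's single reference-year value;
# mutate() recycles the length-one result to every row of the group.
test.data <- test.data %>%
  group_by(country) %>%
  mutate(cat.1 = cut(var3[year == ref.year],
                     breaks = breaks, include.lowest = TRUE)) %>%
  ungroup()
```

If a country had no row, or more than one, for the reference year, the mutate() step would fail, so that case would need separate handling.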