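Completing a data frame with a variable from another data frame in R

I have a main data frame containing product information such as ID, description, and category (along with many other variables).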
main.df <- structure(list(product.ID = 1:10,
description = c("abc...", "bcd...", "def...", "efg...", "fgh...",
"ghi...", "hij...", "ijk...", "jkl...", "klm..."),
category = c("a", "b", "c", "d", "e", "a", "b", "c", "d", "e")),
.Names = c("product.ID", "description", "category"),
row.names = c(NA, -10L), class = "data.frame")
Then I have a second data frame that lists the product class to which each category belongs:
classes.df <- structure(list(category = c("a", "b", "c", "d", "e"),
classe = c("aaa", "bbb", "aaa", "ccc", "bbb")),
.Names = c("category", "classe"),
row.names = c(NA, -5L),
class = "data.frame")
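The "category" variable is what links the two data frames.
I need to add a variable to main.df that gives the product class each row belongs to, but I don't know how to do it.
Given that my real main.df has 45,000 rows spread over more than 90,000 categories, and my real classes.df has 90,000 rows mapping to 120 classes, how should I do this? Thank you.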
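For reference, the result being asked for can be sketched on the example data above. This is only a minimal illustration, assuming each category appears exactly once in classes.df; merged.df is a name introduced here for the sketch. It shows two base R options: a left join with merge(), and a lookup with match() that preserves the row order of main.df.

```r
# Example data, rebuilt from the definitions above so the sketch runs on its own
main.df <- data.frame(
  product.ID = 1:10,
  description = c("abc...", "bcd...", "def...", "efg...", "fgh...",
                  "ghi...", "hij...", "ijk...", "jkl...", "klm..."),
  category = c("a", "b", "c", "d", "e", "a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)
classes.df <- data.frame(
  category = c("a", "b", "c", "d", "e"),
  classe = c("aaa", "bbb", "aaa", "ccc", "bbb"),
  stringsAsFactors = FALSE
)

# Option 1: left join. all.x = TRUE keeps every row of main.df, even if its
# category is missing from classes.df (classe is then NA).
# merge() sorts the result by the key and moves the key to the first column.
merged.df <- merge(main.df, classes.df, by = "category", all.x = TRUE)

# Option 2: lookup. match() returns, for each row of main.df, the position of
# its category in classes.df; indexing classe by that position adds the column
# in place and leaves the row order of main.df untouched.
main.df$classe <- classes.df$classe[match(main.df$category, classes.df$category)]
```

The match() form allocates only one new column, which may matter on a large table. If a category occurred more than once in classes.df, match() would silently take the first occurrence while merge() would duplicate rows, so the uniqueness assumption is worth checking first.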
Structure of main.df:
Classes ‘data.table’ and 'data.frame': 250000 obs. of 16 variables:
$ ID : int 4722 6988 9184 13224 13511 15938 19244 21162 23294 23793 ...
$ dataset : Factor w/ 2 levels "BA", "RB",..: 1 1 1 1 1 1 1 1 1 1 ...
$ prodID : num 429 429 429 429 429 429 429 429 429 429 ...
$ ProdName : chr "aaa" "aaa" "bbb" "ccc" "eee" ...
$ manufacID : num 1 1 1 1 1 1 1 1 1 1 ...
$ time : num 1271636264 1062977828 1218368958 1305424000 1284596323 ...
$ serial : chr "BA1" "BA1" "RB1" "RB7" ...
- attr(*, "sorted")= chr "serial"
- attr(*, ".internal.selfref")=<externalptr>
Structure of classes.df:
Classes ‘data.table’ and 'data.frame': 20565 obs. of 5 variables:
$ ID : int 652 1204 1252 1379 2334 2335 2336 2337 3186 3187 ...
$ mName : chr "XYZ" "EHD" "DLK" "TSH" ...
$ country: chr "Argentina" "USA" "UK" "Argentina" ...
$ serial : chr "RB7" "BA1" "RB97" "RB732" ...
- attr(*, ".internal.selfref")=<externalptr>
(For confidentiality reasons, I had to anonymize the names.)
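Since both real tables are data.table objects, one way the lookup might be written is an update join. The sketch below is hedged: main.dt and classes.dt are hypothetical toy tables with made-up values, shaped loosely like the str() output above, and it assumes "serial" is the linking column and that the two ID columns are unrelated (so ID is deliberately left out of the join).

```r
library(data.table)

# Hypothetical toy tables; the values are invented for illustration only
main.dt <- data.table(
  ID = c(4722L, 6988L, 9184L, 13224L),
  ProdName = c("aaa", "aaa", "bbb", "ccc"),
  serial = c("BA1", "BA1", "RB1", "RB7")
)
classes.dt <- data.table(
  ID = c(652L, 1204L),
  mName = c("XYZ", "EHD"),
  country = c("Argentina", "USA"),
  serial = c("RB7", "BA1")
)

# Update join: for each row of main.dt, find the classes.dt row with the same
# serial and copy mName and country into main.dt by reference (no full copy).
# The i. prefix refers to columns of classes.dt; rows without a match get NA.
main.dt[classes.dt, on = "serial", `:=`(mName = i.mName, country = i.country)]
```

Naming the key explicitly with on = "serial" keeps the shared ID column from being used as a join key by accident. If a serial appeared more than once in classes.dt, the last matching row would win, so this form assumes serial is unique there.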
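If I understand correctly, you want to use the 'serial' column as the linking variable. However, the 'ID' column is also a common variable. How many columns should 'main.df' have in the expected result? – akrun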