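Look up values in one data frame based on column names stored as values in another data frame
Please see the reproducible (cut + paste) example below. The real data set has more than 4,000 serial observations on 11,000 people. I need to create columns A, B, C, etc. that show the NUMBER in the "drug" variables X, Y, Z, etc. corresponding to the first occurrence of a specific value of the "disease" variable. These numbers refer to the action taken with a particular drug (start, stop, increase dose, etc.). The "disease" variable records whether the disease is flaring, in a disease that goes through many phases, including flares and remissions.
For example: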
Animal <- c("aardvark", "1", "cheetah", "dromedary", "eel", "1", "bison", "cheetah", "dromedary",
"eel")
Plant <- c("apple_tree", "blossom", "cactus", "1", "bronze", "apple_tree", "bronze", "cactus",
"dragonplant", "1")
Mineral <- c("amber", "bronze", "1", "bronze", "emerald", "1", "bronze", "bronze", "diamond",
"emerald")
Bacteria <- c("acinetobacter", "1", "1", "d-strep", "bronze", "acinetobacter", "bacillus",
"chlamydia", "bronze", "enterobacter")
AnimalDrugA <- c(1, 11, 12, 13, 14, 15, 16, 17, 18, 19)
AnimalDrugB <- c(20, 1, 22, 23, 24, 25, 26, 27, 28, 29)
PlantDrugA <- c(301, 302, 1, 304, 305, 306, 307, 308, 309, 310)
PlantDrugB <- c(401, 402, 1, 404, 405, 406, 407, 408, 409, 410)
MineralDrugA <- c(1, 2, 3, 4, 1, 6, 7, 8, 9, 10)
MineralDrugB <- c(11, 12, 13, 1, 15, 16, 17, 18, 19, 20)
BacteriaDrugA <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1)
BacteriaDrugB <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
dummy_id <- c(1001, 2002, 3003, 4004, 5005, 6006, 7007, 8008, 9009, 10101)
Elements <- data.frame(dummy_id, Animal, Plant, Mineral, Bacteria, AnimalDrugA, AnimalDrugB,
PlantDrugA, PlantDrugB, MineralDrugA, MineralDrugB, BacteriaDrugA, BacteriaDrugB)
ds <- Elements[,order(names(Elements))]
ds #Got it in alphabetical order... The real data set will be re-ordered chronologically
#Now I want the first occurrence of the word "bronze" for each id
# for each subject 1 through 10. (That is, "bronze" corresponds to start of disease flare.)
first.bronze <- colnames(ds)[apply(ds,1,match,x="bronze")]
first.bronze
#Now, I want to find the number in the DrugA, DrugB variable that corresponds to the first
#occurrence of bronze.
#Using the alphabetically ordered data set, the answer should be:
#dummy_id DrugA DrugB
#1... NA NA
#2... 2 12
#3... NA NA
#4... 4 1
#5... 5 6
#6... NA NA
#7... 7 17
#8... 8 18
#9... 9 2
#10... NA NA
#Note that all first occurrences of "bronze"
# are in Mineral or Bacteria.
#As a first step, join first.bronze to the ds
ds$first.bronze <- first.bronze
ds
#Make a new ds where those who have an NA for first.bronze are excluded:
ds2 <- ds[complete.cases(ds$first.bronze),]
ds2
# Create a template data frame
out <- data.frame(matrix(nrow = 1, ncol = 3))
colnames(out) <- c("Form Number", "DrugA", "DrugB") # Gives correct column names
out
#Then grow the data frame...yes I realize potential slowness of computation
for (i in ds2$first.bronze) { # a for loop returns NULL, so its result is not assigned
data <- rbind(colnames(ds2)[grep(i, names(ds2), ignore.case = FALSE, fixed = TRUE)])
colnames(data) <- c("Form Number", "DrugA", "DrugB") # Gives correct column names
out <- rbind(out, data)
}
out
#Then delete the first row of NAs
out <- na.omit(out)
out
#Then add the appropriate dummy_ids
dummy_id <- ds2$dummy_id
out_with_ids <- as.data.frame(cbind(dummy_id, out))
out_with_ids
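Now I'm stuck. What I have in the DrugA and DrugB columns of the out_with_ids data set are the names of the ds2 columns, not the values in them. I have searched Stack Overflow thoroughly, but solutions based on match, merge, replace and the data.table package don't seem to work.
Thanks!
Hi, +1 for the cut + paste example. However, if you could please simplify the problem, it would help us post an answer faster –
I'll try to simplify: basically, df1 contains some variables whose values are the names of variables found in df2. I need to replace the values of those variables in df1 with the actual values found under the matching variable names in df2. – user3108800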
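One possible way to approach the simplified problem is base R matrix indexing, where a two-column matrix of (row, column) positions pulls one cell per row. The sketch below is only an illustration under assumptions: the toy df1 and df2, the prefix column, and the lookup helper are hypothetical names made up for this example, and it assumes the two data frames share an id column and that the drug columns are all numeric.

```r
# Hypothetical toy data: df2 holds the actual values,
# df1 holds, per id, the prefix of the df2 columns to read from.
df2 <- data.frame(
  id            = c(1, 2, 3),
  MineralDrugA  = c(10, 20, 30),
  MineralDrugB  = c(11, 21, 31),
  BacteriaDrugA = c(40, 50, 60),
  BacteriaDrugB = c(41, 51, 61)
)
df1 <- data.frame(
  id     = c(1, 2, 3),
  prefix = c("Mineral", NA, "Bacteria"),
  stringsAsFactors = FALSE
)

# Numeric matrix of the drug columns only, so values are not coerced to text.
m <- as.matrix(df2[grep("Drug", names(df2))])

# Row of df2 that belongs to each row of df1.
rows <- match(df1$id, df2$id)

# For a given suffix, build the column name per row (e.g. "MineralDrugA"),
# find its position, and pull one cell per row via (row, column) indexing.
# A missing prefix matches no column, so that row yields NA.
lookup <- function(suffix) {
  cols <- match(paste0(df1$prefix, suffix), colnames(m))
  m[cbind(rows, cols)]
}

result <- data.frame(
  id    = df1$id,
  DrugA = lookup("DrugA"),
  DrugB = lookup("DrugB")
)
result
```

In the example from the question, the first.bronze column would play the role of prefix here. Because the lookup is vectorised, it avoids growing a data frame inside a loop, which may matter with 11,000 subjects.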