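Optimization: replacing values in a data frame with multiple conditions

I have a data frame similar to the sample below. Based on the information in its two columns, I want to classify the items by size and colour: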
df <- structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100)), .Names = c("Ball", "size"), class = "data.frame", row.names = c(NA, -6L))
The output should look like this:
structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100), Class = c("small red ball", "small red ball", "small blue ball", "medium red ball", "medium blue ball", "big red ball")), row.names = c(NA, -6L), .Names = c("Ball", "size", "Class"), class = "data.frame")
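I already have code that works, but it is long and messy, and I am sure there is a more concise way to get the output I need.
So what did I do?
I started by selecting the items of the first class and assigning a new value to df$Class for the selected rows: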
df["Class"] <- NA #add new column
df[grepl("red", df$Ball) & df$size <10, ]$Class <- "small red ball"
Because my grepl selection is sometimes empty, I added an if (length() > 0) condition:
if (length(df[grepl("red", df$Ball) & df$size <10, ]$Class) > 0) {df[grepl("red", df$Ball) & df$size <10, ]$Class <- "small red ball"}
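The guard is presumably there because df[cond, ]$Class <- value raises an error when cond selects no rows. A minimal sketch of an alternative, assuming a toy data frame and a hypothetical "green" colour that never matches: indexing the column itself with the logical vector turns an empty selection into a harmless no-op, so the if is not needed.

```r
# Toy data (an assumption for illustration): no ball is green.
df <- data.frame(Ball = c("red ball", "blue"), size = c(1.2, 12),
                 stringsAsFactors = FALSE)
df$Class <- NA_character_  # add new column

# Empty selection: every element of the condition is FALSE.
is_small_green <- grepl("green", df$Ball) & df$size < 10
df$Class[is_small_green] <- "small green ball"  # nothing assigned, no error

# Non-empty selection: only the matching rows are updated.
is_small_red <- grepl("red", df$Ball) & df$size < 10
df$Class[is_small_red] <- "small red ball"
```

The difference is only in where the subsetting happens: df$Class[cond] subsets a vector, while df[cond, ]$Class first builds a zero-row data frame and then tries to assign into it.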
Finally, I combined all my selections in a loop:
df["Class"] <- NA #add new column
z <- c("red", "blue")
for (i in z){
if (length(df[grepl(i, df$Ball) & df$size <10, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size <10, ]$Class <- paste("small", i, "ball", sep=" ")}
if (length(df[grepl(i, df$Ball) & df$size >=10 & df$size <100, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size >=10 & df$size <100, ]$Class <- paste("medium", i, "ball", sep=" ")}
if (length(df[grepl(i, df$Ball) & df$size >=100, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size >=100, ]$Class <- paste("big", i, "ball", sep=" ")}
}
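It works for two colours and three size classes, but my original data frame is much larger. So, because the code looks very messy, my question is: how can I simplify it?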
I don't see the need for the 'stringr' package. I guess base R works: 'paste(as.character(cut(df$size, c(1, 10, 100, Inf), c("small", "medium", "large"))), sub("[^(red|blue)].*", "", df$Ball), "Ball")' – Onyambu
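A hedged sketch of how a cut-plus-sub approach along these lines could reproduce the desired Class column. It is not the commenter's exact expression: the breaks c(-Inf, 10, 100, Inf) with right = FALSE, the label "big" (taken from the desired output rather than "large"), and the capture-group pattern are assumptions made here so that the result matches the expected output in the question.

```r
# Sample data from the question, with Ball as plain character strings.
df <- data.frame(
  Ball = c("red ball", "red", "blue is my favourite", "red ", "blue", "red"),
  size = c(1.2, 2, 3, 10, 12, 100),
  stringsAsFactors = FALSE
)

# Size class: right = FALSE gives [-Inf,10), [10,100), [100,Inf).
size_class <- cut(df$size, breaks = c(-Inf, 10, 100, Inf),
                  labels = c("small", "medium", "big"), right = FALSE)

# Colour: keep only the first "red" or "blue" found in the string.
# Note: a string containing neither colour is returned unchanged by sub().
colour <- sub("^.*?(red|blue).*$", "\\1", df$Ball, perl = TRUE)

df$Class <- paste(size_class, colour, "ball")
```

Adding a colour then only means extending the alternation in the pattern, and adding a size class only means extending breaks and labels, so no loop or per-class if is needed.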
@Onyambu Sure, 'sub' works, but if there is no match it returns the whole string, whereas 'str_extract' returns NA. One workaround is 'regexpr/regmatches' – akrun
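To illustrate the difference described here, a small sketch comparing sub with regexpr/regmatches on a string that contains no colour. The input vector and the "green" entry are made up for illustration.

```r
x <- c("red ball", "blue is my favourite", "green")

# sub(): a non-matching string passes through unchanged.
sub_result <- sub("^.*?(red|blue).*$", "\\1", x, perl = TRUE)

# regexpr() returns -1 where there is no match; regmatches() returns
# only the matched pieces, so they are written back by position and
# the unmatched elements stay NA.
m <- regexpr("red|blue", x)
colour <- rep(NA_character_, length(x))
colour[m != -1] <- regmatches(x, m)
```

Here sub_result keeps "green" as its third element, while colour has NA in that position, which makes unmatched rows easy to detect afterwards.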
For 'small: x < 10', 'medium: 10 <= x < 100', 'large: x >= 100' it should be 'c(1, 9, 99, Inf)', right? – Iris
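The boundary behaviour can be checked directly. By default cut uses intervals that are closed on the right, so both sets of breaks can be compared against a third variant with right = FALSE. The test values below (including the fractional 9.5 and 99.5) and the -Inf lower break are assumptions chosen to expose the edge cases.

```r
x <- c(1.2, 9.5, 10, 99.5, 100)
labs <- c("small", "medium", "large")

# Default right = TRUE: intervals (1,10], (10,100], (100,Inf].
breaks_10 <- as.character(cut(x, c(1, 10, 100, Inf), labs))

# Default right = TRUE: intervals (1,9], (9,99], (99,Inf].
breaks_9 <- as.character(cut(x, c(1, 9, 99, Inf), labs))

# right = FALSE: intervals [-Inf,10), [10,100), [100,Inf).
left_closed <- as.character(cut(x, c(-Inf, 10, 100, Inf), labs,
                                right = FALSE))
```

With these inputs, the c(1, 10, 100, Inf) breaks put 10 in "small" and 100 in "medium". The c(1, 9, 99, Inf) breaks handle 10 and 100 as intended but classify 9.5 as "medium" and 99.5 as "large", so they only fit integer sizes. The right = FALSE variant matches the stated rules for all five values.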