2015-05-29 47 views
1

Rewriting code with the %>% operator

I am trying to rewrite this code with the %>% operator (to learn the approach):

library(arules) 
data(AdultUCI) #https://archive.ics.uci.edu/ml/datasets/Census+Income 

AdultUCI[["capital-gain"]] <- ordered(cut(AdultUCI[["capital-gain"]], 
    c(-Inf, 0, median(AdultUCI[["capital-gain"]][AdultUCI[["capital-gain"]] > 0]), Inf)), 
    labels = c("None", "Low", "High")) 

Is that possible? Here is my attempt:

AdultUCI[["capital-gain"]] <- ordered %>% cut %>% AdultUCI[["capital-gain"]], 
          c(-Inf, 0, median(AdultUCI[["capital-gain"]][AdultUCI[["capital-gain"]] > 0]), 
          Inf),labels = c("None", "Low", "High") 
+5

Please make your code reproducible (http://stackoverflow.com/a/28481250/2725969). – BrodieG

+1

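In general, you can almost always replace nested function calls with the pipe operator. What have you tried? Did it not work? What was the problem? – Molx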

+0

@Molx I am having trouble working through this long operation. Is the order correct? Should most of it use %>%? – Kulis

Answers

1

This should work:

library(dplyr) 

#reproducible data 
#the raw file has no header row; capital-gain is its 11th column (column 13 is hours-per-week) 
AdultUCI <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", header = FALSE) 
colnames(AdultUCI)[11] <- "capital-gain" 

#original code 
originalOrdered <- 
    ordered(cut(AdultUCI[["capital-gain"]], 
                c(-Inf, 0, 
                  median(AdultUCI[["capital-gain"]][AdultUCI[["capital-gain"]] > 0]), Inf), 
                labels = c("None", "Low", "High")), 
            levels = c("None", "Low", "High")) 

#using dplyr 
newOrdered <- 
    AdultUCI %>% 
    select(x = `capital-gain`) %>%     #keep only this column, renamed to x 
    mutate(capitalGainOrdered = 
      ordered(
        cut(x, c(-Inf, 0, median(x[x > 0]), Inf), 
            labels = c("None", "Low", "High")), 
        levels = c("None", "Low", "High"))) %>% 
    .$capitalGainOrdered               #pull the new column out as a vector 


#test if same 
identical(originalOrdered, newOrdered) 
#[1] TRUE 

str(newOrdered) 
#Ord.factor w/ 3 levels "None"<"Low"<"High": ...
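
If the goal is only to unnest the original call, without select() and mutate(), a more literal translation may be easier to follow. The sketch below is an illustration rather than part of the answer above: it assumes magrittr's %>% (the same operator dplyr re-exports) and uses a small made-up vector, gain, in place of AdultUCI[["capital-gain"]]. The innermost call comes first in the pipe, and each %>% passes the left-hand result as the first argument of the next call.

```r
library(magrittr)

# Hypothetical toy values standing in for AdultUCI[["capital-gain"]]
gain <- c(0, 0, 2174, 0, 14084, 5178, 0, 7298)

# Nested form, as in the question: cut() runs first, then ordered()
nested <- ordered(cut(gain, c(-Inf, 0, median(gain[gain > 0]), Inf)),
                  labels = c("None", "Low", "High"))

# Piped form: same calls, written in the order they are evaluated
piped <- gain %>%
  cut(c(-Inf, 0, median(gain[gain > 0]), Inf)) %>%
  ordered(labels = c("None", "Low", "High"))

# Compare the two results
identical(nested, piped)
```

The attempt in the question reverses this order: it pipes the functions ordered and cut into the data, whereas the data has to start the chain and the outermost function has to end it.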