2016-11-08 73 views

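The code below was working fine, but today it suddenly started giving this warning: In df1_Drop[ID_posin1] <- df2_Pick[ID_posin2] : number of items to replace is not a multiple of replacement length

I cannot work out why the warning appears, and the output is also incorrect. The function is called with the following line: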

Cost_match <- match_cost(SingleValueDistribute = FALSE,
                         df1_ID = Net_Rev$`Man.ID`,
                         df2_ID = Production_cost$Man.ID,
                         df2_Pick = Production_cost$Man.Revenue,
                         df1_Drop = Net_Rev$`Man.Revenue`,
                         df1_Dist_by = Net_Rev$Revenue,
                         dist = TRUE, weighted = TRUE)

Warning message:
In df1_Drop[ID_posin1] <- df2_Pick[ID_posin2] :
  number of items to replace is not a multiple of replacement length

match_cost<-function(SingleValueDistribute=F, df1_ID,df2_ID,df2_Pick,df1_Drop,df1_Dist_by,weighted=F,dist=F){ 
    # SingleValueDistribute allows to distribute a single value across many rows 
    # IDs not needed in this case 

    if(SingleValueDistribute==T) { 
    sum<-sum(df1_Dist_by) 
    perc<-df1_Dist_by/sum 
    cost<-df2_Pick 
    df1_Drop<-perc*cost 
    reps<-NULL 
    print(" Singular Value Distributed") 
    }else{             
    df<-data.frame(table(df1_ID)) 
    df<-df[which(df$df1_ID %in% df2_ID),] 
    reps<-as.character(unique(df$df1_ID[which(df$Freq>1)])) 

    if (length(reps)>0 & dist==F) { 
     print("Multiple IDs; Values not Distributed") 
    } else if(length(reps)>0 & dist==T & weighted==T){ 


     for(i in df2_ID){ # Loop to distribute by df1_Dist_by
     rows<-which(df1_ID==i) 
     sum<-sum(df1_Dist_by[rows]) 
     cost<-df2_Pick[which(df2_ID==i)] 
     if(sum==0){ 
      df1_Drop[rows][1]<-cost 
     }else{ 
      perc<-df1_Dist_by[rows]/sum 
      df1_Drop[rows]<-perc*cost 
     } 
     } 
     print("Multiple IDs; Value Weighted and Distributed") 
    }else{ 
     # Direct matching  
     unique_ID<-unique(df2_ID) #Get unique IDs 

     #Find positions of unique IDs the two data sets 
     ID_posin1<- match(x = unique_ID, table = df1_ID, nomatch = 0) # Find position of unique Ids 
     ID_posin2<- match(x = unique_ID, table = df2_ID, nomatch = 0) # Find position of unique Ids 

     # Find corresponding cost positions 

     df1_Drop[ID_posin1]<-df2_Pick[ID_posin2]     
     if(length(reps)>0) print("Multiple IDs; Values singularly Distributed") else 
     print("Singular IDs; Values Distributed") 
    } 
    } 
    return(list(df1_Drop=df1_Drop,rep_ID=reps)) 
} 

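Answer

You have a discrepancy between your two data sources that was not present in your initial dataset. It looks like you are assuming that every df1 ID is in df2 and vice versa. The example below reproduces the warning when that assumption fails: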

unique_ID <- 100:101 
df1_Drop <- 1:3 
df2_Pick <- 4:6 
# Find positions of unique IDs the two data sets 
# Find position of unique Ids 
ID_posin1 <- match(x = unique_ID, table = 101:103, nomatch = 0) 
ID_posin1 
# [1] 0 1 

# Find position of unique Ids 
ID_posin2 <- match(x = unique_ID, table = 100:103, nomatch = 0) 
ID_posin2 
# [1] 1 2 

df1_Drop[ID_posin1] <- df2_Pick[ID_posin2] 
# Warning message: 
# In df1_Drop[ID_posin1] <- df2_Pick[ID_posin2] : 
# number of items to replace is not a multiple of replacement length 

df1_Drop 
# [1] 4 2 3 
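
The zero in ID_posin1 is what triggers the warning: a 0 index selects nothing, so the left-hand side has one element while the right-hand side has two. Here is a minimal sketch of one way to avoid it, assuming that IDs missing from either side should simply be skipped. The names pos1, pos2 and keep are illustrative and not part of the original function, and the data is made up.

```r
unique_ID <- 100:101
df1_ID <- 101:103
df2_ID <- 100:102
df1_Drop <- 1:3
df2_Pick <- 4:6

# Use the default nomatch = NA so that unmatched IDs can be detected
pos1 <- match(unique_ID, df1_ID)  # NA 1
pos2 <- match(unique_ID, df2_ID)  # 1 2

# Keep only the IDs found in both data sets, so both sides have equal length
keep <- !is.na(pos1) & !is.na(pos2)
df1_Drop[pos1[keep]] <- df2_Pick[pos2[keep]]

df1_Drop
# [1] 5 2 3
```

Only ID 101 is present in both vectors, so only its Drop value is replaced and no warning is raised.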

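Get into the habit of testing every assumption you make at the start of a function, and you will be far less likely to run into surprises. For example: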

# check inputs 
if (!all(df1_ID %in% df2_ID)) { stop("not all df1_ID are in df2_ID") } 
if (!all(df2_ID %in% df1_ID)) { stop("not all df2_ID are in df1_ID") } 
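
When a check like this fails, it helps to know which IDs are responsible. The following is a small sketch using base R's setdiff(); the helper name check_ids is hypothetical and not part of the original code.

```r
# Hypothetical helper: stop with a message listing the IDs found on one side only
check_ids <- function(df1_ID, df2_ID) {
  only_in_1 <- setdiff(df1_ID, df2_ID)
  only_in_2 <- setdiff(df2_ID, df1_ID)
  if (length(only_in_1) > 0 || length(only_in_2) > 0) {
    stop("ID mismatch; only in df1: ", paste(only_in_1, collapse = ", "),
         "; only in df2: ", paste(only_in_2, collapse = ", "))
  }
  invisible(TRUE)
}

# Capture the error message instead of halting, to show what is reported
res <- tryCatch(check_ids(101:103, 100:103),
                error = function(e) conditionMessage(e))
res
# [1] "ID mismatch; only in df1: ; only in df2: 100"
```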

EDIT: I don't think the following block behaves the way you want:

# Direct matching  
    unique_ID<-unique(df2_ID) #Get unique IDs 

    #Find positions of unique IDs the two data sets 
    # Find position of unique Ids 
    ID_posin1<- match(x = unique_ID, table = df1_ID, nomatch = 0) 
    # Find position of unique Ids 
    ID_posin2<- match(x = unique_ID, table = df2_ID, nomatch = 0) 

    # Find corresponding cost positions 

    df1_Drop[ID_posin1]<-df2_Pick[ID_posin2] 
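
As a small illustration with made-up data: match() returns only the first position of each value, so when an ID occurs on several rows of df1, this pattern updates only the first of those rows.

```r
df1_ID <- c(101, 101, 102)   # ID 101 appears twice in df1
df1_Drop <- c(0, 0, 0)
df2_ID <- c(101, 102)
df2_Pick <- c(10, 20)

unique_ID <- unique(df2_ID)
ID_posin1 <- match(unique_ID, df1_ID, nomatch = 0)  # 1 3
ID_posin2 <- match(unique_ID, df2_ID, nomatch = 0)  # 1 2

df1_Drop[ID_posin1] <- df2_Pick[ID_posin2]
df1_Drop
# [1] 10  0 20
```

The second row for ID 101 keeps its old value, and no warning is raised.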

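You are trying to update the records for matching IDs by position, which is fairly dangerous unless you are certain that every record for every ID sits in the same position in both data sets. I would not make that assumption; instead, I would assign the data by ID in df1. If you expect several Drop values per ID, the code needs to be a little more complex so that it handles mismatches between the number of records in df1 and df2.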

df1_ID <- 100:101 
df2_ID <- 101:102 
df1_Drop <- 1:2 
df2_Pick <- 4:5 
unique_ID1 <- unique(df1_ID) 

for (id in seq_along(unique_ID1)) { 
    # how many unique Pick records are there for each ID in df2? 
    nrec <- length(unique(df2_Pick[df2_ID %in% unique_ID1[id]])) 
    if (nrec < 1L) { 
     warning(nrec, " Pick values for ID:", unique_ID1[id], " in df2, ", 
      "Drop was not replaced") 
     # if there are no records in df2 for an ID from df1, retain Drop
    } else { 
     pick <- df2_Pick[df2_ID %in% unique_ID1[id]] 
     if (nrec > 1L) { 
      warning(nrec, " Pick values for ID:", unique_ID1[id], 
       " in df2, ", "Drop was replaced with first element") 
      pick <- pick[1] 
     } 
     # set the Drop value of each ID to be the corresponding Pick value 
     df1_Drop[df1_ID == unique_ID1[id]] <- pick 
    } 
} 
# Warning message: 
# 0 Pick values for ID:100 in df2, Drop was not replaced 
df1_Drop 
# [1] 1 4 

It might make more sense to merge these columns rather than collecting the records manually as above.
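
A minimal sketch of the merge approach, assuming the two vectors of each data set are held as columns of data frames. The data frames df1 and df2 and their column names are made up for illustration.

```r
df1 <- data.frame(ID = c(100, 101, 101), Drop = c(1, 2, 3))
df2 <- data.frame(ID = c(101, 102), Pick = c(4, 5))

# Left join: keep every row of df1, attach Pick where the ID exists in df2
merged <- merge(df1, df2, by = "ID", all.x = TRUE)

# Replace Drop with Pick where a match was found, otherwise retain Drop
merged$Drop <- ifelse(is.na(merged$Pick), merged$Drop, merged$Pick)

merged$Drop
# [1] 1 4 4
```

Note that merge() sorts by the key column by default, and that it would duplicate df1 rows if df2 held more than one record per ID, so it is worth checking df2 for duplicated IDs first.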

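The datasets are the same; I have manually checked the assumption that df2 contains all the data from df1, so I am still looking for a solution. It would be good if you could let me know whether there is any other way to write the loop check for distributing the data based on weights. –

@ArkadeepPaulChoudhury I have suggested alternative code that should be more robust. – CSJCampbell

@CSJCambell Could you please rewrite the function? I am still facing the problem and cannot see what is missing. I was not able to make out what you wanted to say. Cost_match <- match_cost(SingleValueDistribute = F, df1_ID = Net_Rev$`Man.ID`, df2_ID = Production_cost$Man.ID, df2_Pick = Production_cost$Man.Revenue, df1_Drop = Net_Rev$`Man.Revenue`, df1_Dist_by = Net_Rev$Revenue, dist = T, weighted = T) –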
