How can I avoid a 'for loop' in R to process CSV files faster?

Data1.csv

BusinessNeedParent,BusinessNeedChild,Identifier 
a1,b1,45 
a2,b2,60 
a3,b3,56

Data2.csv

AdvertiserName,BusinessNeedNumber,State,City 
worker,45,Calif,Los angeles 
workplace,45,Calif,San Diego 
platoon,60,Connec,Bridgeport 
teracota,56,New York,Albany

The output I want:

AdvertiserName,BusinessNeedParent,BusinessNeedChild,State,City 
worker,a1,b1,Calif,Los angeles 
workplace,a1,b1,Calif,San Diego 
platoon,a2,b2,Connec,Bridgeport 
teracota,a3,b3,New York,Albany

So it has to match Identifier against BusinessNeedNumber and write the data shown above to a CSV file. So far my code looks like this:

# Data2.csv holds the advertiser rows (with BusinessNeedNumber);
# Data1.csv holds the lookup table (with Identifier).
record <- read.csv("Data2.csv", header = TRUE)
businessneedinformation <- read.csv("Data1.csv", header = TRUE)

# Accumulator for the result; growing it with rbind on every pass is slow.
List <- data.frame()
for (i in record$BusinessNeedNumber) {
  if (i %in% businessneedinformation$Identifier) {
    keyword <- "NA"
    rows <- which(businessneedinformation$Identifier == i)
    busparent <- businessneedinformation$BusinessNeedParent[rows]
    buschild <- businessneedinformation$BusinessNeedChild[rows]
    # Strip commas before building the campaign name.
    replacementbusparent <- gsub(pattern = ",", replacement = "", x = busparent)
    replacementbuschild <- gsub(pattern = ",", replacement = "", x = buschild)
    campname <- paste("cat", "bus", "en-us", tolower(replacementbusparent[1]), tolower(replacementbuschild[1]), sep = "|")
    thislist <- data.frame(Keyword = keyword, BusinessNeedParent = busparent, BusinessNeedChild = buschild, Campaign = campname)
    # Append only when a match was found.
    List <- rbind(List, thislist)
  }
}

Because I am using a for loop, this is very slow; with nearly 100,000 entries it takes a long time. What is a faster way to do this in R using indexing?

Source

2014-02-05 user3188390

If speed is still an issue, use 'fread' instead of 'read.csv'. Or at least specify the data types with the 'colClasses' argument. –

Added another approach that uses 'Reduce' – RUser
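As a rough sketch of the reading advice in the comment above: the temporary file, its contents, and the column types below are assumptions made for illustration, and the 'fread' call only runs if the data.table package happens to be installed.

```r
# Hypothetical sample file, written to a temp path so the sketch is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "AdvertiserName,BusinessNeedNumber,State,City",
  "worker,45,Calif,Los angeles",
  "workplace,45,Calif,San Diego",
  "platoon,60,Connec,Bridgeport",
  "teracota,56,New York,Albany"
), path)

# Declaring the column types up front lets read.csv skip type guessing.
record <- read.csv(
  path,
  header = TRUE,
  colClasses = c("character", "integer", "character", "character")
)
str(record)

# Optional: fread from data.table, used only when the package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  record_dt <- data.table::fread(path)
}
```

On a four-row file the difference is not measurable; the point of 'colClasses' only shows up on files with many rows.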

> zz <- "BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56"
> Data <- read.table(text=zz, header = TRUE,sep=',')
> Data
  BusinessNeedParent BusinessNeedChild Identifier
1                 a1                b1         45
2                 a2                b2         60
3                 a3                b3         56
> zz1 <- "AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany"
> Data1 <- read.table(text=zz1, header = TRUE,sep=',')
> Data1
  AdvertiserName BusinessNeedNumber    State        City
1         worker                 45    Calif Los angeles
2      workplace                 45    Calif   San Diego
3        platoon                 60   Connec  Bridgeport
4       teracota                 56 New York      Albany
> m <- merge(Data,Data1,by.x="Identifier",by.y="BusinessNeedNumber")
> m[,c(4,2,3,5,6)]
  AdvertiserName BusinessNeedParent BusinessNeedChild    State        City
1         worker                 a1                b1    Calif Los angeles
2      workplace                 a1                b1    Calif   San Diego
3       teracota                 a3                b3 New York      Albany
4        platoon                 a2                b2   Connec  Bridgeport
> write.csv(m, file = "demoMerge.csv")

Alternatively, you can use

> # list_of_files is not defined in the original post; assumed here to be the two data frames read above
> list_of_files <- list(Data, Data1)
> m1 <- Reduce(function(old, new) { merge(old, new, by.x='Identifier', by.y='BusinessNeedNumber') }, list_of_files)
> m1
  Identifier BusinessNeedParent BusinessNeedChild AdvertiserName    State        City
1         45                 a1                b1         worker    Calif Los angeles
2         45                 a1                b1      workplace    Calif   San Diego
3         56                 a3                b3       teracota New York      Albany
4         60                 a2                b2        platoon   Connec  Bridgeport
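Since the question asks about indexing specifically, here is a minimal sketch of a lookup with 'match' that replaces the loop with a single vectorized step. It is not part of the original answer; the inline sample data and the 'idx' and 'result' names are assumptions for illustration, and the Campaign column mirrors the string the question's loop builds.

```r
# Inline copies of the two sample tables from the question.
Data <- read.csv(text = "BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56", stringsAsFactors = FALSE)

Data1 <- read.csv(text = "AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany", stringsAsFactors = FALSE)

# For each advertiser row, find the position of its key in the lookup table.
# Unmatched keys give NA, so the result keeps one row per advertiser row.
idx <- match(Data1$BusinessNeedNumber, Data$Identifier)

result <- data.frame(
  AdvertiserName     = Data1$AdvertiserName,
  BusinessNeedParent = Data$BusinessNeedParent[idx],
  BusinessNeedChild  = Data$BusinessNeedChild[idx],
  State              = Data1$State,
  City               = Data1$City,
  stringsAsFactors   = FALSE
)

# Vectorized version of the campaign name built inside the question's loop.
result$Campaign <- paste(
  "cat", "bus", "en-us",
  tolower(gsub(",", "", result$BusinessNeedParent)),
  tolower(gsub(",", "", result$BusinessNeedChild)),
  sep = "|"
)
result
```

Unlike 'merge', this keeps the rows in the order of Data1, and it assumes each Identifier appears only once in the lookup table, because 'match' returns the first hit only.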

Source

2014-02-05 08:53:11 RUser

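This merge works fine for me, but the 'm' it returns has a different length from Data1, whereas they should have the same length, and I don't understand what is happening. Also, what exactly does 'merge' do? – user3188390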

'?merge' should guide you – RUser
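The thread does not show why the lengths differed, so the following is only a sketch of two common causes. By default 'merge' keeps only the rows whose key appears in both tables, so advertiser rows without a matching Identifier are dropped; repeated Identifier values would instead produce extra rows. The small 'parents' and 'ads' tables below are made up for illustration.

```r
# Hypothetical lookup table that is missing Identifier 56.
parents <- data.frame(
  Identifier = c(45, 60),
  BusinessNeedParent = c("a1", "a2"),
  stringsAsFactors = FALSE
)
ads <- data.frame(
  AdvertiserName = c("worker", "platoon", "teracota"),
  BusinessNeedNumber = c(45, 60, 56),
  stringsAsFactors = FALSE
)

# Default merge: only keys present in both tables survive.
inner <- merge(ads, parents, by.x = "BusinessNeedNumber", by.y = "Identifier")
nrow(inner)  # 2, the row for key 56 is dropped

# all.x = TRUE keeps every row of the first table and fills the gaps with NA.
left <- merge(ads, parents, by.x = "BusinessNeedNumber", by.y = "Identifier",
              all.x = TRUE)
nrow(left)   # 3

# Two quick checks on the keys: which ones have no match, and are any repeated?
unmatched <- ads$BusinessNeedNumber[!ads$BusinessNeedNumber %in% parents$Identifier]
has_duplicates <- anyDuplicated(parents$Identifier) > 0
```

Running the last two checks on the real files should indicate which of the two cases applies.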
