How can I avoid a 'for' loop in R to process CSV files faster? My CSV files look like the following.
Data1.csv
BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56
Data2.csv
AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany
The output I want:
AdvertiserName,BusinessNeedParent,BusinessNeedChild,State,City
worker,a1,b1,Calif,Los angeles
workplace,a1,b1,Calif,San Diego
platoon,a2,b2,Connec,Bridgeport
teracota,a3,b3,New York,Albany
So it has to match Identifier against BusinessNeedNumber and generate the CSV data shown above. So far my code looks like this:
# Data2.csv holds the advertiser rows (BusinessNeedNumber); Data1.csv is the lookup table (Identifier)
record <- read.csv("Data2.csv",header=TRUE)
businessneedinformation <- read.csv("Data1.csv",header=TRUE)
List <- NULL  # accumulator; must exist before the first rbind
for(i in record$BusinessNeedNumber){
  if(i %in% businessneedinformation$Identifier){
    keyword <- "NA"
    rows <- which(businessneedinformation$Identifier==i)
    busparent <- businessneedinformation$BusinessNeedParent[rows]
    buschild <- businessneedinformation$BusinessNeedChild[rows]
    # strip commas so the names are safe inside the campaign string
    replacementbusparent <- gsub(pattern=",",replacement="",x=busparent)
    replacementbuschild <- gsub(pattern=",",replacement="",x=buschild)
    campname <- paste("cat","|","bus","|","en-us","|",(tolower(as.character(replacementbusparent[1]))),"|",(tolower(as.character(replacementbuschild[1]))),sep="")
    thislist <- data.frame(Keyword = keyword,BusinessNeedParent = busparent,BusinessNeedChild = buschild,Campaign=campname)
    List <- rbind(List, thislist)  # append only when a match was found
  }
}
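Because I am using a for loop, this is very slow; with almost 100,000 entries it takes a very long time. What is a faster way to achieve this in R using indexing?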
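One loop-free way to produce the desired output is a vectorized lookup with `match()`, or alternatively a join with `merge()`. The following is only a minimal sketch, assuming the column names from the sample files above. The variable names (`needs`, `advertisers`, `result`, `merged`) and the output path are made up for illustration, and the sample data is inlined so the snippet runs on its own.

```r
# Inline copies of the sample data (stand-ins for Data1.csv and Data2.csv)
needs <- read.csv(text = "BusinessNeedParent,BusinessNeedChild,Identifier
a1,b1,45
a2,b2,60
a3,b3,56", stringsAsFactors = FALSE)

advertisers <- read.csv(text = "AdvertiserName,BusinessNeedNumber,State,City
worker,45,Calif,Los angeles
workplace,45,Calif,San Diego
platoon,60,Connec,Bridgeport
teracota,56,New York,Albany", stringsAsFactors = FALSE)

# Option 1: index-based lookup. match() returns, for each advertiser row,
# the position of its BusinessNeedNumber in needs$Identifier (NA if absent).
idx <- match(advertisers$BusinessNeedNumber, needs$Identifier)
result <- data.frame(
  AdvertiserName     = advertisers$AdvertiserName,
  BusinessNeedParent = needs$BusinessNeedParent[idx],
  BusinessNeedChild  = needs$BusinessNeedChild[idx],
  State              = advertisers$State,
  City               = advertisers$City,
  stringsAsFactors   = FALSE
)
# Drop rows without a matching identifier, mirroring the %in% check in the loop
result <- result[!is.na(idx), ]

# Option 2: a join. merge() sorts by the key by default, so row order may differ.
merged <- merge(advertisers, needs,
                by.x = "BusinessNeedNumber", by.y = "Identifier")

# Hypothetical output location; replace with the real target file
out_file <- file.path(tempdir(), "output.csv")
write.csv(result, out_file, row.names = FALSE, quote = FALSE)
```

The `match()` variant keeps the row order of the advertiser file and does all the work in a single vectorized call, whereas growing a data frame with `rbind` inside a loop copies it on every iteration.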
If speed is still an issue, use 'fread' instead of 'read.csv'. Or at least specify the data types with the 'colClasses' argument. –
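A small sketch of what that comment suggests, assuming the column layout of Data2.csv shown in the question. The temporary file and the names `path`, `ads` and `ads_dt` are only for illustration, and the `fread` call runs only if the data.table package happens to be installed.

```r
# Write a small sample to a temporary file so the sketch is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("AdvertiserName,BusinessNeedNumber,State,City",
             "worker,45,Calif,Los angeles",
             "platoon,60,Connec,Bridgeport"), path)

# Base R: declaring the column types up front spares read.csv from guessing them
ads <- read.csv(path, header = TRUE,
                colClasses = c("character", "integer", "character", "character"))

# data.table::fread is usually much faster on large files (requires data.table)
if (requireNamespace("data.table", quietly = TRUE)) {
  ads_dt <- data.table::fread(path)
}
```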
Added another approach that uses 'Reduce' – RUser
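The answer this comment refers to is not shown here, so the following is only a guess at what a 'Reduce'-based approach might look like: folding `merge()` over a list of data frames that share a key column. The names `tables` and `joined` are hypothetical, and the lookup table's Identifier column is renamed to BusinessNeedNumber so that both tables share the key.

```r
# Sample data from the question, built inline
advertisers <- data.frame(
  AdvertiserName     = c("worker", "workplace", "platoon", "teracota"),
  BusinessNeedNumber = c(45L, 45L, 60L, 56L),
  State              = c("Calif", "Calif", "Connec", "New York"),
  City               = c("Los angeles", "San Diego", "Bridgeport", "Albany"),
  stringsAsFactors   = FALSE
)
needs <- data.frame(
  BusinessNeedParent = c("a1", "a2", "a3"),
  BusinessNeedChild  = c("b1", "b2", "b3"),
  BusinessNeedNumber = c(45L, 60L, 56L),   # called Identifier in Data1.csv
  stringsAsFactors   = FALSE
)

# Reduce applies merge() pairwise across the list; with more than two
# tables sharing the key, they would all be joined in turn.
tables <- list(advertisers, needs)
joined <- Reduce(function(x, y) merge(x, y, by = "BusinessNeedNumber"), tables)
```

With only two tables this is equivalent to a single `merge()` call; `Reduce` pays off when several lookup tables have to be joined on the same key.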