2014-08-27 49 views

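Subsetting a data frame with ddply, then applying a function with adply on the subset (R)

I'm having some trouble writing logic with plyr. My problem involves two large data frames of different lengths, as in the following example: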

dfSample <- 
structure(list(Type = structure(c(8L, 100L, 86L, 86L, 86L, 86L, 
33L, 8L, 105L, 44L, 36L, 107L, 107L, 78L, 33L, 105L, 99L, 10L, 
16L, 75L), .Label = c("Alumni Services", "Anti-Virus and Malware", 
"Application Integration", "Application Monitoring", "Application Testing", 
"Audio Visual Support", "Audio Visual Support - CLS", "Audio Visual Support - Non-CLS", 
"Backup Services", "Banner", "Bus and Law", "Business Analysis", 
"Careers", "Common Learning Spaces", "Communication and Marketing", 
"Computer Aided Assessment", "Conference Accounts", "Content Management", 
"Database Services", "Datacentre", "Desktop Monitoring", "Desktop Software", 
"Document Management", "Email", "Email Programs", "Encryption", 
"Eng and the Enviro", "Equipment Disposal", "Estates and Facilities", 
"Examination Papers", "Faculty Engagement", "Filestore Support Services", 
"Finance Services", "General Admin Services", "General InfoSec Advice", 
"Generic Accounts", "Grid Accounts (HPC)", "Health Sciences", 
"High Performance Computing (HPC)", "Hosted webspace (LAMP/IIS)", 
"HR and Payroll Services", "HR General", "HR Recruitment", "HR Systems", 
"Hub Rooms", "Humanities", "ICT Facilities", "ID Card Services", 
"Identity Management (User accounts)", "Identity Services", "Information Policy Breaches", 
"Information Risk Analysis", "iSolutions Admin Services", "iSolutions Administration", 
"IT Training and Development", "Large File Transfer", "Lecture Capture", 
"Lecture Capture - CLS", "Lecture Capture - Non-CLS", "Legacy Corporate Systems", 
"Library Services", "Licence Management", "Managed Print Service", 
"Management Servers", "Media Asset Management", "Media Support", 
"Medicine", "Meet and Greet", "Misuse and Security Incidents", 
"Misuse Of Systems", "Mobile Apps", "Mobile Devices", "Natural and Enviro Sci", 
"Network Access Services", "Network Services", "OS Builds", "Other Learning Systems", 
"Personal Filestore", "Personal web pages", "Phys and Applied", 
"Printing (Managed)", "Printing (Not MPS)", "Project Management and Resourcing", 
"Repair", "Reporting Services", "Request for Software", "Research Filestore", 
"Research Governance", "Research Management", "Research Output", 
    "Resource Filestore", "Risk Analysis and Assessment", "Security", 
"Self Service Help", "Server Monitoring", "Service Hosting", 
"ServiceLine", "Soc and Human Sci", "Software Configuration Management", 
"Software Licensing and Management", "Software Services", "SportRec", 
"Staff Accounts", "Staff Desktop Deployment", "Staff Desktop Services", 
"Staff Desktop Services (Not UoS Build)", "Student Accounts", 
"Student Admin Services", "Student Personal Workstations", "SUSSED", 
"Switchboard", "Switchboard Infrastructure", "System Access Request", 
"Telephony", "University Admin Services", "Unmanaged Printing", 
"Videoconferencing", "Videoconferencing - CLS", "Videoconferencing - Non-CLS", 
"Virtual Learning Environment (VLE)", "Visitor Accounts", "Web Statistics", 
"Windows Core Environment"), class = "factor"), Tkt.Category = structure(c(19L, 
17L, 17L, 17L, 17L, 17L, 2L, 19L, 5L, 2L, 9L, 9L, 9L, 4L, 2L, 
5L, 20L, 2L, 19L, 20L), .Label = c("Communication and Collaboration", 
"Corporate Services", "Data Centre", "Data Storage Services", 
"Desktop IT", "Faculty IT", "Help Services", "HR", "Identity Management (User accounts)", 
"Information Security", "Logistics", "Programmes and Projects", 
"Quality and Testing", "Research Services", "Security", "SLO Corporate Services", 
"Software", "Standard", "Teaching Services", "Underpinning Services", 
"Web Services"), class = "factor"), `CreateDateTime` = structure(c(1370087940, 
1370156160, 1370162340, 1370178840, 1370190000, 1370240400, 1370242920, 
1370243040, 1370243040, 1370243280, 1370243280, 1370243520, 1370243580, 
1370243880, 1370243880, 1370244000, 1370244120, 1370244240, 1370244300, 
1370244360), class = c("POSIXct", "POSIXt")), `ClosingDateTime` = structure(c(1374501300, 
1372068300, 1379062020, 1390487100, 1379062080, 1375090560, 1373984760, 
1370856420, 1370440140, 1370508240, 1370338080, 1370243820, 1370243700, 
1370255520, 1370341440, 1370248680, 1370353560, 1370338800, 1370257140, 
1374222600), class = c("POSIXct", "POSIXt"))), .Names = c("Type", 
"Tkt.Category", "CreateDateTime", "ClosingDateTime" 
), row.names = c(NA, 20L), class = "data.frame") 

and

DF2<- 
structure(list(DateTime = structure(c(1370041200, 1370052000, 
1370062800, 1370073600, 1370084400, 1370095200, 1370106000, 1370116800, 
1370127600, 1370138400, 1370149200, 1370160000, 1370170800, 1370181600, 
1370192400, 1370203200, 1370214000, 1370224800, 1370235600, 1370246400 
), class = c("POSIXct", "POSIXt"))), .Names = "DateTime", row.names = c(NA, 
20L), class = "data.frame") 

For each Tkt.Category, I want to get, for every row of DF2, the number of rows in the subset of dfSample that satisfy certain conditions, as follows:

QCalc <- function(m) { 
    adply(DF2, 1, transform, q=as.character(
           nrow(subset(m, CreateDateTime <= DateTime & 
               ClosingDateTime >= DateTime)))) 
} 

ServiceQueue <- ddply(dfSample, .(Tkt.Category), QCalc) 

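This doesn't seem to work, so I guess there is a problem with the way I've written the function, because the piece of code below works when I use all of my data (rather than grouping by Tkt.Category):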

Q <- adply(DF2, 1, transform, q=as.character(
            nrow(subset(dfSample, CreateDateTime<= DateTime & 
                 ClosingDateTime >= DateTime)))) 

When I use ddply, the error message I get is that the object 'm' cannot be found. Can someone point me in the right direction to solve this?

Answer


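If we restate your problem, I think we can see a simpler way to solve it. For each ticket category and each timestamp in the list, you want to count how many tickets in that category were created at or before the timestamp and closed at or after it. In SQL we would write something like: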

SELECT Tkt.Category, DateTime, count(*) 
FROM dfSample join DF2 on 
CreateDateTime<= DateTime 
and ClosingDateTime>= DateTime 
GROUP BY Tkt.Category, DateTime 

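But this isn't SQL, it's R, and base R doesn't let us merge on inequalities (though perhaps it should; are you pulling this data from a relational database?). So instead we can use a small trick with merge and avoid plyr altogether: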

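dfSample$id <- rownames(dfSample)   # row identifier to count on
# With no shared column names, merge() pairs every ticket with every timestamp
DFc <- merge(dfSample,DF2) 
# Keep only the pairs where the ticket was open at that timestamp
DFlimited <- DFc[DFc$CreateDateTime <= DFc$DateTime & DFc$ClosingDateTime >= DFc$DateTime,] 
# Count the open tickets per category and timestamp
DFagg <- aggregate(id ~ Tkt.Category + DateTime, data = DFlimited, length) 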
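To see what the merge-then-filter steps above produce, here is a minimal sketch on toy data. The data frames `tickets` and `stamps` and all their values are made up for illustration; they are not the poster's data.

```r
# Toy data (hypothetical values): three tickets and two timestamps.
tickets <- data.frame(
  Tkt.Category    = c("A", "A", "B"),
  CreateDateTime  = as.POSIXct(c("2013-06-01 08:00", "2013-06-01 09:00",
                                 "2013-06-01 08:30"), tz = "UTC"),
  ClosingDateTime = as.POSIXct(c("2013-06-01 12:00", "2013-06-01 10:30",
                                 "2013-06-01 09:30"), tz = "UTC"),
  stringsAsFactors = FALSE
)
stamps <- data.frame(
  DateTime = as.POSIXct(c("2013-06-01 10:00", "2013-06-01 11:00"), tz = "UTC")
)

tickets$id <- rownames(tickets)

# The two data frames share no column names, so merge() returns the
# Cartesian product: every ticket paired with every timestamp (3 x 2 rows).
crossed <- merge(tickets, stamps)

# Keep the pairs where the ticket was open at the timestamp.
open_at <- crossed[crossed$CreateDateTime <= crossed$DateTime &
                   crossed$ClosingDateTime >= crossed$DateTime, ]

# Count the surviving rows per category and timestamp.
counts <- aggregate(id ~ Tkt.Category + DateTime, data = open_at, FUN = length)
print(counts)
```

Note that category "B" does not appear in `counts` at all: combinations with no open tickets are filtered out before aggregation, so they are dropped rather than reported as zero.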

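This can be quite slow, depending on the size of your tables, because it essentially performs a cross join (every row of one table paired with every row of the other) and then filters. If you find that to be the case, take a look at the data.table package; you can see this Stack Overflow question for more information.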
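If the cross product is too large to hold in memory, one alternative is to loop over the timestamps within each category and count directly. The following is only a sketch in base R, not the approach used above: the helper `count_open` and the toy data are hypothetical names and values introduced here for illustration.

```r
# Count the tickets open at each timestamp, per category, without
# materialising the full tickets-by-timestamps cross product.
# `count_open` is a hypothetical helper, not part of plyr or base R.
count_open <- function(tickets, stamps) {
  by_cat <- split(tickets, tickets$Tkt.Category, drop = TRUE)
  pieces <- lapply(names(by_cat), function(cat_name) {
    grp <- by_cat[[cat_name]]
    n_open <- vapply(seq_along(stamps), function(i) {
      sum(grp$CreateDateTime <= stamps[i] & grp$ClosingDateTime >= stamps[i])
    }, integer(1))
    data.frame(Tkt.Category = cat_name, DateTime = stamps, q = n_open,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

# Toy data (hypothetical values): three tickets and two timestamps.
toy_tickets <- data.frame(
  Tkt.Category    = c("A", "A", "B"),
  CreateDateTime  = as.POSIXct(c("2013-06-01 08:00", "2013-06-01 09:00",
                                 "2013-06-01 08:30"), tz = "UTC"),
  ClosingDateTime = as.POSIXct(c("2013-06-01 12:00", "2013-06-01 10:30",
                                 "2013-06-01 09:30"), tz = "UTC"),
  stringsAsFactors = FALSE
)
toy_stamps <- as.POSIXct(c("2013-06-01 10:00", "2013-06-01 11:00"), tz = "UTC")

queue <- count_open(toy_tickets, toy_stamps)
print(queue)
```

With the data from the question, the call would presumably be `count_open(dfSample, DF2$DateTime)`. Unlike the merge-based version, this keeps category and timestamp combinations with a count of zero, and it only ever holds one category's tickets and one comparison vector at a time.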


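I'm having problems merging the two data frames; they are of different lengths (one has 70,816 rows, the other has 2,921 rows). I tried using all = TRUE, but it keeps freezing my computer. Is there another way to do this? – NarT 2014-08-28 14:45:44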


I'd like to use plyr because, further on, I will have to group the counts by Type and Tkt.Category. – NarT 2014-08-28 14:47:57