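Subsetting a data frame with ddply and then applying a function with adply on the subset (R)

I am having some trouble writing the logic of my code with plyr. My problem involves two large data frames of different lengths, for example: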
dfSample <-
structure(list(Type = structure(c(8L, 100L, 86L, 86L, 86L, 86L,
33L, 8L, 105L, 44L, 36L, 107L, 107L, 78L, 33L, 105L, 99L, 10L,
16L, 75L), .Label = c("Alumni Services", "Anti-Virus and Malware",
"Application Integration", "Application Monitoring", "Application Testing",
"Audio Visual Support", "Audio Visual Support - CLS", "Audio Visual Support - Non-CLS",
"Backup Services", "Banner", "Bus and Law", "Business Analysis",
"Careers", "Common Learning Spaces", "Communication and Marketing",
"Computer Aided Assessment", "Conference Accounts", "Content Management",
"Database Services", "Datacentre", "Desktop Monitoring", "Desktop Software",
"Document Management", "Email", "Email Programs", "Encryption",
"Eng and the Enviro", "Equipment Disposal", "Estates and Facilities",
"Examination Papers", "Faculty Engagement", "Filestore Support Services",
"Finance Services", "General Admin Services", "General InfoSec Advice",
"Generic Accounts", "Grid Accounts (HPC)", "Health Sciences",
"High Performance Computing (HPC)", "Hosted webspace (LAMP/IIS)",
"HR and Payroll Services", "HR General", "HR Recruitment", "HR Systems",
"Hub Rooms", "Humanities", "ICT Facilities", "ID Card Services",
"Identity Management (User accounts)", "Identity Services", "Information Policy Breaches",
"Information Risk Analysis", "iSolutions Admin Services", "iSolutions Administration",
"IT Training and Development", "Large File Transfer", "Lecture Capture",
"Lecture Capture - CLS", "Lecture Capture - Non-CLS", "Legacy Corporate Systems",
"Library Services", "Licence Management", "Managed Print Service",
"Management Servers", "Media Asset Management", "Media Support",
"Medicine", "Meet and Greet", "Misuse and Security Incidents",
"Misuse Of Systems", "Mobile Apps", "Mobile Devices", "Natural and Enviro Sci",
"Network Access Services", "Network Services", "OS Builds", "Other Learning Systems",
"Personal Filestore", "Personal web pages", "Phys and Applied",
"Printing (Managed)", "Printing (Not MPS)", "Project Management and Resourcing",
"Repair", "Reporting Services", "Request for Software", "Research Filestore",
"Research Governance", "Research Management", "Research Output",
"Resource Filestore", "Risk Analysis and Assessment", "Security",
"Self Service Help", "Server Monitoring", "Service Hosting",
"ServiceLine", "Soc and Human Sci", "Software Configuration Management",
"Software Licensing and Management", "Software Services", "SportRec",
"Staff Accounts", "Staff Desktop Deployment", "Staff Desktop Services",
"Staff Desktop Services (Not UoS Build)", "Student Accounts",
"Student Admin Services", "Student Personal Workstations", "SUSSED",
"Switchboard", "Switchboard Infrastructure", "System Access Request",
"Telephony", "University Admin Services", "Unmanaged Printing",
"Videoconferencing", "Videoconferencing - CLS", "Videoconferencing - Non-CLS",
"Virtual Learning Environment (VLE)", "Visitor Accounts", "Web Statistics",
"Windows Core Environment"), class = "factor"), Tkt.Category = structure(c(19L,
17L, 17L, 17L, 17L, 17L, 2L, 19L, 5L, 2L, 9L, 9L, 9L, 4L, 2L,
5L, 20L, 2L, 19L, 20L), .Label = c("Communication and Collaboration",
"Corporate Services", "Data Centre", "Data Storage Services",
"Desktop IT", "Faculty IT", "Help Services", "HR", "Identity Management (User accounts)",
"Information Security", "Logistics", "Programmes and Projects",
"Quality and Testing", "Research Services", "Security", "SLO Corporate Services",
"Software", "Standard", "Teaching Services", "Underpinning Services",
"Web Services"), class = "factor"), `CreateDateTime` = structure(c(1370087940,
1370156160, 1370162340, 1370178840, 1370190000, 1370240400, 1370242920,
1370243040, 1370243040, 1370243280, 1370243280, 1370243520, 1370243580,
1370243880, 1370243880, 1370244000, 1370244120, 1370244240, 1370244300,
1370244360), class = c("POSIXct", "POSIXt")), `ClosingDateTime` = structure(c(1374501300,
1372068300, 1379062020, 1390487100, 1379062080, 1375090560, 1373984760,
1370856420, 1370440140, 1370508240, 1370338080, 1370243820, 1370243700,
1370255520, 1370341440, 1370248680, 1370353560, 1370338800, 1370257140,
1374222600), class = c("POSIXct", "POSIXt"))), .Names = c("Type",
"Tkt.Category", "CreateDateTime", "ClosingDateTime"
), row.names = c(NA, 20L), class = "data.frame")
and
DF2<-
structure(list(DateTime = structure(c(1370041200, 1370052000,
1370062800, 1370073600, 1370084400, 1370095200, 1370106000, 1370116800,
1370127600, 1370138400, 1370149200, 1370160000, 1370170800, 1370181600,
1370192400, 1370203200, 1370214000, 1370224800, 1370235600, 1370246400
), class = c("POSIXct", "POSIXt"))), .Names = "DateTime", row.names = c(NA,
20L), class = "data.frame")
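For each Tkt.Category, I would like to obtain the number of rows in a subset of dfSample, where the subset is defined by conditions involving the data in DF2, as follows: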
QCalc <- function(m) {
adply(DF2, 1, transform, q=as.character(
nrow(subset(m, CreateDateTime <= DateTime &
ClosingDateTime >= DateTime))))
}
ServiceQueue <- ddply(dfSample, .(Tkt.Category), QCalc)
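This does not seem to work, so I assume there is a problem with the way I have written the function, because the piece of code below does work when I use all of my data (rather than grouping by Tkt.Category):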
Q <- adply(DF2, 1, transform, q=as.character(
nrow(subset(dfSample, CreateDateTime<= DateTime &
ClosingDateTime>= DateTime))))
When using ddply, the error message I get is that object 'm' cannot be found. Can anyone point me in the right direction for solving this?
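I am having trouble merging the two data frames because they are of different lengths (one has 70,816 rows, the other has 2,921). I tried using all = TRUE, but it keeps freezing my computer. Is there another way to do this? – NarT 2014-08-28 14:45:44
I want to use plyr because, further on, I will have to group the counts by Type and Tkt.Category. – NarT 2014-08-28 14:47:57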
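One possible way around the "object 'm' not found" error, offered as a sketch rather than a definitive fix: the error most likely comes from `transform` and `subset` using non-standard evaluation, so the expression that mentions `m` ends up being evaluated inside plyr's internals, where the argument of `QCalc` is not visible. The version below avoids both functions and refers to columns with `$` instead. The names `tickets`, `checkpoints` and `QCalc2` are hypothetical, and the toy values are made up for illustration; they only stand in for `dfSample` and `DF2`.

```r
library(plyr)

# Toy stand-ins for dfSample and DF2 (hypothetical values, not the real data)
tickets <- data.frame(
  Tkt.Category    = c("Software", "Software", "Desktop IT"),
  CreateDateTime  = as.POSIXct(c("2013-06-01 10:00", "2013-06-01 12:00",
                                 "2013-06-01 11:00"), tz = "UTC"),
  ClosingDateTime = as.POSIXct(c("2013-06-01 13:00", "2013-06-01 18:00",
                                 "2013-06-01 12:30"), tz = "UTC")
)
checkpoints <- data.frame(
  DateTime = as.POSIXct(c("2013-06-01 09:00", "2013-06-01 12:00",
                          "2013-06-01 15:00"), tz = "UTC")
)

# For each checkpoint time, count the tickets in `m` that were created at or
# before that time and closed at or after it. Columns are accessed with `$`,
# so nothing depends on the environment in which `m` would be looked up.
QCalc2 <- function(m, times = checkpoints$DateTime) {
  q <- vapply(seq_along(times), function(i) {
    sum(m$CreateDateTime <= times[i] & m$ClosingDateTime >= times[i])
  }, integer(1))
  data.frame(DateTime = times, q = q)
}

# One block of checkpoint rows per Tkt.Category
ServiceQueue <- ddply(tickets, .(Tkt.Category), QCalc2)
print(ServiceQueue)
```

With the real data, the same call would be `ddply(dfSample, .(Tkt.Category), QCalc2, times = DF2$DateTime)`, and grouping by `.(Type, Tkt.Category)` should work the same way. Note that `q` is kept as an integer here rather than converted with `as.character`, which is a deliberate change from the original function.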