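I have some perfmon (Windows performance logging) data that I want to parse. The goal is to split each variable name and break it out into separate columns of data in R.
A typical set of column names looks like this: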
> colnames(p)
[1] "Time"
[2] "\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Queue Length"
[3] "\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Read Queue Length"
[4] "\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Write Queue Length"
[5] "\\\\testdb1\\Processor(_Total)\\% Processor Time"
[6] "\\\\testdb1\\System\\Processes"
[7] "\\\\testdb1\\System\\Processor Queue Length"
This is how I read the data into R:
p <- read.csv("r-perfmon.csv", stringsAsFactors = FALSE, check.names = FALSE)
Here is some sample data:
> head(p)
Time \\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Queue Length
1 04/15/2013 00:00:19.279 0.040037563
2 04/15/2013 00:00:34.279 0.009740260
3 04/15/2013 00:00:49.275 0.011009828
4 04/15/2013 00:01:04.284 0.006016244
5 04/15/2013 00:01:19.279 0.015125328
6 04/15/2013 00:01:34.275 0.002814141
\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Read Queue Length
1 0.001421333
2 0.000000000
3 0.000206726
4 0.000000000
5 0.001894000
6 0.000000000
\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Write Queue Length
1 0.038616230
2 0.009740260
3 0.010803102
4 0.006016244
5 0.013231327
6 0.002814141
\\\\testdb1\\Processor(_Total)\\% Processor Time \\\\testdb1\\System\\Processes
1 29.569339 86
2 10.856994 86
3 7.733924 81
4 1.910202 81
5 6.164864 81
6 1.351883 81
\\\\testdb1\\System\\Processor Queue Length
1 0
2 0
3 0
4 0
5 0
6 0
I would like to be able to parse the column names and then melt the data.
So, taking one column of data as an example:
> example <- p[2]
> head(example)
\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Queue Length
1 0.040037563
2 0.009740260
3 0.011009828
4 0.006016244
5 0.015125328
6 0.002814141
I would like it to look like this:
Time, MachineName, Object, Counter, InstanceName, Value
04/15/2013 00:00:19.279, testdb1, PhysicalDisk, Avg. Disk Queue Length, 0 C:, 0.040037563
04/15/2013 00:00:34.279, testdb1, PhysicalDisk, Avg. Disk Queue Length, 0 C:, 0.009740260
04/15/2013 00:00:49.275, testdb1, PhysicalDisk, Avg. Disk Queue Length, 0 C:, 0.011009828
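To make the target layout concrete, here is a minimal sketch of how one counter path could be broken into those fields, assuming every name follows the pattern \\Machine\Object(Instance)\Counter with the instance part optional. The helper name parse_counter_path is hypothetical and not part of any package.

```r
# Hypothetical helper (not from any package): split perfmon counter paths
# of the assumed form \\Machine\Object(Instance)\Counter into their parts.
parse_counter_path <- function(path) {
  # Groups: 1 = machine, 2 = object, 4 = instance (optional), 5 = counter
  pattern <- "^\\\\\\\\([^\\\\]+)\\\\([^\\\\(]+)(\\(([^)]*)\\))?\\\\(.+)$"
  data.frame(
    MachineName  = sub(pattern, "\\1", path, perl = TRUE),
    Object       = sub(pattern, "\\2", path, perl = TRUE),
    Counter      = sub(pattern, "\\5", path, perl = TRUE),
    InstanceName = sub(pattern, "\\4", path, perl = TRUE),
    stringsAsFactors = FALSE
  )
}

paths <- c("\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Queue Length",
           "\\\\testdb1\\System\\Processes")
parsed <- parse_counter_path(paths)
print(parsed)
```

Names that do not match the assumed pattern (for example the timestamp column) are returned unchanged by sub(), so they would need to be excluded before calling the helper.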
Edit: my data, as requested.
dput of the head:
structure(list(`(PDH-CSV 4.0) (GMT Daylight Time)(-60)` = c("04/15/2013 00:00:19.279",
"04/15/2013 00:00:34.279", "04/15/2013 00:00:49.275", "04/15/2013 00:01:04.284",
"04/15/2013 00:01:19.279", "04/15/2013 00:01:34.275"), `\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Queue Length` = c(0.040037563,
0.00974026, 0.011009828, 0.006016244, 0.015125328, 0.002814141
), `\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Read Queue Length` = c(0.001421333,
0, 0.000206726, 0, 0.001894, 0), `\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Write Queue Length` = c(0.03861623,
0.00974026, 0.010803102, 0.006016244, 0.013231327, 0.002814141
), `\\\\testdb1\\Processor(_Total)\\% Processor Time` = c(29.56933862,
10.85699395, 7.733924001, 1.910202013, 6.164864178, 1.351882837
), `\\\\testdb1\\System\\Processes` = c(86L, 86L, 81L, 81L, 81L,
81L), `\\\\testdb1\\System\\Processor Queue Length` = c(0L, 0L, 0L,
0L, 0L, 0L)), .Names = c("(PDH-CSV 4.0) (GMT Daylight Time)(-60)",
"\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Queue Length", "\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Read Queue Length",
"\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Write Queue Length",
"\\\\testdb1\\Processor(_Total)\\% Processor Time", "\\\\testdb1\\System\\Processes",
"\\\\testdb1\\System\\Processor Queue Length"), row.names = c(NA,
6L), class = "data.frame")
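First reshape the data to long format using 'reshape' in R, then use 'strsplit' on the resulting column of names. If you want others to be able to reproduce your data, you also need to 'dput' your data. – user227710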
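The suggestion above could be sketched roughly as follows. This is only an illustration under some assumptions: it builds the long format with base R instead of a reshaping package, it uses a two-row, two-counter sample copied from the question, it assumes the timestamp column is named Time, and it assumes counter names themselves contain no backslashes or parentheses.

```r
# Small sample in wide format (values copied from the question).
p <- data.frame(
  Time = c("04/15/2013 00:00:19.279", "04/15/2013 00:00:34.279"),
  a = c(0.040037563, 0.009740260),
  b = c(86, 86),
  stringsAsFactors = FALSE
)
names(p)[2:3] <- c("\\\\testdb1\\PhysicalDisk(0 C:)\\Avg. Disk Queue Length",
                   "\\\\testdb1\\System\\Processes")

# Wide to long: one row per (Time, counter path) pair.
value_cols <- setdiff(names(p), "Time")
long <- data.frame(
  Time  = rep(p$Time, times = length(value_cols)),
  Path  = rep(value_cols, each = nrow(p)),
  Value = unlist(p[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Split each path on backslashes and parentheses, dropping empty pieces.
parts <- lapply(strsplit(long$Path, "\\\\|\\(|\\)"),
                function(s) s[nzchar(s)])

long$MachineName  <- vapply(parts, function(s) s[1], character(1))
long$Object       <- vapply(parts, function(s) s[2], character(1))
long$Counter      <- vapply(parts, function(s) s[length(s)], character(1))
# Four pieces means an instance is present; otherwise leave it as NA.
long$InstanceName <- vapply(
  parts,
  function(s) if (length(s) == 4) s[3] else NA_character_,
  character(1)
)

result <- long[c("Time", "MachineName", "Object",
                 "Counter", "InstanceName", "Value")]
print(result)
```

Counters without an instance, such as System\Processes, end up with NA in InstanceName here; whether NA or an empty string is preferable depends on how the result is used later.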
I got it into long format with 'p <- melt(p, id = c("time"))' but I am struggling with the rest. – Gauss
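In wide format you could change each column one at a time... but I'm not sure what the final dataset should look like. For your example, though: 's <- strsplit(colnames(example), "\\\\|\\)|\\(")[[1]]; data.frame(t(s[nzchar(s)]), example[[1]])' – user20650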