Functional programming for date-formatting air quality data

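I have two air quality instruments (dusttrak and ptrak) that record data and store it as .csv files. My goal is to automate the data cleaning process through functional programming. Each instrument records at a different time interval (30 seconds versus 1 second, respectively), and each has its own unique header.

I already have a function that reads the ptrak data. It removes the pesky header and converts the date and time columns into a single POSIXct datetime. The result is a new wide-format data frame with only two columns: datetime and particle number concentration (pnc).

Here is the ptrak function: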

## assume there is only one file per directory for now
read.ptrak<-function(fpath){
    x<-read.csv(fpath,skip=30,header=FALSE,stringsAsFactors=FALSE) #removing the first 30 rows of garbage
    colnames(x) <- c("date","time","pnc") #creating my own header
    ## merge the date and time columns and convert them to a POSIXct timestamp in one step
    x$datetime<-as.POSIXct(paste(x$date,x$time), format="%m/%d/%Y %H:%M:%S", tz="UTC")
    x<-x[,-c(1:2)] ## remove redundant variables date, and time
    x<-x[,c(2:1)] ## reorder columns so datetime is first
    return(x)
}

#okay now we can apply our function to our ptrak csv file:
ptrak_data <- read.ptrak("**INSERT FILE PATH HERE**")
head(ptrak_data)
#everything looks great!

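Where I'm running into trouble is with the DustTrak data. I'm only given a start time located inside the header, rather than a date and time column for each observation. The actual data frame only provides the total elapsed time, in 30-second intervals, from this start time. I would like to create a new data frame with a POSIX timestamp and five particle mass concentrations (see below), which I can later merge with the ptrak data on datetime. Can anyone provide a function that uses the start time and the elapsed time to create a new datetime vector, and then removes the header, so that I am left with a clean wide-format data frame?

Here is my first attempt at cleaning the DustTrak data: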

read.dtrak<-function(fpath){ 
    x<-read.csv(fpath,skip=36,header=FALSE,stringsAsFactors=FALSE) 
    colnames(x)<-c("elapsedtime","pm1","pm2.5","pm4","pm10","total","alarms","errors") 
    ## need to read in the same file again and keep the header to extract the start time and start date: 
    y<-read.csv(fpath,skip=6,header=FALSE,stringsAsFactors=FALSE) 
    colnames(y)<-c("variable","value") ## somewhat arbitrary colnames for temporary df 
    starttime <-y[1,2] 
    startdate <-y[2,2] 
    startdatetime <- strptime(paste(startdate,starttime), "%m/%d/%Y %H:%M:%S", tz="UTC") 
    #convert to posix timestamp: 
    startdatetime <-as.POSIXct(startdatetime, format=dt_format, tz="UTC") 
    ## create a new variable called datetime in dataframe 'x' 
    x$datetime <- startdatetime + x$elapsedtime ## this is giving me the following error: "Error in unclass(e1) + unclass(e2) : non-numeric argument to binary operator"
    return(x) 
}

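The end goal is to produce a clean data frame similar to the ptrak data, except that instead of reporting a single particle number concentration (PNC), it needs to contain PM1, PM2.5, PM4, PM10 and TOTAL (see dusttrak_data.csv).

Apologies in advance for not including sample data within the post. I couldn't figure out how to create sample data that includes those pesky headers!

An answer to this question would save me roughly 100+ hours of manual data cleaning, so I really appreciate your insight!

Here is the data: Ptrak, Dusttrak

Edit: I converted Dave2e's solution into a function, for those who are interested.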

read.dtrak<-function(fpath){
    ## the start date and start time each sit on their own header line
    sdate<-read.csv(fpath, header=FALSE, nrows=1, skip=7)
    stime<-read.csv(fpath, header=FALSE, nrows=1, skip=8)
    startDate<-as.POSIXct(paste(sdate$V2, stime$V2), format="%m/%d/%Y %H:%M:%S", tz="UTC")
    x<-read.csv(fpath, skip=36, stringsAsFactors=FALSE)
    names(x)<-c("elapsedtime","pm1","pm2.5","pm4","pm10","total","alarms","errors")
    x$elapsedtime<-x$elapsedtime+startDate #elapsed seconds + start time = timestamp
    x<-x[,-c(7,8)] #remove the alarms and errors variables
    names(x)[names(x)=="elapsedtime"]<-"datetime" #rename the timestamp column to datetime
    return(x)
}

read.dtrak("**INSERT FILE PATH HERE**")

Source

2017-03-03 spacedSparking

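This is pretty complicated and a lot to ask for at once. I'd suggest you edit this down to a *single question*, perhaps focused on the POSIX conversion. I think you'll have more success getting help if your question is more focused (is even mentioning the Ptrak data, much less including it, relevant?) – Gregor

I appreciate your input. I wanted to include the ptrak function as a way of illustrating my general approach to the problem, hoping it would provide a framework for others helping with the dusttrak data. Thankfully, Dave2e and I were on a similar wavelength, and he was able to provide a clever solution. I've added a complete solution to the original post, converting his solution into a function! – spacedSparking

This is a fairly simple problem, assuming every file has a constant number of lines in the header. A POSIXct object is the number of seconds since the epoch (1970-01-01 00:00:00 UTC). Since your elapsed time is in seconds, you just need to add the elapsed time to the start time.

I read the two lines containing the start time and date, paste those values together and convert them to a date-time object, then read in the rest of the data. Add the elapsed time to the start time and you are good to go.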
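As a minimal sketch of that arithmetic (the start time and elapsed values below are made up for illustration and are not taken from the actual file), adding a numeric vector of seconds to a POSIXct value gives a vector of timestamps:

```r
# Hypothetical start time and elapsed seconds, for illustration only
start <- as.POSIXct("03/03/2017 10:00:00",
                    format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
elapsed <- c(30, 60, 90)  # seconds since the start, at 30-second intervals

# POSIXct + numeric is interpreted as "add this many seconds"
stamps <- start + elapsed
format(stamps, "%Y-%m-%d %H:%M:%S")
# "2017-03-03 10:00:30" "2017-03-03 10:01:00" "2017-03-03 10:01:30"
```

Note that the elapsed column has to be numeric for this to work; a character or factor column cannot be added to a POSIXct value.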

#practice<-readLines("dusttrak_data.csv")
#get start time and date, then convert to a POSIXct object
stime<-read.csv("dusttrak_data.csv", header = FALSE, nrows=1, skip=6)
sdate<-read.csv("dusttrak_data.csv", header = FALSE, nrows=1, skip=7)
startDate<-as.POSIXct(paste(sdate$V2, stime$V2), format="%m/%d/%Y %I:%M:%S %p", tz="EST")

#read data, and add elapsed time to start time
df<-read.csv("dusttrak_data.csv", skip=36)
names(df)<-c("elapsedtime", "PM1", "PM2.5", "PM4", "PM10", "TOTAL", "Alarms", "Errors")
df$elapsedtime<-df$elapsedtime+startDate
#remove columns 7 and 8
df<-df[,-c(7:8)]

You will need to adjust the time zone in the as.POSIXct function to match the sensor's time.

Source
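For anyone who wants to try the approach without the real file, here is a rough, self-contained sketch. The miniature file it writes is entirely hypothetical: the real DustTrak header is much longer, and its field names, line positions, time format and column count are assumptions here, so the `skip` values and the format string would need adjusting for an actual export.

```r
# Build a hypothetical miniature file (NOT the real DustTrak layout)
lines <- c(
  "Instrument,DustTrak",
  "Test Start Time,10:00:00",
  "Test Start Date,03/03/2017",
  "Elapsed Time [s],PM1,PM2.5",
  "30,0.010,0.012",
  "60,0.011,0.013"
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# Read the single header lines that hold the start time and start date
stime <- read.csv(path, header = FALSE, nrows = 1, skip = 1,
                  stringsAsFactors = FALSE)
sdate <- read.csv(path, header = FALSE, nrows = 1, skip = 2,
                  stringsAsFactors = FALSE)
start <- as.POSIXct(paste(sdate$V2, stime$V2),
                    format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Read the data block; the line after the skipped header holds column names
dat <- read.csv(path, skip = 3, header = TRUE, stringsAsFactors = FALSE)
names(dat) <- c("elapsedtime", "pm1", "pm2.5")

# Elapsed seconds + start time = timestamp for each observation
dat$datetime <- start + dat$elapsedtime
dat <- dat[, c("datetime", "pm1", "pm2.5")]
print(dat)
```

If `start` comes out as `NA`, the format string or the `skip` values most likely do not match the file, so printing `sdate` and `stime` is a quick first check.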

2017-03-03 23:50:37 Dave2e

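Thanks for your reply. I followed your code exactly, and although it doesn't produce any errors, my "elapsedtime" vector is full of NAs. I've been playing around with the time zone argument, as well as the POSIX format string, but so far I haven't had any luck. Did you run into this problem at first as well? – spacedSparking

I copied and pasted this from my workspace, using the downloaded file. I would check whether startDate is correct. If the time zone is incorrect, it may generate a warning and result in NA. – Dave2e

With a few small changes, your code saved the day! Much appreciated! I needed to change the following: `df <- read.csv("**INSERT FILE PATH**", skip = 36, stringsAsFactors = FALSE)`. I also used `skip = 7` for sdate and `skip = 8` for stime. – spacedSparking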
