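R: loop through data.frames in a list and modify the characters of a column (list element)

I have several thousand *.csv files. Every file has a unique name, but the header columns are the same in all of them, for example "Timestamp", "System_Name", "CPU_ID", and so on.
My question is: how can I replace "System_Name" (a system name such as "as12535.org.at", or any other combination of characters) and anonymize it? I would appreciate any hint or a pointer in the right direction.
The structure of the CSV files is shown below.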
"Timestamp","System_Name","CPU_ID","User_CPU","User_Nice_CPU","System_CPU","Idle_CPU","Busy_CPU","Wait_IO_CPU","User_Sys_Pct"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
"1161025010002000","as06240.org.xyz:LZ","-1","1.83","0.00","0.56","97.28","2.72","0.33","3.26"
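I tried the R package anonymizer, which works fine at the vector level, but I ran into problems because I am reading thousands of csv files into R. What I tried is the following: build a list that contains every csv file as a data frame.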
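# initialize a list
r.path <- setwd("mypath")
ldf <- list()
# creates the list of all the csv files in my directory - but filter for
# files with Unix in the filename for testing.
listcsv <- dir(pattern = ".UnixM.")
# seq_along() is safe even when no file matches (1:length() would give 1:0)
for (i in seq_along(listcsv)) {
  # keep System_Name as character rather than factor
  ldf[[i]] <- read.csv(file = listcsv[i], stringsAsFactors = FALSE)
}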
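I am racking my brain because I cannot anonymize the System_Name column, or even replace some of its characters (pseudo-anonymization), by looping through the list (ldf) and the data frames it contains.
My list ldf (one data frame per CSV file) looks like this: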
summary(ldf)
Length Class Mode
[1,] 5 data.frame list
[2,] 5 data.frame list
[3,] 5 data.frame list
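How can I change or anonymize the whole "System_Name" column, or even just part of it, in all the CSV files I have read, and do this for every CSV in my directory in a loop in R? It does not need to be super elegant; I am happy when it works :-)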
Use 'lapply' to apply the function you want to the list. I don't know how anonymizer works, but assuming the function is called like 'anonymizer(column)': 'lapply(list, function(x) anonymizer(x$System_Name))'
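One caveat with the lapply call in the comment: as written it returns only the transformed vectors, not the data frames. A minimal sketch of one way around that is to assign the new column inside the function and return the whole data frame. The toy `ldf`, the `lookup` table, the `sys_0001` label format and the name `ldf_anon` are all assumptions made for illustration; the lookup is a base R stand-in and not how the anonymizer package works, so whichever call already works for you on a vector could be swapped in at the marked line.

```r
# Toy stand-in for the list built by read.csv() in the question (hypothetical data)
ldf <- list(
  data.frame(Timestamp = c("1161025010002000", "1161025010002000"),
             System_Name = c("as06240.org.xyz:LZ", "as06240.org.xyz:LZ"),
             stringsAsFactors = FALSE),
  data.frame(Timestamp = "1161025010003000",
             System_Name = "as12535.org.at:LZ",
             stringsAsFactors = FALSE)
)

# Collect every distinct system name across all data frames, so that the
# same system gets the same pseudonym in every file.
all_names <- unique(unlist(lapply(ldf, function(df) as.character(df$System_Name))))

# Named character vector: original name -> pseudonym (label format is an assumption)
lookup <- setNames(sprintf("sys_%04d", seq_along(all_names)), all_names)

# Replace the column and return the full data frame, not just the vector
ldf_anon <- lapply(ldf, function(df) {
  # swap in your own vector-level anonymizing call here if preferred
  df$System_Name <- unname(lookup[as.character(df$System_Name)])
  df
})
```

To write the results back out, something along the lines of `Map(function(df, f) write.csv(df, file.path("anon", f), row.names = FALSE), ldf_anon, listcsv)` should work, assuming an output directory named `anon` (hypothetical) already exists. Keep the `lookup` table private, since it maps pseudonyms back to the real names.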