Is there an idiomatic way to normalize a data frame in R?

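The problem is as follows: we have a csv file that stores its data in a somewhat unusual form. R is huge, and I may well be missing a short solution.

Suppose we read the file below and get a data frame from it: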

# id, file, topic, proportion, [topic, proportion]* 
0,file1.txt,0,0.01 
1,file2.txt,0,0.01,1,0.03

Is there a short way to convert it into this data frame:

id  file topic proportion 
0 file1.txt  0  0.01 
1 file2.txt  0  0.01 
1 file2.txt  1  0.03

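where we have a fixed number of columns? The number of topic-proportion pairs per record is not defined in advance and may be very large. Thanks!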

Source

2015-01-11 Andrei Beliankou

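Your question is unclear. Are you reading the data into R? Also, I believe you mean a consistent number of columns, not rows. –

Yes, I am reading a file and getting a data frame with a varying number of columns, and I want to normalize the data so that each record is split into rows with a fixed number of columns. –

Here is one way to proceed. I assume data contains the path to your file, saved as a .csv file: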

library(plyr) 

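df <- read.csv(data)   # data is the path to the .csv file
col_names <- c("id", "file", "topic", "proportion")

# Take the two id columns plus the (topic, proportion) pair starting at column u
extractDF <- function(u) setNames(df[, c(1, 2, u, u + 1)], col_names)

# One piece per pair of columns (3-4, 5-6, ...), stacked into one data frame
newDF <- ldply(seq(3, length(df) - 1, by = 2), extractDF)

# Drop the rows padded with NA for records that have fewer pairs
newDF[complete.cases(newDF), ]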

# id  file topic proportion 
#1 0 file1.txt  0  0.01 
#2 1 file2.txt  0  0.01 
#4 1 file2.txt  1  0.03

The data looks like this, saved in csv format:

# id, file, topic, proportion, [topic, proportion]* 
0,file1.txt,0,0.01 
1,file2.txt,0,0.01,1,0.03
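One caveat worth adding: read.csv decides how many columns to create from the first five lines of input, so a longer record further down the file can end up wrapped onto extra rows. A minimal sketch of a way around that, assuming the file has no header line (the temporary file and the V1, V2, ... column names are made up for illustration): count the fields per line first and hand read.csv enough column names.

```r
# Hypothetical sample file standing in for the real CSV (no header line assumed)
path <- tempfile(fileext = ".csv")
writeLines(c("0,file1.txt,0,0.01",
             "1,file2.txt,0,0.01,1,0.03"), path)

# The widest record decides how many columns to reserve
n_fields <- max(count.fields(path, sep = ","))

# Supplying col.names of that length makes read.csv allocate every column;
# fill = TRUE pads the shorter records with NA
df <- read.csv(path, header = FALSE, fill = TRUE,
               col.names = paste0("V", seq_len(n_fields)))
df
```

With the columns read this way, the seq(3, length(df) - 1, by = 2) loop above covers however many pairs the longest record has.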

Source

2015-01-11 21:08:44

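Thanks for this code. Is there a generic solution where we don't set literal row indices? Suppose different rows can have different lengths. –

I never set row indices; I refer to columns by their index! If, as in your example, you have rows of 4 or 6 elements, this solution will work. –

But is there a way to handle rows of arbitrary length? –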
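For rows of arbitrary length, one option is to skip read.csv and parse each line yourself. The following is only a sketch in base R, under a few assumptions: every record has an id, a file name and at least one complete topic-proportion pair, and no field contains a quoted comma. The helper name parse_line and the inline sample lines are invented for the example; in practice the lines would come from readLines on your file.

```r
# Hypothetical input; in practice: lines <- readLines("your_file.csv")
lines <- c("0,file1.txt,0,0.01",
           "1,file2.txt,0,0.01,1,0.03")

# Turn one raw line into a long data frame: one row per (topic, proportion) pair
parse_line <- function(line) {
  fields <- trimws(strsplit(line, ",", fixed = TRUE)[[1]])
  pairs  <- fields[-(1:2)]                  # everything after id and file
  odd    <- seq(1, length(pairs), by = 2)   # positions of the topic values
  data.frame(id         = as.integer(fields[1]),
             file       = fields[2],
             topic      = as.integer(pairs[odd]),
             proportion = as.numeric(pairs[odd + 1]),
             stringsAsFactors = FALSE)
}

# Parse every line and stack the pieces
long <- do.call(rbind, lapply(lines, parse_line))
long
```

Because each line is handled on its own, the number of pairs can differ freely from record to record and no NA padding has to be removed afterwards.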

You could try merged.stack from my "splitstackshape" package.

Assuming this is your starting data....

mydf <- read.table(
    text = "id, file, topic, proportion, topic, proportion 
0,file1.txt,0,0.01 
1,file2.txt,0,0.01,1,0.03", 
    header = TRUE, sep = ",", fill = TRUE) 
mydf 
# id  file topic proportion topic.1 proportion.1 
# 1 0 file1.txt  0  0.01  NA   NA 
# 2 1 file2.txt  0  0.01  1   0.03

You would just need to do....

library(splitstackshape) 
merged.stack(mydf, var.stubs = c("topic", "proportion"), 
      sep = "var.stubs")[, .time_1 := NULL][] 
# id  file topic proportion 
# 1: 0 file1.txt  0  0.01 
# 2: 0 file1.txt NA   NA 
# 3: 1 file2.txt  0  0.01 
# 4: 1 file2.txt  1  0.03

Wrap the whole thing in na.omit if you don't want the rows that contain NA values.

na.omit(
    merged.stack(mydf, var.stubs = c("topic", "proportion"), 
       sep = "var.stubs")[, .time_1 := NULL]) 
# id  file topic proportion 
# 1: 0 file1.txt  0  0.01 
# 2: 1 file2.txt  0  0.01 
# 3: 1 file2.txt  1  0.03
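If you would rather not add a package, base R's reshape can do a similar wide-to-long step. This is a rough sketch rather than a drop-in equivalent: it assumes the same column layout as mydf above (rebuilt here by hand so the block runs on its own) and needs the pairs of columns spelled out in varying.

```r
# Same shape as mydf above, rebuilt by hand so this block is self-contained
mydf <- data.frame(id = c(0, 1),
                   file = c("file1.txt", "file2.txt"),
                   topic = c(0, 0), proportion = c(0.01, 0.01),
                   topic.1 = c(NA, 1), proportion.1 = c(NA, 0.03),
                   stringsAsFactors = FALSE)

# Each element of varying lists the wide columns that feed one long column
long <- reshape(mydf, direction = "long",
                idvar   = c("id", "file"),
                varying = list(c("topic", "topic.1"),
                               c("proportion", "proportion.1")),
                v.names = c("topic", "proportion"))

# Sort by record, drop the helper "time" column and the NA padding
long <- long[order(long$id, long$time), c("id", "file", "topic", "proportion")]
long <- na.omit(long)
long
```

For many pairs, the two vectors in varying could be built with grep on names(mydf) instead of being typed out.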

Source

2015-01-12 04:16:25 A5C1D2H2I1M1N2O1R2T1
