Inserting missing rows in an R data frame

I have a data frame whose first column goes from 1 to 365, like this:

c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2...

and a second column of times that repeat over and over, like this:

c(0,30,130,200,230,300,330,400,430,500,0,30,130,200,230,300,330,400,430,500...

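So for every 1 in the first column I have a corresponding time in the second column. When I get to 2 the times start over, and every 2 has a corresponding time.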

Occasionally, though, one of the 3s is missing, along with its corresponding time of 300:

c(3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4... 

c(0,30,130,200,230,330,400,430,500,0,30,130,200,230,300,330,400,430,500...

This is one such case.

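How can I go through the whole data frame and add these missing values? I need R to go through the data, identify any missing values, insert a row, and fill in the appropriate value (1 to 365) in the first column and the appropriate time in the second. For the example above, R would add a row between 230 and 330 with 3 in the first column and 300 in the second. In some places several consecutive values are missing, not just one here and there.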
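A minimal sketch of the detection step, assuming the columns are named `x1` (day) and `x2` (time) as in the answers below, and using made-up data covering only days 1 to 4 with the ten times listed above. It builds every possible (day, time) pair with `expand.grid` and keeps the pairs whose pasted key does not appear in the data:

```r
# Hypothetical toy data: days 1-4 with the ten times from the question,
# with the (3, 300) row dropped to mimic the gap described above.
the.times <- c(0, 30, 130, 200, 230, 300, 330, 400, 430, 500)
full <- expand.grid(x2 = the.times, x1 = 1:4)[, c("x1", "x2")]
dat <- full[!(full$x1 == 3 & full$x2 == 300), ]

# Build one key per (day, time) pair and keep the grid rows whose key
# does not occur in the data.
key.full <- paste(full$x1, full$x2)
key.dat <- paste(dat$x1, dat$x2)
missing.rows <- full[!key.full %in% key.dat, ]
print(missing.rows)
```

Because each pair is checked independently, the same comparison also catches runs of several consecutive missing times, and it scales unchanged to 365 days.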


2014-01-10 user2113499

I don't see any missing values in your example. Can you show your data.frame? – agstudy

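@agstudy - The data jumps from 230 to 330 without passing through 300 in the second group, which means there are only nine 3s instead of the required ten. – thelatemail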

@thelatemail Ah OK, I see it now. Thanks :) – agstudy
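Counting rows per day is a quick way to spot the short days described in the comment above. This sketch uses made-up data and assumes ten readings per day. Converting `x1` to a factor with all expected levels means a day that is missing entirely shows up with a count of 0 instead of dropping out of the table:

```r
# Hypothetical toy data: ten readings per day for days 1-4, except day 3,
# which is one short (mimicking the missing 300 row).
dat <- data.frame(x1 = c(rep(1, 10), rep(2, 10), rep(3, 9), rep(4, 10)))

# Count rows per day; the explicit levels make absent days count as 0.
counts <- table(factor(dat$x1, levels = 1:4))

# Report the days with fewer than the expected ten rows.
short.days <- as.numeric(names(counts)[as.vector(counts) < 10])
print(short.days)
```

This only tells you which days are short. Finding which times are missing still needs a comparison against the full set of times.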

EDIT: tidied up the code and explicitly listed all 10 times, following the comments.

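You need to create another data.frame containing every possible row, then merge it with your data.frame. The key part is all.x = TRUE in the final merge, which makes the gaps in your data show up. I simulated the gaps by sampling only 15 of the first 20 possible day/time combinations into your.dat.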

# create vectors for the days and times 
the.days = 1:365 
the.times = c(0,30,100,130,200,230,330,400,430,500) # the 10 times to repeat; check them against your data (the question's list has 300 rather than 100)

# create a master data.frame with all the times repeated for each day, taking only the first 20 observations 
dat.all = data.frame(x1=rep(the.days, each=10), x2 = rep(the.times,times = 365))[1:20,] 

# mimic your data.frame with some gaps in it (only 15 of 20 observations are present) 
your.sample = sample(1:20, 15) 
your.dat = data.frame(x1=rep(the.days, each=10), x2 = rep(the.times,times = 365), x3 = rnorm(365*10))[your.sample,] 

# left outer join merge to include ALL of the master set and all of your matching subset, filling blanks with NA 
merge(dat.all, your.dat, all.x = TRUE)
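Here is a sketch of the same merge approach using the ten times listed in the question, including 300. It runs on made-up data that covers only four days, with the (3, 300) row removed. `x3` is a hypothetical measurement column. The result is re-sorted explicitly with `order()`, so the inserted row sits between 230 and 330:

```r
# The ten times listed in the question (note 300, which the vector above omits).
the.times <- c(0, 30, 130, 200, 230, 300, 330, 400, 430, 500)
n.days <- 4  # small example; use 365 for the full year

# Master grid: every day paired with every time, in day-then-time order.
dat.all <- data.frame(x1 = rep(1:n.days, each = length(the.times)),
                      x2 = rep(the.times, times = n.days))

# Hypothetical observed data with the (3, 300) row missing; x3 stands in
# for whatever measurement columns the real data has.
set.seed(1)
your.dat <- dat.all
your.dat$x3 <- rnorm(nrow(your.dat))
your.dat <- your.dat[!(your.dat$x1 == 3 & your.dat$x2 == 300), ]

# Left outer join on x1 and x2: every grid row is kept, and rows absent
# from your.dat get NA in x3.
filled <- merge(dat.all, your.dat, all.x = TRUE)
filled <- filled[order(filled$x1, filled$x2), ]
print(filled[filled$x1 == 3, ])
```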

Here is the output of the merge. It shows all 20 possible records, with the gaps clearly visible as NA:

   x1  x2          x3
1   1   0          NA
2   1  30  1.23128294
3   1 100  0.95806838
4   1 130  2.27075361
5   1 200  0.45347199
6   1 230 -1.61945983
7   1 330          NA
8   1 400 -0.98702883
9   1 430          NA
10  1 500  0.09342522
11  2   0  0.44340164
12  2  30  0.61114408
13  2 100  0.94592127
14  2 130  0.48916825
15  2 200  0.48850478
16  2 230          NA
17  2 330  0.52789171
18  2 400 -0.16939587
19  2 430  0.20961745
20  2 500          NA
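The time column above contains 100 but no 300. A quick way to catch this kind of mismatch is to compare the master time vector with the times actually present in the data using `setdiff`. In this sketch, `observed.times` is made up from the question's example:

```r
# The master vector used above (it has 100 but no 300) and hypothetical
# observed times taken from the question.
the.times <- c(0, 30, 100, 130, 200, 230, 330, 400, 430, 500)
observed.times <- c(0, 30, 130, 200, 230, 300, 330, 400, 430, 500)

# Times present in the data that the master grid cannot produce.
not.covered <- setdiff(unique(observed.times), the.times)

# Times in the master grid that never occur in the data.
extra <- setdiff(the.times, unique(observed.times))
print(not.covered)
print(extra)
```

Both results should be empty before you trust the merged output.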


2014-01-10 00:45:20

Note that my solution only has *9 times*. Before running it, you may need to add 100 and 300 to the vector; I missed them!

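This is really close to what I need. In the x2 column I need a 300 to go with the NA between 230 and 330. Would I have to use something from the plyr package? I'm playing with rbind.fill right now but haven't gotten it working yet. – user2113499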

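@user2113499: I don't think you need any other packages. The logic of the original solution is fine; I just missed one of the times (300), as noted in my comment. Try running it a few times and inspecting the variables to see what is happening. I've added comments to help you follow it.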
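If you would rather append rows than merge, the `rbind.fill` idea from the comments can be done in base R without plyr. Find the grid rows that are absent, give them NA in the measurement columns, `rbind` them onto the data, and re-sort. This sketch uses made-up data. The helper `key` and the column `x3` are hypothetical:

```r
# Hypothetical toy data: days 1-4, ten times each, with (3, 300) missing
# and a made-up measurement column x3.
the.times <- c(0, 30, 130, 200, 230, 300, 330, 400, 430, 500)
all.pairs <- data.frame(x1 = rep(1:4, each = 10), x2 = rep(the.times, times = 4))
dat <- all.pairs
dat$x3 <- seq_len(nrow(dat))
dat <- dat[!(dat$x1 == 3 & dat$x2 == 300), ]

# Hypothetical helper: one string key per (day, time) pair.
key <- function(d) paste(d$x1, d$x2)

# Grid rows not present in the data, with NA for the measurement column.
to.add <- all.pairs[!key(all.pairs) %in% key(dat), ]
to.add$x3 <- rep(NA, nrow(to.add))

# Append, then restore day-then-time order.
out <- rbind(dat, to.add)
out <- out[order(out$x1, out$x2), ]
rownames(out) <- NULL
```

This gives the same rows as the `merge(..., all.x = TRUE)` approach. The merge is shorter once there are several measurement columns to pad with NA.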

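Here are some NA-handling functions to get you started. For the insertion task itself, please provide your own data, using dput or a reproducible example.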

df <- data.frame(x = sample(c(1, 2, 3, 4), 100, replace = TRUE),
                 y = sample(c(0, 30, 130, 200, 230, 300, 330, 400, 430, 500), 100, replace = TRUE))

# Put NAs in the first 20 rows of column x, and turn every 0 in column y into NA
df[1:20, 1] <- NA
df$y <- ifelse(df$y == 0, NA, df$y)

# Columns x and y now have NAs in different places.

# Logical test for NA (TRUE wherever a cell is NA)
is.na(df)

# Keep the rows where one column is not NA
df[!is.na(df$x), ]
df[!is.na(df$y), ]

# Return the rows that are complete in both columns
df[complete.cases(df), ]

# Return the rows that are incomplete
df[!complete.cases(df), ]

# Return the rows without NAs (same rows as complete.cases above)
na.omit(df)
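Before deciding how to handle the NAs, it helps to count them. `colSums(is.na(df))` gives the count per column, and `complete.cases` gives the number of affected rows. This sketch rebuilds the simulated `df` above, with a seed added so it is reproducible:

```r
# Same simulated data as above, made reproducible with a seed.
set.seed(42)
df <- data.frame(x = sample(c(1, 2, 3, 4), 100, replace = TRUE),
                 y = sample(c(0, 30, 130, 200, 230, 300, 330, 400, 430, 500), 100, replace = TRUE))
df[1:20, 1] <- NA
df$y <- ifelse(df$y == 0, NA, df$y)

# Number of NAs in each column.
na.per.column <- colSums(is.na(df))

# Number of rows with at least one NA.
rows.with.na <- sum(!complete.cases(df))
print(na.per.column)
print(rows.with.na)
```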


2014-01-10 00:45:07 marbel
