How can I conditionally compare data rows row by row and write the different results to other columns?


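Please refer to the dataset below.

Starting from row 1, where den is 1, the Weight of each subsequent row is compared with the Weight of row 1, and the Volume of each subsequent row is compared with the Volume of row 1.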

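Check whether the Weight of a row is higher than the Weight of row 1; if so, the higher column of row 1 becomes 1. Otherwise, check whether the Volume of that row is lower than the Volume of row 1 by 1.0; if so, the lower column of row 1 becomes 1.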

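Until one of these conditions is met, keep comparing with the next row, and the next one: if neither condition is met at row 2, move on to row 3; if neither is met at row 3, move on to row 4, row by row, and so on.

Once one of the conditions is met (either the higher or the lower column of row 1 == 1), move on to the next row where den == 1, i.e. row 3. Then row 6.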

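The howhigh column records the difference between the Weight of row 1 and the Weight of the row that made higher == 1 for row 1. The between column records how many rows lie between the den row and the row that met the condition (e.g. in the Expected Outcome, between for row 1 is 5 because the condition is met at row 6; between for row 3 is 3 because the condition is also met at row 6, so 6 - 3 = 3).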

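The dataset should then look like the Expected Outcome.

For example, in row 14 of the Expected Outcome, higher == 1 because the Weight of row 18 is higher. howhigh is 0.0649 because the Weights of rows 14 and 18 differ by 0.0649, and between is 4 because 18 - 14 = 4.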
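The arithmetic in the row 14 example can be checked directly. Below is a minimal sketch in R, assuming only the Weight values of rows 14 to 18 from the dataset; the names `w`, `later`, `first_higher`, `howhigh` and `between` are illustrative and not part of the original data.

```r
# Weight for rows 14..18, copied from the dataset; names are the row numbers
w <- c("14" = 5.1591, "15" = 5.1585, "16" = 5.1572, "17" = 5.1565, "18" = 5.2240)

# first later row whose Weight exceeds the Weight of row 14
later <- w[-1]
first_higher <- as.integer(names(later)[later > w["14"]][1])

# difference in Weight, and distance in rows, between row 14 and that row
howhigh <- unname(w[as.character(first_higher)] - w["14"])
between <- first_higher - 14L

first_higher       # 18
round(howhigh, 4)  # 0.0649
between            # 4
```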

How can I achieve this with a vectorized approach to improve the computation speed? Thanks in advance.

Dataset

Weight Volume den higher lower between howhigh 
1 5.1626 5.1594 1  0  0  0  0 
2 5.1615 5.1559 0  0  0  0  0 
3 5.1600 5.1574 1  0  0  0  0 
4 5.1593 5.1582 0  0  0  0  0 
5 5.1592 5.1572 0  0  0  0  0 
6 5.1635 5.1580 1  0  0  0  0 
7 5.1608 5.1580 0  0  0  0  0 
8 5.1602 4.0565 0  0  0  0  0 
9 5.1582 5.1554 0  0  0  0  0 
10 5.1563 5.1547 0  0  0  0  0 
11 5.1578 5.1550 1  0  0  0  0 
12 5.1589 5.1560 0  0  0  0  0 
13 5.1578 3.1553 0  0  0  0  0 
14 5.1591 5.1554 1  0  0  0  0 
15 5.1585 5.1563 0  0  0  0  0 
16 5.1572 5.1557 0  0  0  0  0 
17 5.1565 5.1520 0  0  0  0  0 
18 5.2240 5.1518 0  0  0  0  0 
19 5.1540 5.1505 1  0  0  0  0 
20 5.1539 5.1488 0  0  0  0  0 
21 5.1520 5.1408 0  0  0  0  0 
22 5.1450 5.1420 0  0  0  0  0 
23 5.1455 5.1420 0  0  0  0  0 
24 5.1461 5.1435 0  0  0  0  0 
25 5.1470 5.1437 0  0  0  0  0 
26 5.1449 5.1378 0  0  0  0  0 
27 5.1423 5.1385 0  0  0  0  0 
28 6.1429 5.1401 0  0  0  0  0 
29 5.1425 5.1399 0  0  0  0  0 
30 5.1433 5.1403 1  0  0  0  0

Expected Outcome

Weight Volume den higher lower between howhigh 
1 5.1626 5.1594 1  1  0  5 0.0009 
2 5.1615 5.1559 0  0  0  0  0 
3 5.1600 5.1574 1  1  0  3 0.0035  
4 5.1593 5.1582 0  0  0  0  0 
5 5.1592 5.1572 0  0  0  0  0 
6 5.1635 5.1580 1  0  1  2  0 
7 5.1608 5.1580 0  0  0  0  0 
8 5.1602 4.0565 0  0  0  0  0 
9 5.1582 5.1554 0  0  0  0  0 
10 5.1563 5.1547 0  0  0  0  0 
11 5.1578 5.1550 1  0  1  2  0 
12 5.1589 5.1560 0  0  0  0  0 
13 5.1578 3.1553 0  0  0  0  0 
14 5.1591 5.1554 1  1  0  4 0.0649 
15 5.1585 5.1563 0  0  0  0  0 
16 5.1572 5.1557 0  0  0  0  0 
17 5.1565 5.1520 0  0  0  0  0 
18 5.2240 5.1518 0  0  0  0  0 
19 5.1540 5.1505 1  1  0  9 0.9889 
20 5.1539 5.1488 0  0  0  0  0 
21 5.1520 5.1408 0  0  0  0  0 
22 5.1450 5.1420 0  0  0  0  0 
23 5.1455 5.1420 0  0  0  0  0 
24 5.1461 5.1435 0  0  0  0  0 
25 5.1470 5.1437 0  0  0  0  0 
26 5.1449 5.1378 0  0  0  0  0 
27 5.1423 5.1385 0  0  0  0  0 
28 6.1429 5.1401 0  0  0  0  0 
29 5.1425 5.1399 0  0  0  0  0 
30 5.1433 5.1403 1  0  0  0  0

Source

2017-06-22 theman

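I took a stab at this. Let me know how the speed is, since it is not a 100% vectorized solution. It took me a while to understand that you only want to look at the rows below each den row, and that by a lower Volume you did not mean lower than 1.0, but lower by 1.0 or more.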
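To make the "lower by 1.0 or more" reading concrete, here is a small sketch that assumes only the Volume values of rows 6 to 9 from the dataset; `vol`, `diffs` and `hit` are illustrative names.

```r
# Volume for rows 6..9, copied from the dataset; names are the row numbers
vol <- c("6" = 5.1580, "7" = 5.1580, "8" = 4.0565, "9" = 5.1554)

# "lower" means a later Volume is at least 1.0 below the Volume of row 6,
# i.e. the difference is <= -1.0 (not merely Volume < 1.0)
diffs <- vol[-1] - vol["6"]
hit <- as.integer(names(diffs)[diffs <= -1.0][1])

hit       # 8
hit - 6L  # 2, the value of between for row 6
```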

# Your data 
dat <- structure(list(Weight = c(5.1626, 5.1615, 5.16, 5.1593, 5.1592, 5.1635, 5.1608, 5.1602, 5.1582, 5.1563, 5.1578, 5.1589, 5.1578, 5.1591, 5.1585, 5.1572, 5.1565, 5.224, 5.154, 5.1539, 5.152, 5.145, 5.1455, 5.1461, 5.147, 5.1449, 5.1423, 6.1429, 5.1425, 5.1433), Volume = c(5.1594, 5.1559, 5.1574, 5.1582, 5.1572, 5.158, 5.158, 4.0565, 5.1554, 5.1547, 5.155, 5.156, 3.1553, 5.1554, 5.1563, 5.1557, 5.152, 5.1518, 5.1505, 5.1488, 5.1408, 5.142, 5.142, 5.1435, 5.1437, 5.1378, 5.1385, 5.1401, 5.1399, 5.1403), den = c(1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), higher = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), lower = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), between = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), howhigh = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("Weight", "Volume", "den", "higher", "lower", "between", "howhigh"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30"))

I add a rownum column to the data.frame so the row position is easy to reach inside apply, then I create a new variable holding only the den == 1 rows to loop over.

dat$rownum <- 1:nrow(dat) 
newd <- dat[dat$den == 1,] 
# Weight Volume den higher lower between howhigh rownum 
#1 5.1626 5.1594 1  0  0  0  0  1 
#3 5.1600 5.1574 1  0  0  0  0  3 
#6 5.1635 5.1580 1  0  0  0  0  6 
#11 5.1578 5.1550 1  0  0  0  0  11 
#14 5.1591 5.1554 1  0  0  0  0  14 
#19 5.1540 5.1505 1  0  0  0  0  19 
#30 5.1433 5.1403 1  0  0  0  0  30

The function:

out <- t(apply(newd, 1, function(d){ 
    rownum <- d["rownum"] 
    # a: first later row whose Weight is higher than this row's Weight 
    a <- which(dat$Weight > d["Weight"]) 
    a <- a[a > rownum][1] 
    # b: first later row whose Volume is lower by 1.0 or more 
    b <- which((dat$Volume - d["Volume"]) <= -1.0) 
    b <- b[b > rownum][1] 
    # a or b is NA when no later row meets that condition; 
    # on a tie the Weight check wins, since the question checks it first 
    if(!is.na(a) && (is.na(b) || a <= b)){ 
        d["higher"] <- 1 
        d["howhigh"] <- dat$Weight[a] - d["Weight"] 
        d["between"] <- a - rownum 
    } else if(!is.na(b)){ 
        d["lower"] <- 1 
        d["between"] <- b - rownum 
    } 
    d 
})) 
out 
# Weight Volume den higher lower between howhigh rownum 
#1 5.1626 5.1594 1  1  0  5 0.0009  1 
#3 5.1600 5.1574 1  1  0  3 0.0035  3 
#6 5.1635 5.1580 1  0  1  2 0.0000  6 
#11 5.1578 5.1550 1  1  0  1 0.0011  11 
#14 5.1591 5.1554 1  1  0  4 0.0649  14 
#19 5.1540 5.1505 1  1  0  9 0.9889  19 
#30 5.1433 5.1403 1  0  0  0 0.0000  30 

dat[dat$den == 1,] <- out # replace old rows with new ones 
dat[,-8] # remove the rownum column
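
For cross-checking the result on other data, the same rule can also be written as a standalone function. The following is only a sketch under assumptions: `scan_den_rows`, its `drop` argument and `dat2` are hypothetical names introduced here, the data frame is rebuilt from the Weight, Volume and den columns of the dataset, and a den row with no later match is left at 0. It still loops over the den rows, so it is not fully vectorized either.

```r
# Rebuild the three input columns from the dataset
dat2 <- data.frame(
  Weight = c(5.1626, 5.1615, 5.16, 5.1593, 5.1592, 5.1635, 5.1608, 5.1602,
             5.1582, 5.1563, 5.1578, 5.1589, 5.1578, 5.1591, 5.1585, 5.1572,
             5.1565, 5.224, 5.154, 5.1539, 5.152, 5.145, 5.1455, 5.1461,
             5.147, 5.1449, 5.1423, 6.1429, 5.1425, 5.1433),
  Volume = c(5.1594, 5.1559, 5.1574, 5.1582, 5.1572, 5.158, 5.158, 4.0565,
             5.1554, 5.1547, 5.155, 5.156, 3.1553, 5.1554, 5.1563, 5.1557,
             5.152, 5.1518, 5.1505, 5.1488, 5.1408, 5.142, 5.142, 5.1435,
             5.1437, 5.1378, 5.1385, 5.1401, 5.1399, 5.1403),
  den = 0L
)
dat2$den[c(1, 3, 6, 11, 14, 19, 30)] <- 1L

# Hypothetical helper: for every den == 1 row, find the first later row that
# is heavier, or whose Volume is lower by `drop` or more, whichever comes first
scan_den_rows <- function(df, drop = 1.0) {
  n <- nrow(df)
  df$higher <- 0L; df$lower <- 0L; df$between <- 0L; df$howhigh <- 0
  for (i in which(df$den == 1)) {
    if (i == n) next                      # no later rows to compare with
    later <- (i + 1):n
    hi <- df$Weight[later] > df$Weight[i]
    # the Volume check only counts when the Weight check fails for that row
    lo <- !hi & (df$Volume[later] - df$Volume[i]) <= -drop
    k <- which(hi | lo)[1]                # offset of the first matching row
    if (is.na(k)) next                    # neither condition is ever met
    df$between[i] <- k
    if (hi[k]) {
      df$higher[i] <- 1L
      df$howhigh[i] <- df$Weight[i + k] - df$Weight[i]
    } else {
      df$lower[i] <- 1L
    }
  }
  df
}

res <- scan_den_rows(dat2)
res[res$den == 1, ]
```

For row 11 this sketch reports higher = 1 with between = 1, because row 12 is already heavier; that agrees with the apply-based output for row 11 rather than with the Expected Outcome in the question.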

Source

2017-06-22 16:03:48

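Thanks for the attempt. Speed-wise, it is faster than my loop. However, if 'higher' is already '1', skip computing 'lower'; if 'lower' is 1, skip computing 'higher'. There can be only one '1' across the 'higher' and 'lower' columns. In row 6 of your result, the 'higher' column should be 0, because 'lower' is 1: that condition is met before the 'higher' condition. – theman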

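That's strange... When I run my code it works correctly and #6 has only one 1 filled in (as it should), but I seem to have pasted results that somehow had both filled in... Did you try running the code, or were you just looking at my (incorrect) output online? –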

The way my code is written, it should be impossible for both higher and lower to equal 1. I must have made a typo somewhere... –
