Join data frames by measured values with an error range

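I'm looking for a way to join (or merge) two or more data frames in R that contain measured values with a specified error range. This means the values in the "by" column will be nnn.nnnn +/- 0.000n. The error tolerance is limited to 3e-6 times the value.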

Here is my best attempt so far.

newDF <- left_join(P0511_480k, P0511_SF00V, by = c(P0511_480k$m.z == (P0511_SF00V$m.z - 0.000003(P0511_480k$m.z)):(P0511_SF00V$m.z + 0.000003(P0511_480k$m.z))))

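In this expression, I have two data frames (P0511_480k and P0511_SF00V) that I want to merge by the column named "m.z". The acceptable range of values is plus or minus "m.z" times 0.000003. For example, P0511_480k_subset$m.z = 187.06162 should match P0511_SF00V_subset$m.z = 187.06155.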
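The matching rule in that example can be checked with a small vectorised predicate. This is a minimal sketch: `within_tol()` is a hypothetical helper name, and it assumes, as in the attempt above, that the tolerance is taken relative to the P0511_480k value (and that m/z values are positive).

```r
# Hypothetical helper: TRUE when b lies within +/- rel_tol * a of a.
# Assumes a is positive and the tolerance is relative to a.
within_tol <- function(a, b, rel_tol = 3e-6) {
  abs(a - b) <= rel_tol * a
}

# 187.06162 * 3e-6 is about 0.00056; the two values differ by only 0.00007
within_tol(187.06162, 187.06155)
# [1] TRUE
```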

> dput(head(P0511_480k_subset, 10)) 
structure(list(m.z = c(187.06162, 203.05652, 215.05668, 217.07224, 
279.05499), Intensity = c(319420.8, 288068.9, 229953, 210107.8, 
180054), Relative = c(100, 90.18, 71.99, 65.78, 56.37), Resolution = c(394956.59, 
415308.31, 387924.91, 437318.31, 410670.91), Baseline = c(2.1, 
1.43, 1.69, 1.73, 3.04), Noise = c(28.03, 27.17, 27.52, 27.58, 
29.37)), .Names = c("m.z", "Intensity", "Relative", "Resolution", 
"Baseline", "Noise"), class = c("tbl_df", "data.frame"), row.names = c(NA, 
-5L))

and

> dput(head(P0511_SF00V_subset, 10)) 
structure(list(m.z = c(187.06155, 203.05641, 215.05654, 217.0721 
), Intensity = c(1021342.8, 801347.1, 662928.1, 523234.2), Relative = c(100, 
78.46, 64.91, 51.23), Resolution = c(314271.88, 298427.41, 289803.97, 
288163.63), Baseline = c(6.89, 10.47, 9.13, 8.89), Noise = c(40.94, 
45.98, 44.3, 44.01)), .Names = c("m.z", "Intensity", "Relative", 
"Resolution", "Baseline", "Noise"), class = c("tbl_df", "data.frame" 
), row.names = c(NA, -4L))

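I'd appreciate any suggestions! I've searched the help documentation as extensively as I can, but I haven't been able to find an example close to what I need.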

Thank you very much!


2016-11-22 Lynn Mazzoleni

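Please provide your data (or a subset of it) using 'dput()' or 'dput(head(df, 20))'. Also, when you multiply you need to specify '*' (even when the number comes before the parentheses). – etienne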

Take a look at the [*fuzzyjoin* package](https://github.com/dgrtwo/fuzzyjoin), which provides variants of dplyr's join operations. – aosmith

I think you need something like 'data.table::foverlaps()'. Please provide data and the expected output. – zx8754
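The interval idea behind foverlaps() can also be sketched in base R, without data.table: give every value in the first data frame the window [m.z × (1 − 3e-6), m.z × (1 + 3e-6)] and look up which values of the second column fall inside it. This is not foverlaps() itself; `interval_match()` is a hypothetical helper that uses base R's findInterval() on the sorted second column, assumes there are no missing m.z values, and returns row-index pairs rather than a merged data frame.

```r
# Hypothetical helper: for each value in x_mz, find every y_mz value inside
# the window [x * (1 - rel_tol), x * (1 + rel_tol)], i.e. an interval join.
# Returns row-index pairs (i into x_mz, j into y_mz). Assumes no NA values.
interval_match <- function(x_mz, y_mz, rel_tol = 3e-6) {
  ord <- order(y_mz)
  ys <- y_mz[ord]                                      # findInterval() needs sorted breakpoints
  lo <- x_mz * (1 - rel_tol)
  hi <- x_mz * (1 + rel_tol)
  first <- findInterval(lo, ys, left.open = TRUE) + 1  # first sorted y >= lo
  last <- findInterval(hi, ys)                         # last sorted y <= hi
  n <- pmax(last - first + 1, 0)                       # number of matches per x value
  i <- rep(seq_along(x_mz), n)
  j <- ord[unlist(Map(function(a, b) if (b >= a) a:b else integer(0), first, last))]
  data.frame(i = i, j = j)
}

# Usage with the m.z values of the two subsets in the question
mz_480k <- c(187.06162, 203.05652, 215.05668, 217.07224, 279.05499)
mz_SF00V <- c(187.06155, 203.05641, 215.05654, 217.0721)
m <- interval_match(mz_480k, mz_SF00V)
data.frame(mz_480k = mz_480k[m$i], mz_SF00V = mz_SF00V[m$j])
```

With these values the first four m.z values pair up one-to-one and 279.05499 has no partner. The index pairs can then be used to pull the matching rows out of each data frame.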

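If you don't need the unmatched rows, this could work. Say the two data sets are df1 and df2. Go through the m.z column of df1 and, if a value lies within the tolerance (0.000003 times the value) of any value in df2's m.z column, replace that value in df1 with the matching value from df2. Then merge the two data frames.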

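df1$m.z <- sapply(df1$m.z, function(x) {
  # Distance from this df1 value to every m.z value in df2
  d <- abs(df2$m.z - x)
  # If the closest df2 value lies within the tolerance (0.000003 * x),
  # replace x with that df2 value so that an exact merge will pair them;
  # otherwise use 0, which should not match any real m.z value.
  # df2$m.z[...] returns a plain number (df2[..., "m.z"] on a tibble
  # returns a 1x1 tibble, which makes sapply() return a list).
  if (min(d, na.rm = TRUE) < 0.000003 * x) df2$m.z[which.min(d)] else 0
})
# Join on m.z only: the other columns have the same names in both data frames,
# so a plain merge(df1, df2) would try to match on all of them
df3 <- merge(df1, df2, by = "m.z", suffixes = c(".df1", ".df2"))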


2016-11-23 01:16:52

Yes, but sapply converts the m.z values into a list. –

So I added "df1$m.z <- as.numeric(df1$m.z)". It seems to work, but given that I have 5 data frames to merge, the process is clunky. –

I see. I agree it gets a bit messy, but I would use a for loop or lapply over the 5 data frames. –
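Following the loop suggestion, here is a minimal base-R sketch that joins any number of data frames on m.z with the same relative tolerance. Assumptions: `tol_inner_join()` and `join_all_by_mz()` are hypothetical helper names; every match is checked against the first data frame's m.z value; unmatched rows are dropped (an inner join); and the pairwise comparison of all rows is only meant for modest table sizes.

```r
# Hypothetical helper: inner join of y onto x, pairing rows whose keys satisfy
# |x[[x_key]] - y[[y_key]]| <= rel_tol * x[[x_key]] (tolerance relative to x).
# It compares every row pair, so it is meant for modest table sizes.
tol_inner_join <- function(x, y, x_key, y_key, rel_tol = 3e-6) {
  pairs <- expand.grid(i = seq_len(nrow(x)), j = seq_len(nrow(y)))
  ref <- x[[x_key]][pairs$i]
  keep <- abs(ref - y[[y_key]][pairs$j]) <= rel_tol * ref
  pairs <- pairs[which(keep), , drop = FALSE]  # which() drops NA comparisons
  out <- cbind(x[pairs$i, , drop = FALSE], y[pairs$j, , drop = FALSE])
  rownames(out) <- NULL
  out
}

# Hypothetical helper: join a *named* list of data frames left to right,
# always matching against the first data frame's m.z column.
join_all_by_mz <- function(frames, rel_tol = 3e-6) {
  # Suffix every column with its list name so columns never collide
  # (m.z -> m.z_k480, Intensity -> Intensity_k480, ...)
  tagged <- Map(function(df, tag) setNames(df, paste0(names(df), "_", tag)),
                frames, names(frames))
  keys <- paste0("m.z_", names(frames))
  result <- tagged[[1]]
  for (k in seq_along(tagged)[-1]) {
    result <- tol_inner_join(result, tagged[[k]], keys[1], keys[k], rel_tol)
  }
  result
}

# Usage with the m.z and Intensity columns of the two subsets in the question
k480 <- data.frame(m.z = c(187.06162, 203.05652, 215.05668, 217.07224, 279.05499),
                   Intensity = c(319420.8, 288068.9, 229953, 210107.8, 180054))
sf00v <- data.frame(m.z = c(187.06155, 203.05641, 215.05654, 217.0721),
                    Intensity = c(1021342.8, 801347.1, 662928.1, 523234.2))
merged <- join_all_by_mz(list(k480 = k480, sf00v = sf00v))
```

With these two subsets, `merged` has four rows (279.05499 has no partner in P0511_SF00V_subset). For five data frames, pass a named list of all five; the loop joins each one in turn against the first data frame's m.z values.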
