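Normalizing within levels of a data frame

I'm still new to R, so apologies in advance if this question seems obvious to you. I am currently working on a drug screening protocol, and I created a .csv table in Excel containing the output of my analysis. I import it into R as a data frame named raw.data, with the following structure: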

| Sample  | Group | Parameter Drug 1 | Parameter Drug 2 | Time Parameter Drug 1 (ms) | 
|---------------|-------|------------------|------------------|----------------------------| 
| Heart_Sample1 | Heart | 2.4    | 9.0    | 1.5      | 
| Heart_Sample1 | Heart | 2.29    | 22.2    | 3.4      | 
| Heart_Sample1 | Heart | 3.4    | 3.5    | 4.5      | 
| Heart_Sample1 | Heart | 5.2    | 8.4    | 6.5      | 
| Heart_Sample1 | Heart | 2.3    | 34.1    | 7.8      | 
| ...   | Organ | value   | value   | time      | 
| Heart_Sample2 | Heart | 10.4    | 10.2    | 1.5      | 
| Heart_Sample2 | Heart | 8.4    | 2.45    | 3.6      | 
| ...   | Organ | value   | value   | time      | 
| Liver_Sample1 | Liver | 13.4    | 44.5    | 2.8      | 
| ...   | Organ | 2.3    | value   | time      |

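Parameter is the value of a parameter I measured experimentally (for example, neuronal spikes). Time Parameter is the recording time at which the peak occurred.

I transformed raw.data into mod.data with gather, using the following call: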

mod.data <- gather(raw.data, `Parameter Drug 1`, `Parameter Drug 2`, `Parameter Drug 3`, key = "Drug", value = "value") 




| Sample  | Group | Time Parameter Drug 1 (ms) | Drug   | value | 
|---------------|-------|----------------------------|-----------------|-------| 
| Heart_Sample1 | Heart |       | Baseline  |  | 
| Heart_Sample1 | Heart |       | Baseline  |  | 
| Heart_Sample1 | Heart |       | Concentration 1 |  | 
| Heart_Sample1 | Heart |       | Concentration 1 |  | 
| Heart_Sample1 | Heart |       | Concentration 2 |  |

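I then generate plots, faceted by sample, to get a clear overview of what happens to the parameter over time in all the samples. The result is a huge array of about 200 plots.

Since different organs have different values, and even within the same organ I can find very different values, the scales have to be matched within each sample to see clearly what is going on in that sample.

I then tried to normalize with the following function: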

normalize <- function(x){ 
    (x - min(x))/(max(x)-min(x)) 
    }

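where x is the parameter I'm interested in. Unfortunately, it takes as min and max the min and max of the entire parameter column, regardless of Sample and Group. I also tried subsetting, but that means creating a subset for each sample and then merging them all into one plot. I also tried group_by(Sample, Group), as described in RStudio's cheatsheet, but I couldn't apply the normalize function to the resulting data frame.

TL;DR my question is: how can I normalize my values, from 0 to 1, within each sample?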
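To make the problem described above concrete, here is a minimal sketch with made-up numbers. The `toy` data frame and its values are assumptions for illustration only, not the real screening data. It contrasts calling `normalize` on the whole column with calling it once per sample through base R's `ave()`.

```r
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Hypothetical values for two samples on very different scales
toy <- data.frame(
  Sample = c("A", "A", "A", "B", "B", "B"),
  value  = c(1, 2, 3, 10, 20, 30)
)

# Applied to the whole column: a single min and max shared by all samples
toy$global <- normalize(toy$value)

# Applied per sample: ave() calls normalize() separately on each Sample
toy$per_sample <- ave(toy$value, toy$Sample, FUN = normalize)

print(toy)
```

In the `global` column only one row is 0 and one row is 1, while `per_sample` reaches 0 and 1 once within each sample, which is the behaviour asked for.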

Thanks in advance for your answers.

Regards

Source

2017-03-28 Mollan

Here is another approach using dplyr and your normalize function. I applied it to some toy data that I created.

library(dplyr) 

set.seed(123) 

df <- data.frame(Sample = sample(c("Sample1", "Sample2"), 20, replace = TRUE), 
                 Group = sample(c("Heart", "Liver"), 20, replace = TRUE), 
                 Time = sample(100:500, 20), 
                 Value = sample(1000:5000, 20)) 

normalize <- function(x){ 
    (x - min(x))/(max(x)-min(x)) 
} 

df %>% 
    group_by(Sample, Group) %>% 
    mutate(Time_std = normalize(Time), 
     Value_std = normalize(Value)) %>% 
    arrange(Sample, Group, Time_std) 

    # Sample Group Time Value Time_std Value_std 
    # Sample1 Heart 317 2895 0.00000000 0.47500000 
    # Sample1 Heart 389 3441 0.57600000 1.00000000 
    # Sample1 Heart 436 2755 0.95200000 0.34038462 
    # Sample1 Heart 442 2401 1.00000000 0.00000000 
    # Sample1 Liver 149 2513 0.00000000 0.00000000 
    # Sample1 Liver 154 2792 0.01428571 0.24303136 
    # Sample1 Liver 157 3661 0.02285714 1.00000000 
    # Sample1 Liver 272 3510 0.35142857 0.86846690 
    # Sample1 Liver 499 2535 1.00000000 0.01916376 
    # Sample2 Heart 179 1877 0.00000000 0.15939905 
    # Sample2 Heart 204 4171 0.39062500 1.00000000 
    # Sample2 Heart 243 1442 1.00000000 0.00000000 
    # Sample2 Liver 117 4011 0.00000000 0.92470805 
    # Sample2 Liver 147 1002 0.10238908 0.00000000 
    # Sample2 Liver 160 4256 0.14675768 1.00000000 
    # Sample2 Liver 192 4236 0.25597270 0.99385372 
    # Sample2 Liver 246 2096 0.44027304 0.33620160 
    # Sample2 Liver 265 1379 0.50511945 0.11585741 
    # Sample2 Liver 283 4244 0.56655290 0.99631223 
    # Sample2 Liver 410 3832 1.00000000 0.86969883

Source

2017-03-28 20:47:05 Craig

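Thanks for your answer. If I run it, I get values of 1 for Sample1 and 0 for Sample2, which unfortunately is not correct. https://dl.dropboxusercontent.com/u/36889/Screen%20Shot%202017-03-28%20at%2022.52.31.png – Mollan

I edited my answer to make it reproducible with `set.seed` and to make the output easier to read with `arrange`. The values corresponding to the min/max in each group are always 0 and 1, so I'm afraid I don't quite understand what you mean. – Craig

Thanks, and sorry, as a newbie I'm having a few difficulties. What I find incorrect in your example is that everything falls under the same 0-1 normalization (i.e. there is one 0 and one 1 for the entire column). I would like to have four 0s and four 1s (with values in between), one pair for each sample. – Mollan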

Using data.table, you can go about this as follows.

Toy example:

library(data.table) 
normalize <- function(x){ 
    (x - min(x))/(max(x)-min(x)) 
} 

df <- data.table(group = c(1, 1, 1, 1, 2, 2, 2), measure = c(10, 20, 0, 2, 1, 1, 10)) 
df[, measure_normalized := normalize(measure), by = group]

Source

2017-03-28 20:38:10 67342343

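Thanks for your answer. However, I do have a problem with this approach: the normalization also includes the other drug concentrations measured in the same group, so the baseline ends up close to 0 and the highest concentration close to 1, whereas what I'm looking for is normalization within each dose of each sample of each group. – Mollan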
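One way to read the last comment is that the grouping needs a third key, the drug or dose column, in addition to Sample and Group. Below is a minimal base R sketch of that idea. The `toy` data and the helper `normalize_safe` are hypothetical, the column names `Sample`, `Group`, `Drug` and `value` are assumed to match mod.data, and the data are assumed to contain no missing values. Returning 0 for a group whose values are all equal is a choice made here to avoid the 0/0 division; it is not something the thread specifies.

```r
# Like normalize(), but returns 0 for a constant group instead of NaN (0/0).
# Assumes x contains no NA values.
normalize_safe <- function(x) {
  span <- max(x) - min(x)
  if (span == 0) {
    return(rep(0, length(x)))
  }
  (x - min(x)) / span
}

# Hypothetical long-format data shaped like mod.data
toy <- data.frame(
  Sample = rep(c("Heart_Sample1", "Heart_Sample2"), each = 6),
  Group  = "Heart",
  Drug   = rep(rep(c("Baseline", "Concentration 1"), each = 3), times = 2),
  value  = c(2, 4, 6, 20, 30, 40, 1, 1, 1, 5, 10, 15)
)

# One grouping key per Sample x Group x Drug combination that actually occurs;
# drop = TRUE discards combinations with no rows
key <- interaction(toy$Sample, toy$Group, toy$Drug, drop = TRUE)

# Normalize separately within each combination
toy$value_norm <- ave(toy$value, key, FUN = normalize_safe)

print(toy)
```

The same grouping carries over to the two answers above by adding the drug column to the grouping: `group_by(Sample, Group, Drug)` in dplyr, or `by = .(Sample, Group, Drug)` in data.table.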
