2014-01-28 264 views
0

I have two data tables, each listing pairs of diseases together with a value for each pair, and I want to plot them in R. Below is disease_table1:

**d1** **d2** **Value** 

Disease1 Disease2 3.5 
Disease3 Disease4 5 
Disease5 Disease6 1.1 
Disease1 Disease3 2.4 
Disease6 Disease2 6.7 

The table above is sample data. The real dataset 1 (disease_table1) looks like this:

Bladder cancer       X-linked ichthyosis (XLI)  3.5 
Leukocyte adhesion deficiency (LAD) Aldosterone synthase Deficiency 1.8 
Leukocyte adhesion deficiency (LAD) Brain Cancer      1.5 
Tangier disease      Pancreatic cancer    0.66 

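I want to show the difference between the two data tables by plotting the disease pairs together with their values from both tables. I used the plot and lines functions, but the result is too simple and does not separate the two tables well. I would also like the disease pair names to appear on the plot.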

plot(density(disease_table1$value))
lines(density(disease_table1$value))

Thanks

+3

Could you give us a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Jaap

+0

I have added the real dataset and the code as an example. – Rgeek

+0

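400,000+ disease pairs will probably need some kind of clustering approach. Could you post a link to your data, or to a more representative subset, say a few thousand records? – jlhoward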

Answers

2

Some example code:

# creating dataframes (i made up a second one) 
df1 <- read.table(text = "d1 d2 x 
Disease1 Disease2 3.5 
Disease3 Disease4 5 
Disease5 Disease6 1.1 
Disease1 Disease3 2.4 
Disease6 Disease2 6.7", header = TRUE, strip.white = TRUE) 

df2 <- read.table(text = "d1 d2 y 
Disease1 Disease2 4.5 
Disease3 Disease4 2 
Disease5 Disease6 3.1 
Disease1 Disease3 1.4 
Disease6 Disease2 5.7", header = TRUE, strip.white = TRUE) 

# needed libraries 
library(reshape2) 
library(ggplot2) 

# merging dataframes & creating unique identifier variable 
data <- merge(df1, df2, by = c("d1","d2")) 
data$diseasepair <- paste0(data$d1,"-",data$d2) 

# reshape to long format: one row per disease pair and group (x or y)
data.long <- melt(data, id.vars = "diseasepair", measure.vars = c("x","y"), variable.name = "group")

# make the plot 
ggplot(data.long) + 
    geom_bar(aes(x = diseasepair, y = value, fill = group), 
      stat="identity", position = "dodge", width = 0.7) + 
    scale_fill_manual("Group\n", values = c("red","blue"), 
        labels = c(" X", " Y")) + 
    labs(x="\nDisease pair",y="Value\n") + 
    theme_bw() 

Result:

[Image: grouped bar chart of the value per disease pair, with side-by-side bars for X (red) and Y (blue)]

Is this what you are looking for?
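One caveat about the merge step: `merge(df1, df2, by = c("d1","d2"))` only matches a pair when both tables list the two diseases in the same order. If order does not matter in your data (that is an assumption on my part, not something stated in the question), a rough sketch is to build an order-independent key first. The helper name `pair_key` and the two small data frames below are made up for illustration:

```r
# Hypothetical helper: build a key that is the same for (A, B) and (B, A)
pair_key <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  # put the alphabetically smaller name first
  ifelse(a <= b, paste(a, b, sep = "-"), paste(b, a, sep = "-"))
}

# Made-up example: the first pair is listed in opposite order in the two tables
df1 <- data.frame(d1 = c("Disease1", "Disease6"),
                  d2 = c("Disease2", "Disease2"),
                  x  = c(3.5, 6.7))
df2 <- data.frame(d1 = c("Disease2", "Disease6"),
                  d2 = c("Disease1", "Disease2"),
                  y  = c(4.5, 5.7))

df1$diseasepair <- pair_key(df1$d1, df1$d2)
df2$diseasepair <- pair_key(df2$d1, df2$d2)

# Merge on the normalised key instead of on d1 and d2
merged <- merge(df1[, c("diseasepair", "x")],
                df2[, c("diseasepair", "y")],
                by = "diseasepair")
print(merged)
```

If the order of the two diseases is meaningful in your tables, skip this and keep the original merge.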

+0

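I have 400,000 pairs of this kind, so I don't think this will work. It would work well for smaller datasets, though. I believe a curve or a heatmap could work? – Rgeek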

+0

A heatmap won't work for 400k pairs, IMHO. Do you want to compare the values for every pair, or only for specific pairs? – Jaap

+0

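Basically, I want to show the enrichment of disease pairs using the values in one dataset versus the values in the other. So I want to compare the values for every pair. – Rgeek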
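For a comparison of every pair at that scale, one possible direction (only a sketch, not tested on the real data) is to stop drawing one bar per pair and instead plot the two values against each other in a binned scatter, plus the distribution of per-pair log ratios. The data below are simulated stand-ins, the column names `x`, `y` and `log2ratio` are assumptions, and the log scale assumes all values are positive:

```r
library(ggplot2)

# Simulated stand-in for the merged table (one row per disease pair).
# With real data this would come from merging the two tables on the pair.
set.seed(1)
n <- 10000
merged <- data.frame(diseasepair = paste0("pair", seq_len(n)),
                     x = rexp(n, rate = 0.5))
merged$y <- merged$x * exp(rnorm(n, sd = 0.4))

# Per-pair log2 ratio: above 0 means the pair scores higher in table 2
merged$log2ratio <- log2(merged$y / merged$x)

# Binned scatter: each tile is coloured by the number of pairs inside it,
# so overplotting is not a problem; the red line marks x == y
p_scatter <- ggplot(merged, aes(x = x, y = y)) +
  geom_bin2d(bins = 60) +
  geom_abline(slope = 1, intercept = 0, colour = "red") +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Value in table 1", y = "Value in table 2") +
  theme_bw()

# Distribution of the log ratios across all pairs
p_ratio <- ggplot(merged, aes(x = log2ratio)) +
  geom_density() +
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(x = "log2(y / x)", y = "Density") +
  theme_bw()

# The pairs that differ most between the tables, by name
top_pairs <- head(merged[order(-abs(merged$log2ratio)), ], 10)

print(p_scatter)
print(p_ratio)
print(top_pairs)
```

The names of the most extreme pairs in `top_pairs` could then go into a small bar chart like the one in the answer, which keeps the labels readable.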