How to get a contingency table?

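I am trying to create a contingency table from a particular type of data. This could be done with loops and the like, but because my final table will contain more than 10E5 cells, I am looking for a pre-existing function.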

My initial data look like this:

PLANT                   ANIMAL                           INTERACTIONS
----------------------  -------------------------------  ------------
Tragopogon_pratensis    Propylea_quatuordecimpunctata               1
Anthriscus_sylvestris   Rhagonycha_nigriventris                     3
Anthriscus_sylvestris   Sarcophaga_carnaria                         2
Heracleum_sphondylium   Sarcophaga_carnaria                         1
Anthriscus_sylvestris   Sarcophaga_variegata                        4
Anthriscus_sylvestris   Sphaerophoria_interrupta_Gruppe             3
Cerastium_holosteoides  Sphaerophoria_interrupta_Gruppe             1

I would like to create a table like this:

                        Propylea_quatuordecimpunctata  Rhagonycha_nigriventris  Sarcophaga_carnaria  Sarcophaga_variegata  Sphaerophoria_interrupta_Gruppe
----------------------  -----------------------------  -----------------------  -------------------  --------------------  -------------------------------
Tragopogon_pratensis                                1                        0                    0                     0                                0
Anthriscus_sylvestris                               0                        3                    2                     4                                3
Heracleum_sphondylium                               0                        0                    1                     0                                0
Cerastium_holosteoides                              0                        0                    0                     0                                1

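That is, all plant species in the rows and all animal species in the columns, with a 0 wherever there is no interaction (even though my initial data list only the interactions that occurred).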


2011-09-16 Julien

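A contingency table with 10E5 cells! What analysis are you doing? If you are testing the interactions with a chi-square test, you need an expected count of at least 5 in each cell. – Ramnath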
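The rule of thumb in this comment is about expected counts. A minimal base-R sketch of how one might check it, assuming the question's data are typed into a data frame called interactions (a hypothetical name): xtabs() builds the table, and the expected component returned by chisq.test() holds the expected counts under independence.

```r
# The question's data; `interactions` is a hypothetical name
interactions <- data.frame(
  PLANT = c("Tragopogon_pratensis", "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Heracleum_sphondylium", "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Cerastium_holosteoides"),
  ANIMAL = c("Propylea_quatuordecimpunctata", "Rhagonycha_nigriventris",
             "Sarcophaga_carnaria", "Sarcophaga_carnaria", "Sarcophaga_variegata",
             "Sphaerophoria_interrupta_Gruppe", "Sphaerophoria_interrupta_Gruppe"),
  INTERACTIONS = c(1, 3, 2, 1, 4, 3, 1),
  stringsAsFactors = FALSE
)

# Sum INTERACTIONS over every PLANT x ANIMAL pair; unobserved pairs become 0
tab <- xtabs(INTERACTIONS ~ PLANT + ANIMAL, data = interactions)

# Expected counts under independence. chisq.test() warns when they are small,
# so the warning is silenced here only to inspect the values.
expected <- suppressWarnings(chisq.test(tab))$expected

# Proportion of cells whose expected count is below 5
mean(expected < 5)
```

With only 15 interactions spread over 20 cells, every expected count in this toy example falls below 5.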

In base R, you can use table or xtabs:

with(warpbreaks, table(wool, tension))

    tension
wool L M H
   A 9 9 9
   B 9 9 9

xtabs(~wool+tension, data=warpbreaks)

    tension
wool L M H
   A 9 9 9
   B 9 9 9
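Both calls above count rows. For data like the question's, where each row already carries a count in INTERACTIONS, the distinction matters. A small sketch with made-up plant and animal codes, contrasting table() with xtabs() given a left-hand side:

```r
# Toy long-format data: one row per observed PLANT/ANIMAL pair (made-up codes)
dat <- data.frame(
  PLANT = c("p1", "p2", "p2", "p3"),
  ANIMAL = c("a1", "a1", "a2", "a2"),
  INTERACTIONS = c(1, 3, 2, 5),
  stringsAsFactors = FALSE
)

# table() counts how many rows fall into each cell and ignores INTERACTIONS
counts <- table(dat$PLANT, dat$ANIMAL)

# xtabs() with a left-hand side sums INTERACTIONS within each cell instead
sums <- xtabs(INTERACTIONS ~ PLANT + ANIMAL, data = dat)

counts
sums
```

For the question, the xtabs() form with INTERACTIONS on the left is the one that reproduces the desired table.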

The gmodels package has a function, CrossTable, that produces output similar to what users of SPSS or SAS would expect:

library(gmodels)
with(warpbreaks, CrossTable(wool, tension))


   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|


Total Observations in Table:  54


             | tension
        wool |         L |         M |         H | Row Total |
-------------|-----------|-----------|-----------|-----------|
           A |         9 |         9 |         9 |        27 |
             |     0.000 |     0.000 |     0.000 |           |
             |     0.333 |     0.333 |     0.333 |     0.500 |
             |     0.500 |     0.500 |     0.500 |           |
             |     0.167 |     0.167 |     0.167 |           |
-------------|-----------|-----------|-----------|-----------|
           B |         9 |         9 |         9 |        27 |
             |     0.000 |     0.000 |     0.000 |           |
             |     0.333 |     0.333 |     0.333 |     0.500 |
             |     0.500 |     0.500 |     0.500 |           |
             |     0.167 |     0.167 |     0.167 |           |
-------------|-----------|-----------|-----------|-----------|
Column Total |        18 |        18 |        18 |        54 |
             |     0.333 |     0.333 |     0.333 |           |
-------------|-----------|-----------|-----------|-----------|


2011-09-16 09:06:40 Andrie

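Could you explain what those decimal numbers mean? I am using gmodels to create a contingency table, and the only argument I set to TRUE is prop.c (i.e. everything else is set to FALSE). I still get an extra number displayed along with the column percentage and the actual n of the cell... I cannot for the life of me figure out what it is (yes, I searched a lot on how to interpret the output!). Thanks. – AHegde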

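Your answer is in the output above. At the top of the output is a box called 'Cell Contents'. It explains what each number means, i.e. the chi-square contribution and the various row and column proportions. – Andrie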
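The extra number is presumably the chi-square contribution, which CrossTable() prints unless prop.chisq = FALSE is passed (its default is TRUE). As a rough check of what each line in 'Cell Contents' means, this base-R sketch recomputes the values for warpbreaks; it is an illustration of the definitions, not gmodels' own code.

```r
# Cross-tabulate the built-in warpbreaks data, as in the answer above
tab <- with(warpbreaks, table(wool, tension))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Per-cell chi-square contribution: (observed - expected)^2 / expected
chisq_contrib <- (tab - expected)^2 / expected

# The three proportions CrossTable prints beneath each count
row_prop   <- prop.table(tab, 1)  # N / Row Total
col_prop   <- prop.table(tab, 2)  # N / Col Total
table_prop <- prop.table(tab)     # N / Table Total

round(chisq_contrib, 3)
round(row_prop, 3)
round(col_prop, 3)
round(table_prop, 3)
```

Because every cell of warpbreaks holds 9 breaks, the observed and expected counts coincide, which is why the chi-square line shows 0.000 throughout.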

The reshape package should do the trick.

> library(reshape) 

> df <- data.frame(PLANT = c("Tragopogon_pratensis","Anthriscus_sylvestris","Anthriscus_sylvestris","Heracleum_sphondylium","Anthriscus_sylvestris","Anthriscus_sylvestris","Cerastium_holosteoides"), 
        ANIMAL= c("Propylea_quatuordecimpunctata","Rhagonycha_nigriventris","Sarcophaga_carnaria","Sarcophaga_carnaria","Sarcophaga_variegata","Sphaerophoria_interrupta_Gruppe","Sphaerophoria_interrupta_Gruppe"), 
        INTERACTIONS = c(1,3,2,1,4,3,1), 
        stringsAsFactors=FALSE) 

> df <- melt(df,id.vars=c("PLANT","ANIMAL"))  
> df <- cast(df,formula=PLANT~ANIMAL) 
> df <- replace(df,is.na(df),0) 

> df
                   PLANT Propylea_quatuordecimpunctata Rhagonycha_nigriventris
1  Anthriscus_sylvestris                             0                       3
2 Cerastium_holosteoides                             0                       0
3  Heracleum_sphondylium                             0                       0
4   Tragopogon_pratensis                             1                       0
  Sarcophaga_carnaria Sarcophaga_variegata Sphaerophoria_interrupta_Gruppe
1                   2                    4                               3
2                   0                    0                               1
3                   1                    0                               0
4                   0                    0                               0

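I am still working out how to fix the ordering (the rows come out alphabetically rather than in the order of the input); any suggestions?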


2011-09-16 09:17:23 lokheart

You can replace the last three lines with a single command: cast(PLANT ~ ANIMAL, data = df, value = "INTERACTIONS", fill = 0) – Thierry

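If you want the order to follow the input data frame, you can use 'order(unique(df$PLANT))' on the rows and the obvious analogue on the columns. Your example does not need 'unique', but it may be needed for data with multiple entries per pairing. –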
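An alternative to reordering afterwards (not the commenter's approach) is to fix the factor levels in order of first appearance before cross-tabulating, since xtabs() lays out rows and columns by factor level. A sketch, again assuming the question's data in a hypothetical data frame called interactions:

```r
# The question's data; `interactions` is a hypothetical name
interactions <- data.frame(
  PLANT = c("Tragopogon_pratensis", "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Heracleum_sphondylium", "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Cerastium_holosteoides"),
  ANIMAL = c("Propylea_quatuordecimpunctata", "Rhagonycha_nigriventris",
             "Sarcophaga_carnaria", "Sarcophaga_carnaria", "Sarcophaga_variegata",
             "Sphaerophoria_interrupta_Gruppe", "Sphaerophoria_interrupta_Gruppe"),
  INTERACTIONS = c(1, 3, 2, 1, 4, 3, 1),
  stringsAsFactors = FALSE
)

# Levels in order of first appearance instead of the default alphabetical order
interactions$PLANT  <- factor(interactions$PLANT,  levels = unique(interactions$PLANT))
interactions$ANIMAL <- factor(interactions$ANIMAL, levels = unique(interactions$ANIMAL))

# The table now follows the input order, matching the layout asked for
tab <- xtabs(INTERACTIONS ~ PLANT + ANIMAL, data = interactions)
tab
```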

xtabs in base R should work, for example:

dat <- data.frame(PLANT = c("p1", "p2", "p2", "p4", "p5", "p5", "p6"), 
        ANIMAL = c("a1", "a2", "a3", "a3", "a4", "a5", "a5"), 
        INTERACTIONS = c(1,3,2,1,4,3,1), 
        stringsAsFactors = FALSE) 

(x2.table <- xtabs(dat$INTERACTIONS ~ dat$PLANT + dat$ANIMAL)) 

         dat$ANIMAL
dat$PLANT a1 a2 a3 a4 a5
       p1  1  0  0  0  0
       p2  0  3  2  0  0
       p4  0  0  1  0  0
       p5  0  0  0  4  3
       p6  0  0  0  0  1

chisq.test(x2.table, simulate.p.value = TRUE)

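I think that should do what you are looking for fairly easily. I am not sure how efficiently it scales to a 10E5-cell contingency table, but that may be a separate statistical question.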
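On the scaling concern: when only observed interactions are listed, a 10E5-cell table is likely to be mostly zeros. For two-factor tables, xtabs() accepts sparse = TRUE and then returns a sparse matrix from the Matrix package (a recommended package shipped with standard R installations). A sketch on the toy data above; whether this is fast enough for the real data is untested here.

```r
library(Matrix)  # provides the sparse matrix classes xtabs(sparse = TRUE) returns

# Same toy data as in the answer above
dat <- data.frame(
  PLANT = c("p1", "p2", "p2", "p4", "p5", "p5", "p6"),
  ANIMAL = c("a1", "a2", "a3", "a3", "a4", "a5", "a5"),
  INTERACTIONS = c(1, 3, 2, 1, 4, 3, 1),
  stringsAsFactors = FALSE
)

# Same cross-tabulation, but stored as a sparse matrix that keeps only the
# non-zero cells rather than allocating every PLANT x ANIMAL combination
sp <- xtabs(INTERACTIONS ~ PLANT + ANIMAL, data = dat, sparse = TRUE)

class(sp)
sp["p5", "a4"]
```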


2013-12-26 05:26:42 djhocking

I would like to point out that we can get the same results Andrie posted without using with():

Base R:

# 3 options 
table(warpbreaks[, 2:3]) 
table(warpbreaks[, c("wool", "tension")]) 
table(warpbreaks$wool, warpbreaks$tension, dnn = c("wool", "tension")) 

    tension
wool L M H
   A 9 9 9
   B 9 9 9

Package gmodels:

library(gmodels) 
# 2 options  
CrossTable(warpbreaks$wool, warpbreaks$tension) 
CrossTable(warpbreaks$wool, warpbreaks$tension, dnn = c("Wool", "Tension")) 


   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|


Total Observations in Table:  54


                | warpbreaks$tension
warpbreaks$wool |         L |         M |         H | Row Total |
----------------|-----------|-----------|-----------|-----------|
              A |         9 |         9 |         9 |        27 |
                |     0.000 |     0.000 |     0.000 |           |
                |     0.333 |     0.333 |     0.333 |     0.500 |
                |     0.500 |     0.500 |     0.500 |           |
                |     0.167 |     0.167 |     0.167 |           |
----------------|-----------|-----------|-----------|-----------|
              B |         9 |         9 |         9 |        27 |
                |     0.000 |     0.000 |     0.000 |           |
                |     0.333 |     0.333 |     0.333 |     0.500 |
                |     0.500 |     0.500 |     0.500 |           |
                |     0.167 |     0.167 |     0.167 |           |
----------------|-----------|-----------|-----------|-----------|
   Column Total |        18 |        18 |        18 |        54 |
                |     0.333 |     0.333 |     0.333 |           |
----------------|-----------|-----------|-----------|-----------|


2015-05-18 08:17:18 mpalanco

Just use the dcast() function from the reshape2 package:

library(reshape2)
ans <- dcast(df, PLANT ~ ANIMAL, value.var = "INTERACTIONS", fill = 0)

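Here the PLANT values form the rows (the left-hand column) and the ANIMAL values form the columns (the top row); the cells are filled from INTERACTIONS, and combinations with no interaction, which would otherwise be missing, are filled with 0.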
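For comparison, a rough base-R counterpart of this dcast() call uses stats::reshape() (a base function, unrelated to the reshape package above). The data frame name interactions is an assumption, and the NA-to-0 step plays the role of fill = 0.

```r
# The question's data; `interactions` is a hypothetical name
interactions <- data.frame(
  PLANT = c("Tragopogon_pratensis", "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Heracleum_sphondylium", "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Cerastium_holosteoides"),
  ANIMAL = c("Propylea_quatuordecimpunctata", "Rhagonycha_nigriventris",
             "Sarcophaga_carnaria", "Sarcophaga_carnaria", "Sarcophaga_variegata",
             "Sphaerophoria_interrupta_Gruppe", "Sphaerophoria_interrupta_Gruppe"),
  INTERACTIONS = c(1, 3, 2, 1, 4, 3, 1),
  stringsAsFactors = FALSE
)

# Long -> wide: one row per PLANT, one column per ANIMAL;
# absent combinations come out as NA
wide <- reshape(interactions, direction = "wide",
                idvar = "PLANT", timevar = "ANIMAL", v.names = "INTERACTIONS")

# Counterpart of dcast()'s fill = 0
wide[is.na(wide)] <- 0

# reshape() names the new columns "INTERACTIONS.<animal>"; drop that prefix
names(wide) <- sub("^INTERACTIONS\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

Unlike dcast(), reshape() keeps rows and columns in order of first appearance, which happens to match the layout in the question.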


2016-09-30 07:14:58 Ayush

With dplyr/tidyr:

df <- read.table(text='PLANT     ANIMAL       INTERACTIONS 
       Tragopogon_pratensis Propylea_quatuordecimpunctata   1 
       Anthriscus_sylvestris Rhagonycha_nigriventris    3 
       Anthriscus_sylvestris Sarcophaga_carnaria     2 
       Heracleum_sphondylium Sarcophaga_carnaria     1 
       Anthriscus_sylvestris Sarcophaga_variegata     4 
       Anthriscus_sylvestris Sphaerophoria_interrupta_Gruppe  3 
       Cerastium_holosteoides Sphaerophoria_interrupta_Gruppe  1', header=TRUE) 
library(dplyr) 
library(tidyr) 
df %>% spread(ANIMAL, INTERACTIONS, fill=0) 

#                    PLANT Propylea_quatuordecimpunctata Rhagonycha_nigriventris Sarcophaga_carnaria Sarcophaga_variegata Sphaerophoria_interrupta_Gruppe
# 1  Anthriscus_sylvestris                             0                       3                   2                    4                               3
# 2 Cerastium_holosteoides                             0                       0                   0                    0                               1
# 3  Heracleum_sphondylium                             0                       0                   1                    0                               0
# 4   Tragopogon_pratensis                             1                       0                   0                    0                               0
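(In current tidyr, spread() is superseded by pivot_wider().) If the table is built with xtabs() instead, a data frame shaped like the spread() output can be obtained with as.data.frame.matrix(). A sketch assuming the same data in a hypothetical data frame called interactions:

```r
# The question's data; `interactions` is a hypothetical name
interactions <- data.frame(
  PLANT = c("Tragopogon_pratensis", "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Heracleum_sphondylium", "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Cerastium_holosteoides"),
  ANIMAL = c("Propylea_quatuordecimpunctata", "Rhagonycha_nigriventris",
             "Sarcophaga_carnaria", "Sarcophaga_carnaria", "Sarcophaga_variegata",
             "Sphaerophoria_interrupta_Gruppe", "Sphaerophoria_interrupta_Gruppe"),
  INTERACTIONS = c(1, 3, 2, 1, 4, 3, 1),
  stringsAsFactors = FALSE
)

tab <- xtabs(INTERACTIONS ~ PLANT + ANIMAL, data = interactions)

# as.data.frame() on a table would return long format again;
# as.data.frame.matrix() keeps the wide layout, with PLANT as row names
wide <- as.data.frame.matrix(tab)

# Move the row names into a PLANT column, mirroring the spread() output
wide <- cbind(PLANT = rownames(wide), wide, row.names = NULL)
wide
```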


2017-02-10 17:21:48
