2014-05-04 57 views
1

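Difficulty interpreting the correlation between two data frames

I have two data frames and I am trying to determine whether there is a correlation between them. The basic question I am trying to ask is whether winter weather patterns lead to an increase in birth rates (nine months later).

The data frames have been reduced to contain only (what I assume is) the needed information. The weather data frame contains only observations that correspond to the births data frame nine months later. When I use the ccf function, it plots the data successfully, but I know I don't have it set up correctly. I need to account for one variable (model.weather) occurring nine months before the other (model.births) when plotting the correlation.

Right now, it's set up very simply: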

ccf(model.weather$EVENT_TYPE, model.births$BIRTH_TOTAL) 

Can anyone help me offset the data properly by nine months?

Here is what the two data frames look like:

dput(model.weather) 
structure(list(DATE = structure(c(13514, 13514, 13545, 13545, 
13545, 13545, 13545, 13545, 13545, 13545, 13545, 13545, 13545, 
13545, 13545, 13545, 13545, 13545, 13545, 13545, 13573, 13573, 
13573, 13573, 13573, 13573, 13573, 13573, 13573, 13573, 13573, 
13573, 13573, 13573, 13573, 13573, 13573, 13573, 13573, 13573, 
13573, 13573, 13604, 13604, 13604, 13848, 13848, 13848, 13848, 
13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 
13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 
13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 
13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 
13848, 13848, 13848, 13848, 13848, 13848, 13848, 13848, 13879, 
13879, 13879, 13879, 13879, 13879, 13879, 13879, 13879, 13879, 
13879, 13879, 13879, 13879, 13879, 13879, 13879, 13879, 13879, 
13879, 13879, 13879, 13910, 13910, 13910, 13910, 13910, 13910, 
13910, 13910, 13910, 13910, 13910, 13910, 13910, 13910, 13910, 
13910, 13910, 13910, 13910, 13910, 13910, 13910, 13910, 13939, 
13939, 13939, 13939, 13939, 13939, 13939, 14214, 14214, 14214, 
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 
14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 14214, 
14214, 14214, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 
14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 
14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 14245, 
14245, 14245, 14245, 14245, 14276, 14276, 14276, 14276, 14304, 
14304, 14304, 14304, 14304, 14304, 14304, 14304, 14304, 14304, 
14304, 14304, 14304, 14304, 14304, 14304, 14304, 14304, 14304, 
14304, 14304, 14579, 14579, 14579, 14579, 14579, 14579, 14579, 
14579, 14579, 14579, 14579, 14579, 14579, 14579, 14579, 14579, 
14579, 14579, 14579, 14579, 14579, 14579, 14579, 14579, 14579, 
14579, 14579, 14579, 14610, 14610, 14610, 14641, 14641, 14641, 
14641, 14641, 14641, 14641, 14641, 14641, 14641, 14641, 14641, 
14641, 14641, 14641, 14641, 14641, 14641, 14641, 14641, 14641, 
14641, 14641, 14641, 14641, 14944, 14944, 14944, 14944, 14944, 
14944, 14944, 14944, 14944, 14944, 14944, 14944, 14944, 14944, 
14944, 14944, 14944, 14944, 14944, 14944, 14944, 14944, 14944, 
14944, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 
14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 14975, 
14975, 14975, 14975, 15006, 15006, 15006, 15006, 15006, 15006, 
15006, 15006, 15006, 15006, 15006, 15006, 15006, 15006, 15006, 
15006, 15006, 15006, 15006, 15006, 15006, 15006, 15006, 15034 
), class = "Date"), EVENT_TYPE = structure(c(5L, 5L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 
5L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
3L, 2L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 3L, 2L, 2L, 2L, 8L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Hail", "Heavy Snow", 
"Winter Storm", "Winter Weather", "Ice Storm", "Frost/Freeze", 
"WINTER WEATHER", "Blizzard"), class = "factor")), .Names = c("DATE", 
"EVENT_TYPE"), row.names = c(1475L, 1476L, 1477L, 1478L, 1479L, 
1480L, 1481L, 1482L, 1483L, 1484L, 1485L, 1486L, 1487L, 1488L, 
1489L, 1490L, 1491L, 1492L, 1493L, 1494L, 1495L, 1496L, 1497L, 
1498L, 1499L, 1500L, 1501L, 1502L, 1503L, 1504L, 1505L, 1506L, 
1507L, 1508L, 1509L, 1510L, 1511L, 1512L, 1513L, 1514L, 1515L, 
1516L, 1519L, 1520L, 1521L, 1588L, 1589L, 1590L, 1591L, 1592L, 
1593L, 1594L, 1595L, 1596L, 1597L, 1598L, 1599L, 1600L, 1601L, 
1602L, 1603L, 1604L, 1605L, 1606L, 1608L, 1609L, 1610L, 1611L, 
1612L, 1613L, 1614L, 1615L, 1616L, 1617L, 1618L, 1619L, 1620L, 
1621L, 1622L, 1623L, 1624L, 1625L, 1626L, 1627L, 1628L, 1629L, 
1630L, 1631L, 1632L, 1633L, 1634L, 1635L, 1636L, 1638L, 1642L, 
1643L, 1644L, 1645L, 1646L, 1647L, 1648L, 1649L, 1650L, 1651L, 
1652L, 1653L, 1654L, 1655L, 1656L, 1657L, 1658L, 1659L, 1660L, 
1661L, 1662L, 1665L, 1666L, 1671L, 1672L, 1673L, 1674L, 1679L, 
1680L, 1681L, 1682L, 1683L, 1684L, 1685L, 1686L, 1687L, 1688L, 
1689L, 1690L, 1691L, 1692L, 1693L, 1694L, 1696L, 1697L, 1698L, 
1699L, 1700L, 1701L, 1702L, 1703L, 1863L, 1864L, 1865L, 1866L, 
1867L, 1868L, 1869L, 1870L, 1871L, 1872L, 1873L, 1874L, 1877L, 
1878L, 1879L, 1880L, 1881L, 1882L, 1883L, 1884L, 1885L, 1886L, 
1887L, 1888L, 1889L, 1890L, 1891L, 1892L, 1893L, 1894L, 1895L, 
1896L, 1897L, 1898L, 1899L, 1900L, 1901L, 1902L, 1903L, 1904L, 
1905L, 1906L, 1907L, 1910L, 1911L, 1916L, 1917L, 1918L, 1919L, 
1920L, 1921L, 1922L, 1923L, 1924L, 1925L, 1926L, 1927L, 1928L, 
1929L, 1933L, 1934L, 1935L, 1938L, 1940L, 1941L, 1942L, 1943L, 
1944L, 1945L, 1946L, 1947L, 1948L, 1950L, 1951L, 1952L, 1953L, 
1955L, 1956L, 1957L, 1958L, 1959L, 1960L, 1961L, 1962L, 1964L, 
1965L, 1966L, 1967L, 1968L, 1969L, 1974L, 1976L, 1977L, 1978L, 
1979L, 1980L, 1981L, 1982L, 1983L, 1984L, 1985L, 1986L, 1987L, 
1988L, 1989L, 1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 
1998L, 2071L, 2072L, 2073L, 2074L, 2075L, 2076L, 2077L, 2078L, 
2079L, 2080L, 2081L, 2082L, 2083L, 2084L, 2085L, 2086L, 2087L, 
2088L, 2089L, 2090L, 2091L, 2092L, 2093L, 2094L, 2095L, 2096L, 
2097L, 2098L, 2105L, 2106L, 2107L, 2108L, 2109L, 2110L, 2111L, 
2112L, 2113L, 2114L, 2115L, 2116L, 2117L, 2118L, 2119L, 2122L, 
2123L, 2124L, 2125L, 2126L, 2127L, 2128L, 2129L, 2130L, 2131L, 
2132L, 2133L, 2134L, 2184L, 2185L, 2186L, 2187L, 2189L, 2190L, 
2191L, 2192L, 2193L, 2194L, 2195L, 2196L, 2197L, 2198L, 2199L, 
2200L, 2201L, 2202L, 2203L, 2204L, 2205L, 2206L, 2207L, 2208L, 
2209L, 2212L, 2213L, 2214L, 2215L, 2216L, 2217L, 2218L, 2219L, 
2220L, 2221L, 2222L, 2223L, 2224L, 2225L, 2226L, 2227L, 2228L, 
2229L, 2230L, 2231L, 2232L, 2233L, 2234L, 2235L, 2236L, 2237L, 
2238L, 2239L, 2240L, 2241L, 2242L, 2243L, 2244L, 2245L, 2246L, 
2247L, 2248L, 2249L, 2250L, 2251L, 2252L, 2253L, 2254L, 2255L, 
2256L, 2257L, 2258L, 2259L, 2260L, 2261L, 2262L, 2263L, 2264L, 
2265L, 2266L, 2267L, 2268L, 2269L, 2270L, 2271L, 2272L, 2273L, 
2274L, 2275L, 2276L, 2277L, 2278L, 2279L, 2280L, 2281L, 2282L, 
2283L, 2284L, 2285L, 2286L, 2287L, 2288L, 2289L, 2290L, 2291L, 
2292L, 2293L, 2294L, 2295L, 2303L, 2304L, 2305L, 2308L), class = "data.frame") 

dput(model.births) 
structure(list(DATE = structure(c(13514, 13545, 13573, 13604, 
13634, 13665, 13695, 13726, 13757, 13787, 13818, 13848, 13879, 
13910, 13939, 13970, 14000, 14031, 14061, 14092, 14123, 14153, 
14184, 14214, 14245, 14276, 14304, 14335, 14365, 14396, 14426, 
14457, 14488, 14518, 14549, 14579, 14610, 14641, 14669, 14700, 
14730, 14761, 14791, 14822, 14853, 14883, 14914, 14944, 14975, 
15006, 15034, 15065, 15095, 15126, 15156, 15187, 15218, 15248, 
15279, 15309), class = "Date"), BIRTH_TOTAL = c(6250, 5833, 6570, 
6227, 6858, 6735, 6933, 7291, 6385, 6466, 6198, 6221, 6341, 6051, 
6444, 6396, 6781, 6583, 6820, 6803, 6531, 6510, 5627, 6135, 5976, 
5515, 6208, 6261, 6520, 6509, 6834, 6616, 6489, 6318, 5730, 6040, 
5667, 5459, 6162, 6212, 6221, 6194, 6469, 6380, 6342, 5981, 5853, 
5925, 5979, 5414, 6070, 6085, 6242, 6438, 6506, 6459, 6260, 6158, 
5754, 5801)), .Names = c("DATE", "BIRTH_TOTAL"), row.names = c(NA, 
-60L), class = "data.frame") 
+1

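Your whole approach is wrong. Lagging a data frame is easy, but the dates are unique in "model.births" while they are not unique in the "model.weather" data.frame. So you are comparing different dates with the same dates, and so on. –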

+0

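@DavidArenburg: I had a feeling that was the case. model.births covers five years / sixty months; would it be reasonable to total the events in model.weather by month/year so that they reflect the same sixty months, and then compare? If so, can you point me to a reference on how to do that? – thebonafortuna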

+1

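You can do two things here. The first is to give each event ("Heavy Snow", "Winter Storm", etc.) the same weight and "stack" them on each unique date, or you can compare each event separately against the lagged model. If you like –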

Answers

2

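So, as we all discussed in the comments, you want to compare "apples to apples"; therefore both data sets have to be unique by date in order to be compared.

The first approach gives the same weight to each event, counts the events per date, and compares the counts to "model.births":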

## Aggregating "model.weather" by date and counting events 
aggmodel.weather <- aggregate(EVENT_TYPE ~ DATE, data = model.weather, length) 
## Merging to "model.births" by DATE 
model.births <- merge(model.births, aggmodel.weather, by = "DATE", all.x = TRUE) 
## Setting the missing events to zero 
model.births[is.na(model.births$EVENT_TYPE), "EVENT_TYPE"] <- 0 
## Running the `ccf` function; note that its documentation states: 
## "The lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t]" 
ccf(model.births$BIRTH_TOTAL, model.births$EVENT_TYPE) 

[Figure: ccf plot output]

The output is conclusive, IMO. For further reading, see here.
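The lag convention quoted from the `ccf` documentation is easy to misread, so here is a minimal sketch on made-up data (the series `x` and `y` are hypothetical, not the question's data). It builds `y` as a copy of `x` delayed by nine steps and checks where the cross-correlation peaks.

```r
# Synthetic series (made-up data, not the question's):
# y is a noisy copy of x delayed by 9 steps, so x leads y by 9.
set.seed(1)
n <- 120
x <- rnorm(n)
y <- c(rep(0, 9), x[1:(n - 9)]) + rnorm(n, sd = 0.1)

# ccf(x, y) at lag k estimates cor(x[t + k], y[t]).
# Here y[t] is built from x[t - 9], so the peak should sit at k = -9.
res <- ccf(x, y, lag.max = 12, plot = FALSE)
peak_lag <- res$lag[which.max(res$acf)]
peak_lag

# Swapping the arguments flips the sign of the lag at which the peak appears.
res_swapped <- ccf(y, x, lag.max = 12, plot = FALSE)
peak_lag_swapped <- res_swapped$lag[which.max(res_swapped$acf)]
peak_lag_swapped
```

Under that convention, with births as the first argument and event counts as the second, a positive lag k pairs births at row t + k with events at row t.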

The second approach is to compare each type of event in "model.weather" to "model.births":

## Checking the event types 
table(model.weather$EVENT_TYPE) 
##           Hail     Heavy Snow   Winter Storm Winter Weather 
##              0            283            127              0 
##      Ice Storm   Frost/Freeze WINTER WEATHER       Blizzard 
##             16              0              0              1 
## Let's try "Heavy Snow" as it seems the most frequent (doing everything as previously) 
Heavy.Snow <- model.weather[model.weather$EVENT_TYPE == "Heavy Snow", ] 
Heavy.Snow <- aggregate(EVENT_TYPE ~ DATE, data = Heavy.Snow, length) 
## Renaming the count column so the merge works whether or not "model.births" already has an "EVENT_TYPE" column 
names(Heavy.Snow)[names(Heavy.Snow) == "EVENT_TYPE"] <- "HEAVY_SNOW" 
model.births <- merge(model.births, Heavy.Snow, by = "DATE", all.x = TRUE) 
model.births[is.na(model.births$HEAVY_SNOW), "HEAVY_SNOW"] <- 0 
ccf(model.births$BIRTH_TOTAL, model.births$HEAVY_SNOW) 

[Figure: ccf plot output]

The output looks almost identical. You can also try some of the other "EVENT_TYPE" values.
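Trying the other event types one at a time can be wrapped in a loop. The following is only a sketch on made-up stand-ins for the two data frames: the names `births`, `weather`, `count_by_month` and all the values are hypothetical. It counts events per month for each type and reads off the cross-correlation at lag +9.

```r
# Made-up stand-ins for the two data frames (hypothetical values).
set.seed(42)
months <- seq(as.Date("2007-01-01"), by = "month", length.out = 60)
births <- data.frame(DATE = months,
                     BIRTH_TOTAL = round(rnorm(60, mean = 6200, sd = 400)))
weather <- data.frame(
  DATE = sample(months, 300, replace = TRUE),
  EVENT_TYPE = factor(sample(c("Heavy Snow", "Winter Storm", "Ice Storm"),
                             300, replace = TRUE))
)
types <- levels(weather$EVENT_TYPE)

# Count events of one type per month; tabulating against the births dates
# means months with no events of that type get a count of zero.
count_by_month <- function(tp) {
  hits <- weather$DATE[weather$EVENT_TYPE == tp]
  as.vector(table(factor(as.character(hits),
                         levels = as.character(births$DATE))))
}

# Cross-correlation at lag +9 for each event type:
# births at row t + 9 paired with event counts at row t.
lag9 <- sapply(types, function(tp) {
  res <- ccf(births$BIRTH_TOTAL, count_by_month(tp), lag.max = 9, plot = FALSE)
  res$acf[res$lag == 9]
})
lag9
```

Because the data here are random, the values carry no meaning; the point is only the shape of the loop.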

This code is for illustration purposes only; for further analysis, see the link above.

One last thing: if you want to lag the "model.births" data by nine months, you can simply do:

model.births$BIRTH_TOTAL2 <- c(model.births$BIRTH_TOTAL[10 : (length(model.births$BIRTH_TOTAL))], rep(NA, 9)) 
model.births <- model.births[complete.cases(model.births), ] 

"BIRTH_TOTAL2"將是你的滯後變量

+0

Thanks so much, David. I'm exploring this now and will read the link you posted. This is very helpful, so thanks again. – thebonafortuna
