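My question is closely related to "Connecting across missing values with geom_line", but it is a follow-up rather than a duplicate: I want to connect only selected runs of NA points with geom_line().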
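I have data with missing values (NA). The data has been melted into long format with the reshape2 package, and I use ggplot2 to draw geom_point() and geom_line(). The example data contains only one group; my real data contains several. I want to draw a geom_line() that connects data points that have no more than 4 years of missing data between them. In other words, if there are 3 adjacent NAs, apply na.rm to the data frame, whereas if there are at least 4 NAs on adjacent rows, do not apply na.rm to the data.frame.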
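Edit: Note: I am replicating a figure from a book in which the points are connected even where data is missing. I could use a different linetype or colour for the segments that connect across missing data, together with a note in the legend explaining it.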
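Below is a very tedious and ugly hack that will not scale to larger amounts of data. I would appreciate a simpler approach, and in particular an easy way to count runs of consecutive NAs in the data.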
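For the counting part, one possible starting point is base R's rle() applied to is.na(). The sketch below uses a short made-up vector `x` (an assumption for illustration, not the data from this question) and derives the length of each NA run, plus a flag for NAs that sit in a run of 4 or more.

```r
# Toy vector (made up for illustration; not the question's data)
x <- c(0.19, NA, NA, NA, NA, NA, 0.11, 0.12, NA, NA, 0.11)

# Run-length encode the missingness indicator
runs <- rle(is.na(x))

# Lengths of the runs of consecutive NAs only
na_run_lengths <- runs$lengths[runs$values]

# For every element of x, the length of the run it belongs to
run_len <- rep(runs$lengths, runs$lengths)

# TRUE for NAs that belong to a run of 4 or more consecutive NAs
long_gap <- is.na(x) & run_len >= 4
```

With several groups, the same computation would have to be done within each level of `variable` rather than on the whole column at once.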
### ggplot draws geom_line with NAs
# Data (real-world example, so not exactly MWE)
df <-
structure(list(Year = c(1910, 1911, 1912, 1913, 1914, 1915, 1916,
1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927,
1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938,
1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949,
1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960,
1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971,
1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982,
1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
2005, 2006, 2007, 2008, 2009, 2010), variable = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("France", "Germany", "Sweden", "Japan"
), class = c("ordered", "factor")), value = c(0.1724, 0.1748,
0.1752, 0.1777, 0.1778, 0.1953, 0.2132, 0.2242, 0.222, 0.1947,
NA, NA, NA, NA, NA, 0.113, 0.113, 0.115, 0.112, 0.111, NA, NA,
0.114, 0.109, 0.113, 0.12, 0.137, 0.15, 0.163, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 0.116, NA, NA, NA, NA, NA, NA, 0.11,
NA, NA, NA, 0.122, NA, NA, NA, 0.122, NA, NA, 0.112, NA, NA,
0.113, NA, NA, 0.101, NA, NA, 0.102, NA, NA, 0.1043, NA, NA,
0.0906, NA, NA, 0.0964, NA, NA, 0.1052, NA, NA, 0.1043, NA, NA,
0.1005, NA, NA, 0.1088, NA, NA, 0.101139312657167, 0.0950290025146689,
0.0901042749371333, 0.09, 0.107249622799665, 0.108891198658843,
0.115913495389774, 0.110684772282761, 0.113299133836267, 0.111991953059514
)), .Names = c("Year", "variable", "value"), row.names = 102:202, class = "data.frame")
The default plot:
library("ggplot2")
ggplot(data = df, aes(x = Year, y = value, group = variable, colour = variable, shape = variable)) +
geom_point(size = 3) + geom_line()
The plot with all NAs removed (see "Connecting across missing values with geom_line"):
ggplot(data = df, aes(x = Year, y = value, group = variable, colour = variable, shape = variable)) +
geom_point(size = 3) + geom_line(data = df[!is.na(df$value), ])
The desired plot:
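df2 <- df
# Hack: place an out-of-range sentinel inside each long NA run so the line breaks there
df2[df2$Year == 1922, ]$value <- -999999
df2[df2$Year == 1948, ]$value <- -999999
# Drop the remaining NAs so that the other gaps are bridged by the line
df2 <- df2[!is.na(df2$value), ]
# The sentinel rows fall outside the y limits and are therefore not drawn
ggplot(data = df2, aes(x = Year, y = value, group = variable, colour = variable, shape = variable)) + geom_point(size = 3) +
geom_line() + scale_y_continuous(limits = c(.08, .23))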
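A less manual variant of this hack can be sketched under the assumption that the rule is "break the line only where 4 or more consecutive values are NA". It keeps the NA rows of long runs, since geom_line() leaves a gap where an NA occurs in the middle of a line, and drops the NA rows of short runs so that the line bridges them. The helper name `drop_short_gaps`, its `max_gap` argument, and the toy data frame are hypothetical stand-ins, not part of the data or code above.

```r
# Hypothetical helper: remove NA rows that belong to runs of at most `max_gap`
# consecutive NAs; NA rows in longer runs are kept so the line breaks there.
drop_short_gaps <- function(d, max_gap = 3) {
  d <- d[order(d$Year), ]
  runs <- rle(is.na(d$value))
  run_len <- rep(runs$lengths, runs$lengths)
  short_gap <- is.na(d$value) & run_len <= max_gap
  d[!short_gap, ]
}

# Toy single-group data (made up for illustration)
toy <- data.frame(
  Year = 1919:1929,
  value = c(0.19, NA, NA, NA, NA, NA, 0.11, 0.12, NA, NA, 0.11)
)
toy2 <- drop_short_gaps(toy)

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy2, aes(x = Year, y = value)) +
    geom_point(size = 3, na.rm = TRUE) +
    geom_line()
}
```

For data with several groups, the helper would need to be applied to each level of `variable` separately (for example via split() and do.call(rbind, ...)) before plotting.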
Your desired plot is inconsistent with your rule. The point at 1950 should be isolated, because 1939-1949 are NA, and so are 1951-1956. Both are sequences of more than 3 NAs. – jlhoward