How to extract the month from a column

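I want to recreate a plot from the Text Mining with R web textbook, but using my own data. It basically finds the top terms for each year and charts them (Figure 5.4: http://tidytextmining.com/dtm.html). My data is a bit cleaner than what they start with, but I'm new to R. My data has a date column in 2016-01-01 format (Date class). I only have data from 2016 onward, so I'd like to do the same thing at a finer granularity (i.e. by month or by day). How do I extract the month from the column?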

library(dplyr)    # %>%, group_by(), mutate(), filter()
library(tidyr)    # extract(), complete()
library(ggplot2)

# Pull the year out of the document name and fill in zero counts
year_term_counts <- inaug_td %>%
  extract(document, "year", "(\\d+)", convert = TRUE) %>%
  complete(year, term, fill = list(count = 0)) %>%
  group_by(year) %>%
  mutate(year_total = sum(count))

# Plot the relative frequency of selected words over time
year_term_counts %>%
  filter(term %in% c("god", "america", "foreign", "union", "constitution",
                     "freedom")) %>%
  ggplot(aes(year, count / year_total)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~ term, scales = "free_y") +
  scale_y_continuous(labels = scales::percent_format()) +
  ylab("% frequency of word in inaugural address")

The idea is that I would pick specific words from my own text and see how their usage changes over the months.

Thanks!


2017-06-13 Alex

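Welcome to SO: have you tried breaking up the 'year_term_counts' pipeline to check the intermediate steps? Did each step give the result you expected? It would help us to see some of your data.

You should consider using the 'month' function from the 'lubridate' package to create a whole column containing the month. – ccapizzano

I'll look into the month function, thanks! – Alex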
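As a minimal sketch of the suggestion in the comment above, this is how `month()` from lubridate could be applied to a Date vector. The sample dates are made up for illustration; the base R line is only shown as an equivalent that needs no extra package.

```r
library(lubridate)

# Hypothetical sample dates in the same 2016-01-01 format as in the question
dates <- as.Date(c("2016-01-01", "2016-02-15", "2016-12-31"))

# Numeric month (1-12) via lubridate
month_num <- month(dates)

# Base R equivalent: format the month as text, then convert to integer
month_base <- as.integer(format(dates, "%m"))
```

With a data frame, the same call could go inside `mutate()`, for example `mutate(month = month(Date))`, assuming the date column is named `Date`.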

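If you want to look at smaller units of time based on the date column you already have, I'd suggest looking at the floor_date() or round_date() functions from lubridate. The specific chapter of our book that you linked deals with how to take a document-term matrix and then tidy it, and so on. Have you already gotten your data into a tidy text format? If so, then you could do something like this: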

library(dplyr)
library(lubridate)
library(ggplot2)

date_counts <- tidy_text %>%
    mutate(date = floor_date(Date, unit = "7 days")) %>% # use whatever time unit you want here, e.g. "month"
    count(date, word) %>%
    group_by(date) %>%
    mutate(date_total = sum(n))

date_counts %>%
    filter(word %in% c("PUT YOUR LIST OF WORDS HERE")) %>%
    ggplot(aes(date, n / date_total)) +
    geom_point() +
    geom_smooth() +
    facet_wrap(~ word, scales = "free_y")
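To make the counting step concrete, here is a small sketch of the same approach with the unit set to "month". The `tidy_text` tibble below is invented toy data (one row per word occurrence, with a `Date` column), so the column names are assumptions about what the real data might look like.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data standing in for the real tidy text
tidy_text <- tibble(
  Date = as.Date(c("2016-01-03", "2016-01-20", "2016-01-25",
                   "2016-02-02", "2016-02-10")),
  word = c("freedom", "union", "freedom", "freedom", "union")
)

month_counts <- tidy_text %>%
  mutate(date = floor_date(Date, unit = "month")) %>% # first day of each month
  count(date, word) %>%                               # occurrences per month and word
  group_by(date) %>%
  mutate(date_total = sum(n)) %>%                     # all words counted in that month
  ungroup()
```

Here `n / date_total` gives each word's share of that month's words, which is the quantity plotted on the y axis above. Flooring to the first of the month keeps the column as a Date, so ggplot2 can still treat the x axis as a time scale.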


2017-06-14 04:02:51

Thanks, Julia! I've been reading your new book. I'm new to R, but it has been very helpful. – Alex
