2014-05-13 111 views

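I am writing some related documents in rmarkdown that I will compile into a website via jekyll. In doing so, I have run into an issue: stripping the YAML from knitr child documents.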

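Some of the Rmd files I am using call other Rmd files as child documents. When I render with knitr, the resulting document contains the YAML front matter from both the parent and the child documents. An example is given below.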

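So far, I have not seen any way to specify that only a portion of a child document be included when the document is an Rmd. Does anyone know of a way to strip the YAML from child documents when they are read into a parent Rmd during knit()?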

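I would be happy to consider answers outside of R, preferably something I could embed in a rake file. I do not want to permanently alter the child documents, though, so stripping the YAML cannot be permanent. Finally, the YAML varies in length from file to file, so I am guessing that any solution must be able to find where the YAML begins and ends using regex/grep/sed/etc.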

Example:

%%%% Parent_Doc.rmd %%%%

--- 
title: parent doc 
layout: default 
etc: etc 
--- 
This is the parent... 

```{r child import, child="./child_doc."} 
``` 

%%%% child_doc.rmd %%%%

--- 
title: child doc 
layout: default 
etc: etc 
--- 

lorem ipsum etc 

%%%% output.md %%%%

--- 
title: parent doc 
layout: default 
etc: etc 
--- 
This is the parent... 
--- 
title: child doc 
layout: default 
etc: etc 
--- 

lorem ipsum etc 

%%%% Ideal Output.md %%%%

--- 
title: parent doc 
layout: default 
etc: etc 
--- 
This is the parent... 

lorem ipsum etc 

I can consider this as a feature request for the next version of knitr, if you file it at https://github.com/yihui/knitr/issues


@Yihui: I would file a feature request, but it is probably not worth your effort. My use case may be fairly specific. Thanks for the response. – Tom


Not at all. I do not mind small feature requests :)

Answers


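In the meantime, perhaps the following will work for you. It is an ugly and inefficient workaround (I am new to knitr and not a real programmer), but it achieves what I believe you want to do.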

I wrote a function for a similar personal use, which includes the relevant bit below. The original was in Spanish, so I have translated some of it:

extraction <- function(matter, escape = FALSE, ruta = ".", patron) {

    require(yaml)

    # Gather the documents to be processed (ruta = path, patron = pattern)
    doc_list <- list.files(
        path = ruta,
        pattern = patron,
        full.names = TRUE
    )

    # Extract the desired contents from each document
    lapply(
        X = doc_list,
        FUN = function(i) {
            raw_contents <- readLines(con = i, encoding = "UTF-8")

            switch(
                EXPR = matter,

                # No YAML (e.g., HTML): return the file as a single string
                "no_yaml" = {
                    if (escape == FALSE) {
                        paste(raw_contents, sep = "", collapse = "\n")
                    } else if (escape == TRUE) {
                        require(XML)
                        to_be_escaped <- paste(raw_contents, sep = "", collapse = "\n")
                        xmlTextNode(value = to_be_escaped)
                    }
                },

                # YAML header and Rmd contents
                "rmd" = {
                    # Match only whole delimiter lines ("---" or "..."),
                    # so that dashes or dots inside the text are ignored
                    yaml_pattern <- "^(-{3}|[.]{3})[[:space:]]*$"
                    limits_yaml <- grep(pattern = yaml_pattern, x = raw_contents)[1:2]
                    indices_yaml <- seq(
                        from = limits_yaml[1] + 1,
                        to = limits_yaml[2] - 1
                    )
                    # Parse each header line into a named entry
                    yaml <- mapply(
                        FUN = function(i) {yaml.load(string = i)},
                        raw_contents[indices_yaml],
                        USE.NAMES = FALSE
                    )
                    indices_rmd <- seq(
                        from = limits_yaml[2] + 1,
                        to = length(x = raw_contents)
                    )
                    rmd <- paste(raw_contents[indices_rmd], sep = "", collapse = "\n")
                    c(yaml, "contents" = rmd)
                },

                # Anything else (just in case)
                {
                    stop("Matter not extractable")
                }
            )
        }
    )

}
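
The core of the "rmd" branch, finding the two delimiter lines and keeping what follows, can be tried on its own without touching any file. The following is only a minimal sketch in base R: the helper name `split_front_matter` and the in-memory example are assumptions made for illustration, not part of the function above.

```r
# Hypothetical helper (not part of the original function): split the lines of
# an Rmd document into its YAML header and its body.
split_front_matter <- function(lines) {
    # Delimiter lines consist only of '---' or '...', optionally followed by spaces
    delims <- grep(pattern = "^(-{3}|[.]{3})[[:space:]]*$", x = lines)

    # Treat the document as having no front matter unless the first line
    # opens it and a closing delimiter exists
    if (length(delims) < 2 || delims[1] != 1) {
        return(list(yaml = character(0), body = lines))
    }

    list(
        yaml = lines[seq_len(delims[2] - 1)[-1]],  # lines between the delimiters
        body = lines[-seq_len(delims[2])]          # everything after the closing delimiter
    )
}

# In-memory stand-in for child_doc.rmd from the question
child <- c("--- ", "title: child doc", "layout: default", "etc: etc", "--- ",
           "", "lorem ipsum etc")
parts <- split_front_matter(child)
cat(parts$body, sep = "\n")
```

Because the lines are only read and never written back, the child document on disk stays unchanged, which is the non-permanent behaviour asked for in the question.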

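Say my main Rmd file, main.Rmd, lives in my_directory and my child documents, 01-abstract.Rmd, 02-intro.Rmd, ..., 06-conclusion.Rmd, are housed in ./sections. Note that, for my amateur function, it is best to save the child documents in the order in which they will be pulled into the main document (see below). I keep my function, extraction.R, in ./assets. This is my example directory structure: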

. 
+--assets 
| +--extraction.R 
+--sections 
| +--01-abstract.Rmd 
| +--02-intro.Rmd 
| +--03-methods.Rmd 
| +--04-results.Rmd 
| +--05-discussion.Rmd 
| +--06-conclusion.Rmd 
+--stats 
| +--analysis.R 
+--main.Rmd 

main.Rmd imports my child documents from ./sections:

--- 
title: Main 
author: me 
date: Today 
output: 
    html_document 
--- 

```{r, 'setup', include = FALSE} 
opts_chunk$set(autodep = TRUE) 
dep_auto() 
``` 

```{r, 'import_children', cache = TRUE, include = FALSE} 
source('./assets/extraction.R') 
rmd <- extraction(
    matter = 'rmd', 
    ruta = './sections', 
    patron = "\\.Rmd$"
) 
``` 

# Abstract 

```{r, 'abstract', echo = FALSE, results = 'asis'} 
cat(x = rmd[[1]][["contents"]], sep = "\n") 
``` 

# Introduction 

```{r, 'intro', echo = FALSE, results = 'asis'} 
cat(x = rmd[[2]][["contents"]], sep = "\n") 
``` 

# Methods 

```{r, 'methods', echo = FALSE, results = 'asis'} 
cat(x = rmd[[3]][["contents"]], sep = "\n") 
``` 

# Results 

```{r, 'results', echo = FALSE, results = 'asis'} 
cat(x = rmd[[4]][["contents"]], sep = "\n") 
``` 

# Discussion 

```{r, 'discussion', echo = FALSE, results = 'asis'} 
cat(x = rmd[[5]][["contents"]], sep = "\n") 
``` 

# Conclusion 

```{r, 'conclusion', echo = FALSE, results = 'asis'} 
cat(x = rmd[[6]][["contents"]], sep = "\n") 
``` 

# References 

I then knit this file, and only the contents of my child documents are incorporated into it, e.g.:

--- 
title: Main 
author: me 
date: Today 
output: 
    html_document 
--- 





# Abstract 


This is **Child Doc 1**, my abstract. 

# Introduction 


This is **Child Doc 2**, my introduction. 

- Point 1 
- Point 2 
- Point *n* 

# Methods 


This is **Child Doc 3**, my "Methods" section. 

| method 1 | method 2 | method *n* | 
|---------------|---------------|----------------| 
| fffffffffffff | fffffffffffff | fffffffffffff d| 
| fffffffffffff | fffffffffffff | fffffffffffff d| 
| fffffffffffff | fffffffffffff | fffffffffffff d| 

# Results 


This is **Child Doc 4**, my "Results" section. 

## Result 1 

```{r} 
library(knitr) 
``` 

```{r, 'analysis', cache = FALSE} 
source(file = '../stats/analysis.R') 
``` 

# Discussion 


This is **Child Doc 5**, where the results are discussed. 

# Conclusion 


This is **Child Doc 6**, where I state my conclusions. 

# References 

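The preceding file is the knitted version of main.Rmd, i.e., main.md. Note that under ## Result 1, in my child document 04-results.Rmd, I source an external R script, ./stats/analysis.R, which is now incorporated into my knitted document as a new knitr chunk; consequently, I now need to knit the file again.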

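When child documents also include chunks, instead of knitting into .md I knit the main document into another .Rmd as many times as I have levels of nested chunks, e.g., continuing the example above: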

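  1. Using knit(input = './main.Rmd', output = './main_2.Rmd'), instead of knitting main.Rmd into main.md, I knit it into another .Rmd so that I can then knit the resulting file, which contains the newly imported chunks, e.g., the one sourcing my R script analysis.R above.
  2. I can now knit main_2.Rmd into main.md, or render it as main.html via rmarkdown::render(input = './main_2.Rmd', output_file = './main.html').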

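Note: in the main.md example above, the path to my R script is ../stats/analysis.R. This path is relative to the child document it comes from, ./sections/04-results.Rmd. Once I import the child document into the main document located at the root of my_directory, i.e., ./main.md or ./main_2.Rmd, the path becomes wrong; I therefore have to correct it manually to ./stats/analysis.R before the next knit.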
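
The manual path correction could probably be automated by rewriting the path in the imported text before it is printed. This is only a sketch under assumptions: the helper name `fix_path` and its default arguments are hypothetical, and it handles just the single `../stats/` prefix from this example.

```r
# Hypothetical helper (an assumption, not from the original answer):
# re-root a relative path inside the imported text before it is written out.
fix_path <- function(text, from = "../stats/", to = "./stats/") {
    # fixed = TRUE treats 'from' as a literal string, so its dots are not
    # interpreted as regex wildcards
    gsub(pattern = from, replacement = to, x = text, fixed = TRUE)
}

# Stand-in for the line found in ./sections/04-results.Rmd
child_line <- "source(file = '../stats/analysis.R')"
fix_path(child_line)
```

In main.Rmd, the results chunk could then call something like `cat(fix_path(rmd[[4]][["contents"]]), sep = "\n")`, so that the path is already correct in main_2.Rmd.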

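I mentioned above that it is best to save the child documents in the same order in which they are imported into the main document. This is because my simple function extraction() just stores the contents of all the files passed to it in an unnamed list, so I have to access each file in main.Rmd by number, i.e., rmd[[5]][["contents"]] refers to the child document ./sections/05-discussion.Rmd; consider: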

> str(rmd) 
List of 6 
$ :List of 4 
    ..$ title  : chr "child doc 1" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 1**, my abstract." 
$ :List of 4 
    ..$ title  : chr "child doc 2" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 2**, my introduction.\n\n- Point 1\n- Point 2\n- Point *n*" 
$ :List of 4 
    ..$ title  : chr "child doc 3" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 3**, my \"Methods\" section.\n\n| method 1 | method 2 | method *n* |\n|--------------|--------------|----"| __truncated__ 
$ :List of 4 
    ..$ title  : chr "child doc 4" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 4**, my \"Results\" section.\n\n## Result 1\n\n```{r}\nlibrary(knitr)\n```\n\n```{r, cache = FALSE}\nsour"| __truncated__ 
$ :List of 4 
    ..$ title  : chr "child doc 5" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 5**, where the results are discussed." 
$ :List of 4 
    ..$ title  : chr "child doc 6" 
    ..$ layout : chr "default" 
    ..$ etc  : chr "etc" 
    ..$ contents: chr "\nThis is **Child Doc 6**, where I state my conclusions." 

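So extraction() here actually stores both the R Markdown contents of the specified child documents and their YAML, in case you have a use for the latter as well (I do myself).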
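
The reliance on file order could likely be relaxed by naming the list elements after their files. The sketch below uses small hypothetical stand-ins for `doc_list` and for the list returned by extraction(); in practice the names would have to be set inside the function (where `doc_list` is available) or from a separate list.files() call with the same arguments.

```r
# Hypothetical stand-ins for the paths returned by list.files() and for the
# list returned by extraction(); the real objects come from the code above.
doc_list <- c("./sections/01-abstract.Rmd", "./sections/05-discussion.Rmd")
rmd <- list(
    list(title = "child doc 1",
         contents = "\nThis is **Child Doc 1**, my abstract."),
    list(title = "child doc 5",
         contents = "\nThis is **Child Doc 5**, where the results are discussed.")
)

# Name each element after its file, dropping the directory and the extension
names(rmd) <- tools::file_path_sans_ext(basename(doc_list))

# Access by name instead of by position
rmd[["05-discussion"]][["contents"]]
```

With named elements, renaming or reordering the files in ./sections would no longer silently shift which section ends up under which heading.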