2013-05-19 65 views
0

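Cutting off the rest of an InputStream using FilterInputStream

I'm planning to use Java to process Markdown text files that specify additional meta information at the beginning of the document in YAML format, such as title, author, creation date and so on. Here is an example: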

--- 
title: An example document 
author: Paul 
created: 2013-05-19 
--- 

The _body_ of this document is 
written in **Markdown**. 

To parse this YAML data I can use SnakeYAML. As far as I know, you can load YAML documents from a java.io.InputStream, a java.io.Reader or a String via the methods yaml.load() and yaml.loadAll() (see the SnakeYAML documentation and API).

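I don't want to use the variant that reads from a String, because that would cause performance problems with large files. But passing the file as an InputStream fails, because the stream as a whole does not represent a valid YAML document. Only the first part of the stream is a valid document.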

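So my question is: how can I use java.io.FilterInputStream / java.io.FilterReader, or some other approach, to produce a stream that stops after the second ---, so that the stream as a whole is valid YAML?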

Answers

0

Here is my solution (Scala code):

import java.io.InputStreamReader 
import java.io.InputStream 
import java.nio.charset.Charset 

import scala.collection.mutable.Queue 

/** 
* Reader for Metadata that is contained in the given `InputStream`. 
* 
* @constructor Create a new metadata reader with a given `Charset`. 
* @param in underlying input stream 
* @param charset encoding of the stream 
*/ 
class MetadataReader(in: InputStream, charset: Charset) 
    extends InputStreamReader(in, charset) { 
    private val lookahead = Queue.empty[Int] // buffer for looking ahead 
    private var afterNewline = true // meant to flag a preceding newline; never updated or read below
    private var divider = 0 // number of divider characters in a row ('-') 

    /** 
    * Create a new MetadataReader with the system's default `Charset`.
    * 
    * @param in underlying input stream 
    */ 
    def this(in: InputStream) = this(in, Charset.defaultCharset()) 

    /** 
    * Read the next character. 
    * 
    * @return next character 
    */ 
    override def read: Int =
        if (divider == 2) {
            // second divider reached: report end of stream
            -1
        } else if (lookahead.nonEmpty) {
            lookahead.dequeue
        } else {
            // Read the next character, buffering consecutive '-' characters.
            // Note: any run of three '-' counts as a divider, not only one
            // at the start of a line.
            def readNext: Int =
                if (lookahead.length == 3) {
                    divider += 1
                    read
                } else {
                    val c = super.read
                    lookahead.enqueue(c)
                    if (c == '-') readNext else lookahead.dequeue
                }

            readNext
        }

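    /**
    * Read characters into a buffer character array.
    *
    * @param buf buffer array
    * @param off offset in the array at which to start storing characters
    * @param len maximum number of characters to read
    * @return number of characters actually read, or -1 at the end of the stream
    */
    override def read(buf: Array[Char], off: Int, len: Int): Int = {
        var n = 0
        var eof = false
        while (n < len && !eof) {
            val c = read
            if (c == -1) {
                eof = true
            } else {
                // store relative to the offset instead of skipping characters
                buf(off + n) = c.toChar
                n += 1
            }
        }

        // the Reader contract signals the end of the stream with -1, not 0
        if (n == 0 && eof) -1 else n
    }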
} 

You can use it like this:

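import java.io.{File, FileInputStream}
import java.nio.charset.Charset
import org.yaml.snakeyaml.Yaml

val yaml = new Yaml
val mr = new MetadataReader(new FileInputStream(
    new File("src/test/resources/yaml-test.txt")), Charset.forName("UTF-8"))
println(yaml.load(mr))
mr.close()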

Feedback is appreciated.
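For a plain-Java variant, a line-based reader may be easier to reason about than counting dashes character by character. The following is only a sketch under some assumptions: the class name `FrontMatterReader` and the helper `headerOf` are made up for illustration, a divider is taken to be a line containing nothing but `---` (ignoring trailing whitespace), and the document is assumed to start with such a line. Like the Scala reader above, it passes the opening `---` through and reports the end of the stream when it reaches the second one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical reader that ends at the second "---" line of its source. */
class FrontMatterReader extends Reader {
    private final BufferedReader in;
    private String pending = "";  // current line, including its newline
    private int pos = 0;          // next unread index in pending
    private int dividers = 0;     // "---" lines seen so far
    private boolean done = false;

    FrontMatterReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (pos == pending.length() && !fill()) return -1;
        int n = Math.min(len, pending.length() - pos);
        pending.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    /** Loads the next line into pending; returns false once the block has ended. */
    private boolean fill() throws IOException {
        if (done) return false;
        String line = in.readLine();
        if (line != null && line.replaceAll("\\s+$", "").equals("---")) dividers++;
        if (line == null || dividers == 2) {
            done = true;  // end of input, or second divider reached
            return false;
        }
        pending = line + "\n";
        pos = 0;
        return true;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Convenience for trying it out: everything the reader yields for a String. */
    static String headerOf(String document) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new FrontMatterReader(new StringReader(document))) {
            int c;
            while ((c = r.read()) != -1) out.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Such a reader could then be handed to `yaml.load(...)` in place of the raw file reader. Because of the internal `BufferedReader`, the underlying stream may be consumed somewhat past the divider, so this sketch is not suited to continuing with the Markdown body on the same stream.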

1

Add "..." (three dots) at the position where you want the YAML parser to stop.

+0

That's a nice solution, but I have to stop after '---' to stay compatible with [Pandoc](http://johnmacfarlane.net/pandoc/). – pvorb
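On the Pandoc point: as far as I know, Pandoc's YAML metadata block may be closed by either a line of three hyphens or a line of three dots, so a reader could accept both. Below is a rough Java sketch of that idea. It copies only the metadata lines into a (small) String instead of wrapping the stream. `FrontMatter`, `extract` and `extractFrom` are hypothetical names, and treating an unterminated block as "no metadata" is my own assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper; not part of SnakeYAML or Pandoc. */
class FrontMatter {
    /** True if the line consists of the marker, ignoring trailing whitespace. */
    private static boolean isLine(String line, String marker) {
        return line.replaceAll("\\s+$", "").equals(marker);
    }

    /** Returns the text between the opening "---" and the closing "---" or "...". */
    static String extract(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String line = in.readLine();
        if (line == null || !isLine(line, "---")) return ""; // no metadata block
        StringBuilder yaml = new StringBuilder();
        while ((line = in.readLine()) != null) {
            if (isLine(line, "---") || isLine(line, "...")) return yaml.toString();
            yaml.append(line).append('\n');
        }
        return ""; // opening line without a closing one: treated as no metadata
    }

    /** Same as extract, for a document that is already in memory. */
    static String extractFrom(String document) {
        try {
            return extract(new StringReader(document));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned text no longer contains the delimiter lines, so it can be passed to `yaml.load(String)` without reading the rest of the file into memory.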
