Loading specific PyYAML documents from a file

I have a .yml file and I am trying to load certain documents from it. I know that:

print yaml.load(open('doc_to_open.yml', 'r+'))

will open the first (or only) document in the .yml file, and that:

for x in yaml.load_all(open('doc_to_open.yml', 'r+')): 
    print x

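will print all of the YAML documents in the file. But say I only want to open the first three documents in the file, or only the 8th document in the file. How would I do that?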

Source

2015-09-10 8 Bit Apple

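Neither `load()` nor `load_all()` needs a writable stream, and you are not writing to the opened stream yourself, so you should replace 'r+' with 'r'. Also, `load()` really does load the "first **and only**" document: if there is a trailing document separator indicating a second document, it actually raises an error. – Anthon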
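The behaviour described in the comment can be checked with a short sketch. It assumes PyYAML is installed and uses `safe_load()` / `safe_load_all()` instead of the plain `load()` from the question; the helper name `loads_as_single_document` is made up for this illustration.

```python
import yaml


def loads_as_single_document(text):
    """Return True if text can be loaded as exactly one YAML document."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        # raised when the stream holds more than one document
        return False
    return True


two_docs = "a: 1\n---\nb: 2\n"

print(loads_as_single_document("a: 1\n"))
print(loads_as_single_document(two_docs))
# safe_load_all() accepts the same stream and yields every document
print(list(yaml.safe_load_all(two_docs)))
```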

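If you don't want to parse the first seven YAML documents at all, e.g. for efficiency reasons, you will have to search for the 8th document yourself.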

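It is possible to hook into the first stage of the parser, count the number of DocumentStartTokens() in the stream, and only start passing on tokens after the 8th one and stop doing so at the 9th, but doing that is far from trivial. And it would still, at the very least, tokenize all of the preceding documents.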
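To give an idea of what the token stage looks like, here is a minimal sketch that only counts document start tokens. It is not the hook described above: it assumes PyYAML (not ruamel.yaml) and its `yaml.scan()` function, and the helper name is hypothetical. Note that a first document without an explicit `---` produces no such token, so the count is not necessarily the number of documents.

```python
import yaml


def count_explicit_document_starts(text):
    """Count '---' markers using the scanner only (no parsing or composing)."""
    return sum(
        isinstance(token, yaml.DocumentStartToken)
        for token in yaml.scan(text)
    )


with_marker = "---\ndocument: 0\n---\ndocument: 1\n"
without_marker = "document: 0\n---\ndocument: 1\n"

print(count_explicit_document_starts(with_marker))
print(count_explicit_document_starts(without_marker))
```

Every token of every document is still produced here, which is exactly the cost the paragraph above points out.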

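The completely inefficient way, which any efficient alternative IMO has to match in behaviour, is to use .load_all() and select the appropriate document after tokenizing/parsing/composing/resolving all of the documents¹: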

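import ruamel.yaml as yaml

# Function-style ruamel.yaml API, as used at the time of writing (2015).
# enumerate() counts from 0, so idx == 7 selects the 8th document.
with open('input.yaml') as fp:
    for idx, data in enumerate(yaml.load_all(fp, Loader=yaml.RoundTripLoader)):
        if idx == 7:
            print(yaml.dump(data, Dumper=yaml.RoundTripDumper))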

If you run the above on the document input.yaml:

--- 
document: 0 
--- 
document: 1 
--- 
document: 2 
--- 
document: 3 
--- 
document: 4 
--- 
document: 5 
--- 
document: 6 
--- 
document: 7 # < the 8th document 
--- 
document: 8 
--- 
document: 9 
...

you get the output:

document: 7 # < the 8th document
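The question also asks about the first three documents. The same "load everything in order" approach can be limited to a range with `itertools.islice`. The following is a sketch that assumes PyYAML's `safe_load_all()` rather than the round-trip loader used above (so comments are not preserved); the helper name `load_document_range` is made up for this example.

```python
from itertools import islice

import yaml


def load_document_range(stream, start, stop):
    """Return documents start..stop-1 (0-based) of a multi-document stream."""
    # safe_load_all() is lazy, so islice() stops pulling documents after
    # index stop-1; the documents before start are still parsed.
    return list(islice(yaml.safe_load_all(stream), start, stop))


text = "".join("---\ndocument: {}\n".format(i) for i in range(10))

first_three = load_document_range(text, 0, 3)
eighth = load_document_range(text, 7, 8)[0]
print(first_three)
print(eighth)
```

With a file, pass the open file object instead of `text` and call the helper inside the `with open(...)` block, because the documents are read while the list is being built.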

Unfortunately you cannot naively count the number of document markers (---), because a file doesn't have to start with one:

document: 0 
--- 
document: 1 
. 
.

nor does it have to have the marker on its first line, if the file starts with a directive²:

%YAML 1.2 
--- 
document: 0 
--- 
document: 1 
. 
.

or if it starts with a "document" consisting of comments only:

# the 8th document is the interesting one 
--- 
document: 0 
--- 
document: 1 
. 
.

To account for all of that, you can use:

def get_nth_yaml_doc(stream, doc_nr):
    """Return document number doc_nr (1-based) without parsing the others."""
    doc_idx = 0
    data = []
    for line in stream:
        if line == u'---\n' or line.startswith('--- '):
            doc_idx += 1
            continue
        if line == '...\n':
            break
        if doc_nr < doc_idx:
            break  # already past the requested document
        if line.startswith(u'%'):
            continue  # directive line
        if doc_idx == 0:  # YAML files don't have to start with an initial '---'
            if line.lstrip().startswith('#'):
                continue
            doc_idx = 1
        if doc_idx == doc_nr:
            data.append(line)
    return yaml.load(''.join(data), Loader=yaml.RoundTripLoader)

with open("input.yaml") as fp:
    data = get_nth_yaml_doc(fp, 8)
print(yaml.dump(data, Dumper=yaml.RoundTripDumper, allow_unicode=True))

and get:

document: 7 # < the 8th document

in all of the above cases, efficiently, without even tokenizing the preceding YAML documents (nor the ones that follow).
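The same line-based idea can be turned into a generator that yields the raw text of every document, which also covers the "first three documents" case from the question. This is only a sketch under the same simplifications as get_nth_yaml_doc() (anything after `--- ` on a marker line is dropped, and reading stops at the first `...` line); the name `iter_yaml_doc_texts` is hypothetical and nothing is parsed inside it.

```python
from io import StringIO
from itertools import islice


def iter_yaml_doc_texts(stream):
    """Yield the unparsed text of each YAML document in a line-based stream."""
    lines = []
    started = False  # becomes True once the first document has begun
    for line in stream:
        if line.rstrip('\n') == '---' or line.startswith('--- '):
            if started:
                yield ''.join(lines)  # the marker closes the previous document
            lines = []
            started = True
            continue
        if line.rstrip('\n') == '...':
            break  # simplification: stop at the first document end marker
        if line.startswith('%'):
            continue  # directive line
        if not started:
            if line.lstrip().startswith('#') or not line.strip():
                continue  # leading comments and blank lines are no document
            started = True  # document without an initial '---'
        lines.append(line)
    if started:
        yield ''.join(lines)


sample = StringIO("# header\n---\ndocument: 0\n---\ndocument: 1\n---\ndocument: 2\n")
for text in islice(iter_yaml_doc_texts(sample), 2):
    print(repr(text))
```

Each yielded string can then be handed to the loader of your choice, so only the documents you actually pick are tokenized.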

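There is one additional caveat: a YAML file can start with a byte-order marker, and individual documents within a stream can start with such a marker as well. The routine above does not handle that.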
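One possible way to deal with this, as a sketch only: assuming the file is opened in text mode and decoded as UTF-8, a byte-order marker shows up as the character U+FEFF at the start of a line and can be stripped before the lines reach the routine. The helper name `strip_boms` is made up for this example, and other encodings are not covered.

```python
BOM = '\ufeff'  # how a byte-order marker appears in decoded text


def strip_boms(lines):
    """Yield each line with a single leading byte-order marker removed."""
    for line in lines:
        yield line[len(BOM):] if line.startswith(BOM) else line


cleaned = list(strip_boms(['\ufeff---\n', 'document: 0\n', '\ufeff---\n', 'document: 1\n']))
print(cleaned)
```

The wrapped iterator can be passed in place of the file object, e.g. `get_nth_yaml_doc(strip_boms(fp), 8)`.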

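¹ This was done using ruamel.yaml, of which I am the author, and which is an enhanced version of PyYAML. AFAIK PyYAML works the same (but would drop the comment in the example).
² Technically the directive is in its own directives document, so you should count that as a document, but .load_all() doesn't give you that document back, so I don't count it as such.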

Source

2015-09-10 08:24:58 Anthon
