2014-02-23

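Txt to JSON: split by paragraph

I have a large pile of documents in plain-text format that I want to put into JSON. Here is my parser. It breaks the text into a new JSON string after every line, using "\n" as the delimiter, and I want to change it so that it cuts at each paragraph instead.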

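package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"os"
	"strings"
)

func main() {
	myBigThing := make(map[string]map[string]string)
	in, err := os.Open("strangecountess.txt")
	if err != nil {
		log.Println("Error :", err)
		return
	}
	defer in.Close()
	r := bufio.NewReader(in)
	var currentPage map[string]string
	pageNum := 0
	for {
		line, err := r.ReadString('\n')
		if err != nil && err != io.EOF {
			log.Println("Error in parsing :", err)
			break
		}
		// ReadString keeps the trailing '\n', so trim before testing for a blank line.
		trimmed := strings.TrimSpace(line)
		if trimmed == "" {
			// A blank line ends the current page.
			currentPage = nil
		} else {
			if currentPage == nil {
				// First non-blank line after a break starts a new page.
				currentPage = make(map[string]string)
				myBigThing[fmt.Sprintf("page%d", pageNum)] = currentPage
				pageNum++
			}
			// Only lines of the form "key:value" are stored.
			tokens := strings.Split(trimmed, ":")
			if len(tokens) == 2 {
				currentPage[tokens[0]] = tokens[1]
			}
		}
		if err == io.EOF {
			// The last line may lack a trailing newline; it has been handled above.
			break
		}
	}
	out, err := os.Create("strangecountess.json")
	if err != nil {
		log.Println("Error :", err)
		return
	}
	defer out.Close()
	bout, err := json.Marshal(myBigThing)
	if err != nil {
		log.Println("Error :", err)
		return
	}
	if _, err := out.Write(bout); err != nil {
		log.Println("Error :", err)
	}
}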

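I'm willing to switch languages for this particular task; if there is some awesome library out there that does this, I'm all ears. Sticking with Go is preferred, though :).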

Which language? – jeremyjjbrown

How do you identify a paragraph? –

@jeremyjjbrown This is in Go. I didn't want to specify because I'm open to anything that will get the job done :) – collinglass

Answer

If you are open to other tools, jq can probably do what you need.

Suppose the file data contains

When in the course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation. 
We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed. 

The command

$ jq -MR '.' data 

produces a sequence of strings, one per input line:

"When in the course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation." 
"We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed." 

The command

$ jq -MR -n '[inputs]' data 

collects the lines into an array:

[ 
    "When in the course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.", 
    "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed." 
] 

Once the text is in JSON, it is easy to add further processing. For example, this filter

$ jq -MR -n '[inputs] | map("\(.[:30])... \(length) characters")' data 

summarizes each line:

[ 
    "When in the course of human ev... 404 characters", 
    "We hold these truths to be sel... 337 characters" 
] 
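For comparison, a rough Go counterpart of that summary filter might look like the sketch below. The name summarize is made up for illustration; the code counts runes rather than bytes, on the assumption that this matches how jq counts characters in a string.

```go
package main

import "fmt"

// summarize (hypothetical helper) returns the first 30 characters of s
// followed by the total character count, in the same shape as the jq filter.
func summarize(s string) string {
	r := []rune(s) // work on runes so multi-byte characters count once
	head := r
	if len(head) > 30 {
		head = head[:30]
	}
	return fmt.Sprintf("%s... %d characters", string(head), len(r))
}

func main() {
	// prints: When in the course of human ev... 34 characters
	fmt.Println(summarize("When in the course of human events"))
}
```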

and this command

$ jq -MR -n 'reduce inputs as $i ({}; .["\(.|length)"]=$i)' data 

collects the lines into an object

{ 
    "0": "When in the course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.", 
    "1": "We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed." 
} 

There is also an online version at https://jqplay.org/.