2014-02-24 39 views

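pdf2json: How to customize the output JSON file?

Can I customize the output of the pdf2json command-line utility so that the output JSON file has a specific structure?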

I want to extract data from a PDF (see the figure below) and store it as a JSON file.

fig 1

I tried pdf2json -f [input directory or pdf file]. The command outputs a JSON file that contains the information I need, but it also contains a lot of information I don't need:

{"formImage":{"Transcoder":"[email protected]","Agency":"","Id":{"AgencyId":"","Name":"","MC":false,"Max":1,"Parent":""},"Pages":[{"Height":49.5,"HLines":[{"x":13.111828125000002,"y":4.678418750000001,"w":0.44775000000000004,"l":78.96384375000001},{"x":13.111828125000002,"y":44.074375,"w":0.44775000000000004,"l":78.96384375000001}],"VLines":[],"Fills":[{"x":0,"y":0,"w":0,"h":0,"clr":1}],"Texts":[{"x":13.632429687500002,"y":4.382312499999998,"w":4.163000000000001,"clr":0,"A":"left","R":[{"T":"abundant","S":-1,"TS":[0,13.9091,0,0]}]},{"x":25.021517303398443,"y":4.382312499999998,"w":4.139000000000001,"clr":0,"A":"left","R":[{"T":"positive%3A1","S":-1,"TS":[0,13.9091,0,0]}]},{"x":32.38324218816407,"y":4.382312499999998,"w":4.412000000000001,"clr":0,"A":"left","R":[{"T":"negative%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":40.12887364285157,"y":4.382312499999998,"w":3.1670000000000003,"clr":0,"A":"left","R":[{"T":"anger%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":46.1237223885547,"y":4.382312499999998,"w":5.993,"clr":0,"A":"left","R":[{"T":"anticipation%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":56.09123069480469,"y":4.382312499999998,"w":3.8400000000000003,"clr":0,"A":"left","R":[{"T":"disgust%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":63.0324864791797,"y":4.382312499999998,"w":2.4170000000000003,"clr":0,"A":"left","R":[{"T":"fear%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":67.97264684597657,"y":4.382312499999998,"w":2.109,"clr":0,"A":"left","R":[{"T":"joy%3A1","S":-1,"TS":[0,13.9091,0,0]}]},{"x":72.47968185183595,"y":4.382312499999998,"w":4.013,"clr":0,"A":"left","R":[{"T":"sadness%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":79.66421908894532,"y":4.382312499999998,"w":4.178000000000001,"clr":0,"A":"left","R":[{"T":"surprise%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":87.08078776941407,"y":4.382312499999998,"w":2.8930000000000002,"clr":0,"A":"left","R":[{"T":"trust%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":13.632429687500002,"y":5.017468750000002,"w":2.4480000000000004,"clr":0,"A":"left",
"R":

I only need the text from the PDF file. I don't need any of the formatting information. So what I need is something like this:

{"data": 
    { 
    "abundant": { 
     "positive":1, 
     "negative":0, 
     "anger":0, 
     ... 
     }, 
    "abuse": {...}, 
    "abutment": {...}, 
    ... 
    } 
} 

Did you find any workaround that makes this possible?


@SrikanthJeeva No, I parsed the output of pdf2json to get the data I wanted. – CherryQu

Answers
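The approach mentioned in the comments, post-processing the JSON that pdf2json writes, could look roughly like the sketch below. It is not a pdf2json feature. It assumes the layout shown in the question: a top-level "formImage" object, text runs under Pages[].Texts[].R[].T that are URI-encoded ("positive%3A1" for "positive:1"), and one table row per distinct y coordinate, with the word first and "label:value" cells after it. The function name extract_lexicon is made up for illustration, and other pdf2json versions may lay the output out differently.

```python
from collections import defaultdict
from urllib.parse import unquote


def extract_lexicon(pdf2json_output):
    """Reduce pdf2json output to {"data": {word: {label: value}}}.

    Assumes the "formImage" -> "Pages" -> "Texts" layout from the question.
    """
    data = {}
    for page in pdf2json_output["formImage"]["Pages"]:
        # Group text cells into rows by their (rounded) y coordinate.
        rows = defaultdict(list)
        for text in page["Texts"]:
            # "T" is URI-encoded, e.g. "positive%3A1" decodes to "positive:1".
            content = "".join(unquote(run["T"]) for run in text["R"])
            rows[round(text["y"], 3)].append((text["x"], content))

        for y in sorted(rows):
            # Order the cells of a row from left to right by x.
            cells = [content for _, content in sorted(rows[y])]
            word, pairs = cells[0], cells[1:]
            entry = {}
            for pair in pairs:
                label, sep, value = pair.rpartition(":")
                # Skip cells that are not "label:integer".
                if sep and value.strip().lstrip("-").isdigit():
                    entry[label] = int(value)
            data[word] = entry
    return {"data": data}


# Tiny hand-made sample mirroring the first cells of the question's output.
sample = {"formImage": {"Pages": [{"Texts": [
    {"x": 13.63, "y": 4.38, "R": [{"T": "abundant"}]},
    {"x": 25.02, "y": 4.38, "R": [{"T": "positive%3A1"}]},
    {"x": 32.38, "y": 4.38, "R": [{"T": "negative%3A0"}]},
]}]}}
result = extract_lexicon(sample)
```

For a real file, load the JSON written by pdf2json with json.load and pass the resulting dict to extract_lexicon. Grouping by y only works if every cell of a row shares the same baseline, as it does in the excerpt above; a table with wrapped cells would need a looser row test.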