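pdf2json: how do I customize the output JSON file?
Can I customize the output of the pdf2json command-line utility so that the output JSON file has a specific structure?
I want to extract data from a PDF (see the image below) and store it as a JSON file.
I tried pdf2json -f [input directory or pdf file]. The command outputs a JSON file that contains the information I need, but it also contains a lot of information I don't need: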
{"formImage":{"Transcoder":"[email protected]","Agency":"","Id":{"AgencyId":"","Name":"","MC":false,"Max":1,"Parent":""},"Pages":[{"Height":49.5,"HLines":[{"x":13.111828125000002,"y":4.678418750000001,"w":0.44775000000000004,"l":78.96384375000001},{"x":13.111828125000002,"y":44.074375,"w":0.44775000000000004,"l":78.96384375000001}],"VLines":[],"Fills":[{"x":0,"y":0,"w":0,"h":0,"clr":1}],"Texts":[{"x":13.632429687500002,"y":4.382312499999998,"w":4.163000000000001,"clr":0,"A":"left","R":[{"T":"abundant","S":-1,"TS":[0,13.9091,0,0]}]},{"x":25.021517303398443,"y":4.382312499999998,"w":4.139000000000001,"clr":0,"A":"left","R":[{"T":"positive%3A1","S":-1,"TS":[0,13.9091,0,0]}]},{"x":32.38324218816407,"y":4.382312499999998,"w":4.412000000000001,"clr":0,"A":"left","R":[{"T":"negative%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":40.12887364285157,"y":4.382312499999998,"w":3.1670000000000003,"clr":0,"A":"left","R":[{"T":"anger%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":46.1237223885547,"y":4.382312499999998,"w":5.993,"clr":0,"A":"left","R":[{"T":"anticipation%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":56.09123069480469,"y":4.382312499999998,"w":3.8400000000000003,"clr":0,"A":"left","R":[{"T":"disgust%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":63.0324864791797,"y":4.382312499999998,"w":2.4170000000000003,"clr":0,"A":"left","R":[{"T":"fear%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":67.97264684597657,"y":4.382312499999998,"w":2.109,"clr":0,"A":"left","R":[{"T":"joy%3A1","S":-1,"TS":[0,13.9091,0,0]}]},{"x":72.47968185183595,"y":4.382312499999998,"w":4.013,"clr":0,"A":"left","R":[{"T":"sadness%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":79.66421908894532,"y":4.382312499999998,"w":4.178000000000001,"clr":0,"A":"left","R":[{"T":"surprise%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":87.08078776941407,"y":4.382312499999998,"w":2.8930000000000002,"clr":0,"A":"left","R":[{"T":"trust%3A0","S":-1,"TS":[0,13.9091,0,0]}]},{"x":13.632429687500002,"y":5.017468750000002,"w":2.4480000000000004,"clr":0,"A":"left",
"R":
I only need the text from the PDF file. I don't need any of the formatting information. So what I need is something like this:
{"data":
{
"abundant": {
"positive":1,
"negative":0,
"anger":0,
...
},
"abuse": {...},
"abutment": {...},
...
}
}
Did you find any workaround to make this possible?
@SrikanthJeeva No, I parsed pdf2json's output to get the data I wanted. – CherryQu
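For anyone landing here, a minimal sketch of the kind of post-processing described in the comment above. This is not CherryQu's actual code: the function name extract_lexicon is made up for illustration, and it assumes the layout seen in the sample output (texts under formImage.Pages[].Texts[], URI-encoded strings in R[].T, all cells of one table row sharing the same y value, and the leftmost cell of each row being the word).

```python
from collections import defaultdict
from urllib.parse import unquote


def extract_lexicon(pdf2json_output: dict) -> dict:
    """Reduce pdf2json's layout-oriented output to {"data": {word: {label: int}}}.

    Hypothetical helper; assumes every cell of a table row has the same "y".
    """
    data = {}
    for page in pdf2json_output["formImage"]["Pages"]:
        # Group text cells into rows by (rounded) vertical position.
        rows = defaultdict(list)
        for text in page["Texts"]:
            # Text runs are URI-encoded, e.g. "positive%3A1" means "positive:1".
            content = "".join(unquote(run["T"]) for run in text["R"])
            rows[round(text["y"], 3)].append((text["x"], content))

        for y in sorted(rows):
            # Order the cells left to right; the first one is taken as the word.
            cells = [content for _, content in sorted(rows[y])]
            word, pairs = cells[0], cells[1:]
            scores = {}
            for pair in pairs:
                label, sep, value = pair.rpartition(":")
                # Skip cells that are not of the form "label:integer".
                if sep and value.isdigit():
                    scores[label] = int(value)
            data[word] = scores
    return {"data": data}
```

To use it, load the file pdf2json wrote with json.load, pass the resulting dict to extract_lexicon, and write the return value back out with json.dump. If rows in your PDF do not line up on exactly the same y value, the rounding step would need a tolerance instead.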