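I am using the Azure Computer Vision API to extract text from images. In this particular use case, I am trying to pull out text that looks like "Person ID: ########", where each # is a digit of the person's number. How can I extract a value from the OCR JSON based on its position relative to other values?
Here is a sample of the JSON returned by the API: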
{"language": "en",
"textAngle": 0.0,
"orientation": "Up",
"regions": [
{
"boundingBox": "212,169,1384,359",
"lines": [
{
"boundingBox": "228,169,281,36",
"words": [
{
"boundingBox": "228,169,141,28",
"text": "Output"
},
{
"boundingBox": "386,169,123,36",
"text": "Report"
}
]
},
{
"boundingBox": "212,279,287,25",
"words": [
{
"boundingBox": "212,280,116,24",
"text": "Person"
},
{
"boundingBox": "341,279,42,25",
"text": "ID:"
},
{
"boundingBox": "408,279,91,25",
"text": "15060"
}
]
},
{
"boundingBox": "279,326,104,25",
"words": [
{
"boundingBox": "279,326,104,25",
"text": "Notes:"
}
]
}
]
},
{
"boundingBox": "2436,172,159,32",
"lines": [
{
"boundingBox": "2436,172,159,32",
"words": [
{
"boundingBox": "2436,172,159,32",
"text": "Operator:"
}
]
}
]
},
{
"boundingBox": "2627,172,290,216",
"lines": [
{
"boundingBox": "2627,172,103,32",
"words": [
{
"boundingBox": "2627,172,103,32",
"text": "Output"
}
]
},
{
"boundingBox": "2629,329,288,37",
"words": [
{
"boundingBox": "2683,329,234,37",
"text": "xm"
}
]
},
{
"boundingBox": "2875,381,41,7",
"words": [
{
"boundingBox": "2875,381,41,7",
"text": "LEAR"
}
]
}
]
},
{
"boundingBox": "2304,2353,706,32",
"lines": [
{
"boundingBox": "2304,2353,706,32",
"words": [
{
"boundingBox": "2817,2353,193,32",
"text": "Incorporated."
}
]}]}]}
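I trimmed it down quite a bit. You can see that "Person ID: 15060" is split into three separate words: Person, ID:, and 15060.
I need to extract the number that follows Person ID, but the way I am doing it now will simply stop working if the structure of the returned data changes.
At the moment I am doing something along these lines: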
Dim _tmp1 = o1("regions")(0)("lines")(1)("words")(1)("text")
Dim _tmp2 = o1("regions")(0)("lines")(1)("words")(2)("text")
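Then I do a simple check to see whether _tmp1 = "ID:".
Is there a better way to get the right value? I thought about just extracting all of the "text" keys, then matching on "Person ID:" and grabbing the data after it up to the next space, but that approach fails if the extracted number contains extra spaces.
I already have a way to handle items that cannot be extracted automatically; I am just trying to improve the odds that automatic extraction does not fail.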
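One way to make the "collect the text, then match on Person ID:" idea independent of fixed indexes is to rebuild the text of every OCR line and search each line for the label. The sketch below is in Python rather than VB.NET, purely for illustration; the function names `iter_line_texts` and `find_value_after_label` are hypothetical, and the `sample` data is a cut-down stand-in that only assumes the regions / lines / words / text nesting shown in the response above. The same loop should carry over to whatever JSON library the VB.NET code uses.

```python
import json


def iter_line_texts(ocr_result):
    """Yield the text of each OCR line, with its words joined by single spaces."""
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            yield " ".join(word["text"] for word in line.get("words", []))


def find_value_after_label(ocr_result, label="Person ID:"):
    """Return whatever follows `label` on the first line containing it, else None."""
    for text in iter_line_texts(ocr_result):
        pos = text.find(label)
        if pos != -1:
            return text[pos + len(label):].strip()
    return None


# Cut-down stand-in for the API response (bounding boxes omitted).
sample = json.loads("""
{
  "regions": [
    {"lines": [
      {"words": [{"text": "Output"}, {"text": "Report"}]},
      {"words": [{"text": "Person"}, {"text": "ID:"}, {"text": "15060"}]},
      {"words": [{"text": "Notes:"}]}
    ]},
    {"lines": [
      {"words": [{"text": "Operator:"}]}
    ]}
  ]
}
""")

print(find_value_after_label(sample))  # 15060
```

Because the search stops at the end of the OCR line, a number that the OCR splits into several words (for example "150" and "60") still comes back together as "150 60" instead of being cut at the first space. It does assume the label and the number end up on the same OCR line, which is the case in the sample but is not guaranteed.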
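To raise the odds that automatic extraction succeeds when the OCR inserts stray spaces, a tolerant regular expression can be applied to the text of a single line. This is a minimal sketch in Python (not VB.NET), assuming the words of one OCR line have already been joined into one string; `PERSON_ID` and `extract_person_id` are hypothetical names. Returning `None` is meant as the signal to hand the item over to the existing manual process.

```python
import re

# Label with optional whitespace around "ID" and ":", followed by one or
# more digits that may be separated by whitespace.
PERSON_ID = re.compile(r"Person\s*ID\s*:\s*((?:\d\s*)+)", re.IGNORECASE)


def extract_person_id(line_text):
    """Return the digits after "Person ID:" with inner spaces removed, else None."""
    match = PERSON_ID.search(line_text)
    if match is None:
        return None
    return re.sub(r"\s+", "", match.group(1))


print(extract_person_id("Person ID: 15060"))    # 15060
print(extract_person_id("Person ID: 150 60"))   # 15060
print(extract_person_id("Notes:"))              # None
```

Matching per line rather than over the whole flattened text is a deliberate choice here: the digit group would otherwise keep consuming digits from whatever field happens to follow the ID. If the ID has a known length, checking `len()` on the result before accepting it would be a cheap extra safeguard.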