2017-08-08

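OCR: extracting a JSON value relative to other values?

I am using the Azure Computer Vision API to extract text from images. In this particular use case, I am trying to extract text that looks like "Person ID: ########" from some images, where each # is a digit of the numeric person ID.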

Here is a sample of the JSON returned by the API:

{"language": "en", 
"textAngle": 0.0, 
"orientation": "Up", 
"regions": [ 
    { 
    "boundingBox": "212,169,1384,359", 
    "lines": [ 
    { 
     "boundingBox": "228,169,281,36", 
     "words": [ 
     { 
      "boundingBox": "228,169,141,28", 
      "text": "Output" 
     }, 
     { 
      "boundingBox": "386,169,123,36", 
      "text": "Report" 
     } 
     ] 
    }, 
    { 
     "boundingBox": "212,279,287,25", 
     "words": [ 
     { 
      "boundingBox": "212,280,116,24", 
      "text": "Person" 
     }, 
     { 
      "boundingBox": "341,279,42,25", 
      "text": "ID:" 
     }, 
     { 
      "boundingBox": "408,279,91,25", 
      "text": "15060" 
     } 
     ] 
    }, 
    { 
     "boundingBox": "279,326,104,25", 
     "words": [ 
     { 
      "boundingBox": "279,326,104,25", 
      "text": "Notes:" 
     } 
     ] 
    } 
    ] 
}, 
{ 
    "boundingBox": "2436,172,159,32", 
    "lines": [ 
    { 
     "boundingBox": "2436,172,159,32", 
     "words": [ 
     { 
      "boundingBox": "2436,172,159,32", 
      "text": "Operator:" 
     } 
     ] 
    } 
    ] 
}, 
{ 
    "boundingBox": "2627,172,290,216", 
    "lines": [ 
    { 
     "boundingBox": "2627,172,103,32", 
     "words": [ 
     { 
      "boundingBox": "2627,172,103,32", 
      "text": "Output" 
     } 
     ] 
    }, 
    { 
     "boundingBox": "2629,329,288,37", 
     "words": [ 
     { 
      "boundingBox": "2683,329,234,37", 
      "text": "xm" 
     } 
     ] 
    }, 
    { 
     "boundingBox": "2875,381,41,7", 
     "words": [ 
     { 
      "boundingBox": "2875,381,41,7", 
      "text": "LEAR" 
     } 
     ] 
    } 
    ] 
}, 
{ 
    "boundingBox": "2304,2353,706,32", 
    "lines": [ 
    { 
     "boundingBox": "2304,2353,706,32", 
     "words": [ 
     { 
      "boundingBox": "2817,2353,193,32", 
      "text": "Incorporated." 
     } 
     ]}]}]} 

I trimmed it down quite a bit. You can see that "Person ID: 15060" is split into three separate words: "Person", "ID:", and "15060".

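I need to extract the number from the person ID, but the way I currently do it will simply stop working if the layout of the output changes. At the moment I am doing something along these lines: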

Dim _tmp1 = o1("regions")(0)("lines")(1)("words")(1)("text") 
Dim _tmp2 = o1("regions")(0)("lines")(1)("words")(2)("text") 

Then I do a simple check to see whether _tmp1 = "ID:".

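Is there a better way to get the correct value? I thought about extracting all the "text" keys, matching on "Person ID:", and grabbing everything after it up to the next space, but that approach would fail if the extracted number contains extra spaces.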

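There is already a way to handle items that cannot be extracted automatically; I am just trying to improve the chances that the automatic extraction does not fail.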

Answer


Yes, there is a simpler way. You can use Newtonsoft.Json (Json.NET) to deserialize JSON data into a class quickly and easily. Here is a generic example:

First, define an object that models the data the way you want to view it:

public class Account 
{ 
    public string Email { get; set; } 
    public bool Active { get; set; } 
    public DateTime CreatedDate { get; set; } 
    public IList<string> Roles { get; set; } 
} 

Then take your JSON and call JsonConvert.DeserializeObject<T> on it.

string json = @"{ 
    'Email': 'james@example.com', 
    'Active': true, 
    'CreatedDate': '2013-01-20T00:00:00Z', 
    'Roles': [ 
    'User', 
    'Admin' 
    ] 
}"; 

Account account = JsonConvert.DeserializeObject<Account>(json); 

Console.WriteLine(account.Email); 
// james@example.com 

You end up with an object whose properties are populated directly from the JSON.
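Applied to the OCR response in the question, the same idea would mean mapping regions, lines, and words onto small typed objects instead of indexing by position. The following is only a minimal sketch, written in Python so that it needs nothing beyond the standard library; in .NET, JsonConvert.DeserializeObject<T> would play the role of parse_ocr. The names Word, Line, Region, and parse_ocr are hypothetical, and the field names are assumed to match the sample JSON in the question.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    bounding_box: str
    text: str


@dataclass
class Line:
    bounding_box: str
    words: List[Word]

    @property
    def text(self) -> str:
        # Join the OCR words back into one string per printed line.
        return " ".join(w.text for w in self.words)


@dataclass
class Region:
    bounding_box: str
    lines: List[Line]


def parse_ocr(raw: str) -> List[Region]:
    """Turn the raw OCR JSON string into typed Region/Line/Word objects."""
    doc = json.loads(raw)
    return [
        Region(
            r["boundingBox"],
            [
                Line(
                    ln["boundingBox"],
                    [Word(w["boundingBox"], w["text"]) for w in ln["words"]],
                )
                for ln in r.get("lines", [])
            ],
        )
        for r in doc.get("regions", [])
    ]
```

With the response in this shape, code can loop over every line of every region and inspect its text, so it no longer depends on "Person ID:" being the second line of the first region.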

I find the easiest way to install it is through the NuGet package.
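To address the fragility described in the question, one option is to search every line for the "Person ID:" label instead of relying on fixed indexes. This is a rough sketch under stated assumptions, not a definitive implementation: it is in Python for self-containment, find_person_id and PERSON_ID are hypothetical names, and it assumes the label and the number end up on the same OCR line. Digits separated by stray spaces are merged, which also means an unrelated number printed directly after the ID on the same line would be merged too.

```python
import re
from typing import Optional

# The label, followed by digits that OCR may have split with stray spaces.
PERSON_ID = re.compile(r"Person\s*ID\s*:\s*(\d[\d ]*)", re.IGNORECASE)


def find_person_id(ocr: dict) -> Optional[str]:
    """Return the digits that follow 'Person ID:' on any line, or None."""
    for region in ocr.get("regions", []):
        for line in region.get("lines", []):
            # Rebuild the line text from its individual OCR words.
            text = " ".join(w["text"] for w in line.get("words", []))
            match = PERSON_ID.search(text)
            if match:
                # Drop spaces that OCR inserted inside the number.
                return match.group(1).replace(" ", "")
    return None
```

Calling find_person_id(json.loads(raw)) on the parsed response returns None when no line carries the label, so the caller can hand the image over to the existing manual process.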