OCR: extracting a JSON value relative to other values?

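I am using the Azure Computer Vision API to extract text from images. In this particular use case, I am trying to extract text from some images that looks like "Person ID: ########", where each # is a digit of the person's ID number.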

Here is a sample of the JSON returned by the API:

{"language": "en", 
"textAngle": 0.0, 
"orientation": "Up", 
"regions": [ 
    { 
    "boundingBox": "212,169,1384,359", 
    "lines": [ 
    { 
     "boundingBox": "228,169,281,36", 
     "words": [ 
     { 
      "boundingBox": "228,169,141,28", 
      "text": "Output" 
     }, 
     { 
      "boundingBox": "386,169,123,36", 
      "text": "Report" 
     } 
     ] 
    }, 
    { 
     "boundingBox": "212,279,287,25", 
     "words": [ 
     { 
      "boundingBox": "212,280,116,24", 
      "text": "Person" 
     }, 
     { 
      "boundingBox": "341,279,42,25", 
      "text": "ID:" 
     }, 
     { 
      "boundingBox": "408,279,91,25", 
      "text": "15060" 
     } 
     ] 
    }, 
    { 
     "boundingBox": "279,326,104,25", 
     "words": [ 
     { 
      "boundingBox": "279,326,104,25", 
      "text": "Notes:" 
     } 
     ] 
    } 
    ] 
}, 
{ 
    "boundingBox": "2436,172,159,32", 
    "lines": [ 
    { 
     "boundingBox": "2436,172,159,32", 
     "words": [ 
     { 
      "boundingBox": "2436,172,159,32", 
      "text": "Operator:" 
     } 
     ] 
    } 
    ] 
}, 
{ 
    "boundingBox": "2627,172,290,216", 
    "lines": [ 
    { 
     "boundingBox": "2627,172,103,32", 
     "words": [ 
     { 
      "boundingBox": "2627,172,103,32", 
      "text": "Output" 
     } 
     ] 
    }, 
    { 
     "boundingBox": "2629,329,288,37", 
     "words": [ 
     { 
      "boundingBox": "2683,329,234,37", 
      "text": "xm" 
     } 
     ] 
    }, 
    { 
     "boundingBox": "2875,381,41,7", 
     "words": [ 
     { 
      "boundingBox": "2875,381,41,7", 
      "text": "LEAR" 
     } 
     ] 
    } 
    ] 
}, 
{ 
    "boundingBox": "2304,2353,706,32", 
    "lines": [ 
    { 
     "boundingBox": "2304,2353,706,32", 
     "words": [ 
     { 
      "boundingBox": "2817,2353,193,32", 
      "text": "Incorporated." 
     } 
     ]}]}]}

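I trimmed it down quite a bit. As you can see, a value such as "Person ID: 12345" is split into separate words: "Person", "ID:", and the number.

I need to extract the number from the Person ID, but the way I am doing it now will simply stop working if the layout of the output changes.

Currently I am doing something along these lines: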

Dim _tmp1 = o1("regions")(0)("lines")(1)("words")(1)("text") 
Dim _tmp2 = o1("regions")(0)("lines")(1)("words")(2)("text")

Then I do a simple check to see whether _tmp1 = "ID:".

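Is there a better way to get the correct value? I thought about just extracting all of the "text" keys, matching on "Person ID:", and grabbing the data after it up to the next space, but that would fail if the extracted number contains extra spaces.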
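The "collect every text key, then match on the label" idea could be made tolerant of stray spaces roughly as follows. This is only a sketch, assuming Newtonsoft.Json is available; the class and method names (OcrLabelSearch, FindPersonId) are made up for illustration, and the regular expression assumes the ID consists of digits only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

// Hypothetical helper, not part of any library.
public static class OcrLabelSearch
{
    // Returns the digits that follow "Person ID:" in reading order,
    // or null when the label (or a number after it) is not found.
    public static string FindPersonId(string ocrJson)
    {
        JObject root = JObject.Parse(ocrJson);

        // Collect every "text" value in document order,
        // regardless of which region or line it sits in.
        List<string> words = root.Descendants()
            .OfType<JProperty>()
            .Where(p => p.Name == "text")
            .Select(p => (string)p.Value)
            .ToList();

        // Join with single spaces so the label can be matched as one string.
        string flat = string.Join(" ", words);

        // After the label, accept digits optionally separated by one space,
        // so "15 060" is still captured. Assumption: IDs are digits only.
        // Limitation: a separate number right after the ID would be absorbed too.
        Match m = Regex.Match(
            flat,
            @"Person\s*ID\s*:\s*(\d(?:\s?\d)*)",
            RegexOptions.IgnoreCase);

        return m.Success
            ? Regex.Replace(m.Groups[1].Value, @"\s", "")
            : null;
    }
}
```

Because the search no longer depends on fixed array indexes, it keeps working when the label moves to a different region or line. A null result can be routed to whatever handles items that cannot be extracted automatically.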

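There is already a process for handling items that cannot be extracted automatically; I am just trying to improve the chances that the automatic extraction does not fail.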

2017-08-08 DDulla

Yes, there is a simpler way. You can use Newtonsoft.Json (Json.NET) to quickly and easily deserialize JSON data into a class.

Here is a generic example:

First, define a class that mirrors the data you are looking at:

public class Account 
{ 
    public string Email { get; set; } 
    public bool Active { get; set; } 
    public DateTime CreatedDate { get; set; } 
    public IList<string> Roles { get; set; } 
}

Then take your JSON and call JsonConvert.DeserializeObject<T> on it.

string json = @"{
    'Email': 'james@example.com',
    'Active': true,
    'CreatedDate': '2013-01-20T00:00:00Z',
    'Roles': [
        'User',
        'Admin'
    ]
}";

Account account = JsonConvert.DeserializeObject<Account>(json);

Console.WriteLine(account.Email);
// james@example.com

You end up with an object whose properties are populated seamlessly from the JSON.
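Applied to the OCR response from the question, the same approach might look like the sketch below. The class names (OcrResult, OcrRegion, OcrLine, OcrWord) and the FindPersonId helper are hypothetical; only the property names are taken from the keys in the sample JSON. Json.NET matches property names case-insensitively by default, so "textAngle" maps to TextAngle without extra attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;

// Hypothetical classes mirroring the keys of the sample OCR response.
public class OcrResult
{
    public string Language { get; set; }
    public double TextAngle { get; set; }
    public string Orientation { get; set; }
    public List<OcrRegion> Regions { get; set; }
}

public class OcrRegion
{
    public string BoundingBox { get; set; }
    public List<OcrLine> Lines { get; set; }
}

public class OcrLine
{
    public string BoundingBox { get; set; }
    public List<OcrWord> Words { get; set; }
}

public class OcrWord
{
    public string BoundingBox { get; set; }
    public string Text { get; set; }
}

public static class OcrReader
{
    // Scans every line of every region for "Person" followed by "ID:" and
    // returns the rest of that line with the words concatenated, or null.
    public static string FindPersonId(string ocrJson)
    {
        OcrResult result = JsonConvert.DeserializeObject<OcrResult>(ocrJson);

        foreach (OcrLine line in result.Regions.SelectMany(r => r.Lines))
        {
            List<string> texts = line.Words.Select(w => w.Text).ToList();

            for (int i = 0; i + 1 < texts.Count; i++)
            {
                if (texts[i] == "Person" && texts[i + 1] == "ID:")
                {
                    // Concatenating the remaining words removes any spaces
                    // the OCR inserted inside the number.
                    string rest = string.Concat(texts.Skip(i + 2));
                    if (rest.Length > 0) return rest;
                }
            }
        }
        return null;
    }
}
```

With the sample response, OcrReader.FindPersonId(json) walks all regions and lines instead of relying on regions(0), lines(1) and fixed word positions, so it tolerates the label appearing elsewhere in the output. It still assumes the number is on the same OCR line as its label.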

I find it easiest to install this through NuGet (the package is Newtonsoft.Json).
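If the number might end up in a different line or region than its label, the boundingBox values offer another way to find a value "relative to" another one. The sketch below assumes each boundingBox is "left,top,width,height", which is consistent with the sample values in the question, and picks the nearest word to the right of the label on the same row. OcrNeighborSearch and FindWordRightOf are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

// Hypothetical helper, not part of any library.
public static class OcrNeighborSearch
{
    private sealed class Box
    {
        public string Text;
        public int Left, Top, Width, Height;
        public int Right => Left + Width;
        public double MidY => Top + Height / 2.0;
    }

    // Assumption: boundingBox is "left,top,width,height".
    private static Box ToBox(JToken word)
    {
        int[] p = ((string)word["boundingBox"])
            .Split(',')
            .Select(s => int.Parse(s.Trim()))
            .ToArray();
        return new Box
        {
            Text = (string)word["text"],
            Left = p[0], Top = p[1], Width = p[2], Height = p[3]
        };
    }

    // Returns the text of the closest word that starts to the right of the
    // label and whose vertical centre falls within the label's height.
    public static string FindWordRightOf(string ocrJson, string label)
    {
        List<Box> words = JObject.Parse(ocrJson)
            .Descendants()
            .OfType<JProperty>()
            .Where(p => p.Name == "words")
            .SelectMany(p => p.Value.Children())
            .Select(w => ToBox(w))
            .ToList();

        Box anchor = words.FirstOrDefault(w => w.Text == label);
        if (anchor == null) return null;

        return words
            .Where(w => w != anchor
                        && w.Left >= anchor.Right
                        && w.MidY >= anchor.Top
                        && w.MidY <= anchor.Top + anchor.Height)
            .OrderBy(w => w.Left)
            .Select(w => w.Text)
            .FirstOrDefault();
    }
}
```

Calling OcrNeighborSearch.FindWordRightOf(json, "ID:") uses only positions, so it does not matter how the OCR grouped the words into lines. It returns a single word, so a number that the OCR split into several words would need the remaining neighbours collected as well.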

2017-08-09 14:10:44 Ares
