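Remove inner borders

I have a lot of cropped images taken from images of tables. OCR has trouble detecting the text because of "leftovers" of the table borders. I am looking for a way to remove them (I only want to pick up the text). Here are some examples: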

first image example

second image example

Thanks!

2017-05-05 sebbz

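Why not simply crop x pixels (e.g. 5 pixels) off each border of the image?

Because some images have no black border, and sometimes they are also very small. If I cropped them the way you suggest, I would also cut off text. – sebbz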

This code (based on OpenCV) solves the problem for both examples. The procedure is as follows:

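- Threshold the image.
- Remove lines from the binary objects:
  - compute ratio = (area of object) / (area of bounding box)
  - if the ratio is too small, we consider the object to be a combination of lines
  - if the ratio is too large, we consider the object to be a single line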
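To see why this ratio separates the three cases, here is a minimal NumPy-only sketch on synthetic shapes. The helper name `fill_ratio` and the three test shapes are illustrative assumptions, not part of the answer's code, and each mask is assumed to contain exactly one object.

```python
import numpy as np

def fill_ratio(mask):
    """Foreground pixel count divided by the area of the bounding box.

    Hypothetical helper: assumes `mask` holds a single object (values 0/1).
    """
    ys, xs = np.nonzero(mask)
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    return mask.sum() / float(w * h)

# A solid horizontal bar: it fills its whole bounding box.
line = np.zeros((20, 40), dtype=np.uint8)
line[9:11, 5:35] = 1

# Two thin crossing lines (typical border leftovers): the box is mostly empty.
cross = np.zeros((40, 40), dtype=np.uint8)
cross[19:21, :] = 1
cross[:, 19:21] = 1

# A thick ring, roughly like the letter "O": somewhere in between.
ring = np.zeros((20, 20), dtype=np.uint8)
ring[2:18, 4:16] = 1
ring[6:14, 8:12] = 0

print(fill_ratio(line))   # 1.0
print(fill_ratio(cross))  # about 0.10
print(fill_ratio(ring))   # about 0.83
```

With the thresholds used in the answer (keep only 0.2 < ratio < 0.9), the bar and the cross would be dropped and the ring would be kept.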

Here is the Python code:

import cv2
import matplotlib.pyplot as plt
import numpy as np

# load image 
img = cv2.imread('om9gN.jpg',0) 

# blur and apply otsu threshold 
img = cv2.blur(img, (3,3)) 
_, img = cv2.threshold(img,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU) 

# invert image 
img = (img == 0).astype(np.uint8) 


img_new = np.zeros_like(img) 

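# find contours (1 = cv2.RETR_LIST, 2 = cv2.CHAIN_APPROX_SIMPLE)
# findContours returns 3 values in OpenCV 3.x and 2 in 4.x; the contours
# are always the second-to-last element
contours = cv2.findContours(img, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]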

for idx, cnt in enumerate(contours): 

    # get area of contour 
    temp = np.zeros_like(img) 
    cv2.drawContours(temp, contours , idx, 1, -1) 
    area_cnt = np.sum(temp) 

    # get number of pixels of bounding box of contour 
    x,y,w,h = cv2.boundingRect(cnt) 
    area_box = w * h 

    # get ratio of cnt-area and box-area 
    ratio = float(area_cnt)/area_box 

    # only draw contour if: 
    # - 1.) ratio is not too big (line fills whole bounding box) 
    # - 2.) ratio is not too small (combination of lines fill very 
    #         small ratio of bounding box) 
    if 0.9 > ratio > 0.2:
        cv2.drawContours(img_new, contours, idx, 1, -1)

plt.figure() 
plt.subplot(1,2,1) 
plt.imshow(img_new) 
plt.axis("off") 
plt.show()

2017-05-05 15:35:41

Good idea, this should work in most cases. However, depending on the font used, it will also remove letters such as "l" (lowercase L), "I" (uppercase i) and "i".

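That's true. One option to avoid this would be to also set a threshold on the aspect ratio of the bounding box (box_width/box_length). If the aspect ratio is too small, the object must be a line and not an I, l, -, ...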
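A minimal sketch of that combined test, assuming the bounding box and pixel count of each component are already known (for example from `cv2.boundingRect`, as in the answer). The function name `looks_like_border_line` and the threshold values are hypothetical and would need tuning on real crops. The aspect ratio is taken as short side over long side so that horizontal and vertical lines are treated alike, which is an assumption on top of the comment.

```python
def looks_like_border_line(w, h, area, min_aspect=0.1, max_fill=0.9):
    """Heuristic: True if a component is probably a border line, not a glyph.

    w, h  -- bounding-box width and height in pixels
    area  -- number of foreground pixels in the component
    min_aspect, max_fill -- illustrative thresholds (assumptions)
    """
    fill = area / float(w * h)
    # 1.0 means a square box, values near 0 mean a very elongated box.
    aspect = min(w, h) / float(max(w, h))
    # A border line fills its box almost completely AND is very elongated;
    # "l", "I" or "-" fill their box too, but are far less elongated.
    return fill >= max_fill and aspect < min_aspect


# A 200x2 px table border versus a 3x20 px letter "l".
print(looks_like_border_line(200, 2, 400))  # True
print(looks_like_border_line(3, 20, 60))    # False
```

In the loop from the answer, such a check could replace the upper bound on the ratio, so that solid but short strokes are kept.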

Also, I found another way here to remove lines from a binary image: http://docs.opencv.org/trunk/d1/dee/tutorial_moprh_lines_detection.html
