Detect words and graphs in an image and slice the image into one image per word or graph

I am building a web application to help students learn maths.

The app needs to display maths content that comes from LaTeX files. These LaTeX files render (beautifully) to PDF, which I can convert cleanly to SVG with pdf2svg.

The image (SVG, PNG, or any other image format) looks like this:

_______________________________________ 
|          | 
| 1. Word1 word2 word3 word4   | 
| a. Word5 word6 word7    | 
|          | 
| ///////////Graph1///////////  | 
|          | 
| b. Word8 word9 word10    | 
|          | 
| 2. Word11 word12 word13 word14  | 
|          | 
|_______________________________________|

Real example: [image]

The intent of the web app is to manipulate this image and add content to it, resulting in something like this:

_______________________________________ 
|          | 
| 1. Word1 word2      | <-- New line break 
|_______________________________________| 
|          | 
| -> NewContent1      | 
|_______________________________________| 
|          | 
| word3 word4       | 
|_______________________________________| 
|          | 
| -> NewContent2      | 
|_______________________________________| 
|          | 
| a. Word5 word6 word7    | 
|_______________________________________| 
|          | 
| ///////////Graph1///////////  | 
|_______________________________________| 
|          | 
| -> NewContent3      | 
|_______________________________________| 
|          | 
| b. Word8 word9 word10    | 
|_______________________________________| 
|          | 
| 2. Word11 word12 word13 word14  | 
|_______________________________________|

Example: [image]

A single large image does not give me the flexibility to do this kind of manipulation.

But if the image file were broken up into smaller files, each containing a single word or a single graph, I could do these manipulations.

I think what I need to do is detect the whitespace in the image and slice the image into multiple sub-images, like this:

_______________________________________ 
|   |  |  |   | 
| 1. Word1 | word2 | word3 | word4  | 
|__________|_______|_______|____________| 
|    |  |     | 
| a. Word5 | word6 | word7   | 
|_____________|_______|_________________| 
|          | 
| ///////////Graph1///////////  | 
|_______________________________________| 
|    |  |     | 
| b. Word8 | word9 | word10   | 
|_____________|_______|_________________| 
|   |  |  |   | 
| 2. Word11 | word12 | word13 | word14 | 
|___________|________|________|_________|

I am looking for a way to do this. What do you think is the way to go?

Thank you for your help!

Source

2017-08-19 enzolito

Vertical and horizontal projections. First split the whole image into rows, then split each row into columns. –

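Thanks Dan. I see. What tool would you use for the vertical and horizontal projections? Can it be automated? Can it detect the rows and columns? – enzolito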

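What you do is basically compute the mean intensity of each row (e.g. using 'cv2.reduce'), use that to identify the white gaps between the lines, find the midpoints of the gaps, and use them as cut points to generate a set of images, –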

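I would use horizontal and vertical projections to first split the image into lines, and then split each line into smaller slices (e.g. words).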

First convert the image to grayscale and then invert it, so that the gaps contain zeros and any text or graphics are non-zero.

img = cv2.imread('article.png', cv2.IMREAD_COLOR) 
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 
img_gray_inverted = 255 - img_gray

Compute the horizontal projection, i.e. the mean intensity of each row, using cv2.reduce, and flatten it to a linear array.

row_means = cv2.reduce(img_gray_inverted, 1, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
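
To see what this projection looks like, here is a minimal sketch on a made-up 6x4 array. It uses NumPy's `mean` as a stand-in for `cv2.reduce` with `cv2.REDUCE_AVG`; the toy data and the name `toy` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical inverted grayscale image: 0 = background, 255 = ink.
toy = np.array([
    [0,   0,   0,   0],    # gap
    [0, 255, 255,   0],    # text line 1
    [0, 255,   0,   0],    # text line 1
    [0,   0,   0,   0],    # gap
    [255, 0,   0, 255],    # text line 2
    [0,   0,   0,   0],    # gap
], dtype=np.uint8)

# Mean intensity of every row (axis=1 collapses the columns).
row_means = toy.mean(axis=1).astype(np.float32)

# Rows whose mean is exactly zero contain no ink at all.
blank_rows = np.flatnonzero(row_means == 0)
print(row_means)
print(blank_rows)
```

Rows 0, 3 and 5 come out as blank, which is exactly the information the next step turns into gap ranges.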

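Now find the row ranges of all contiguous gaps. You can use the function provided in this answer (https://stackoverflow.com/a/24892274/3962537).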

row_gaps = zero_runs(row_means)
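
As a quick illustration of what `zero_runs` returns, the sketch below runs it on a made-up projection. The function body is the one from the linked answer; the array `projection` is an assumption for the example.

```python
import numpy as np

def zero_runs(a):
    # 1 where a is 0, padded with a 0 at each end so runs at the edges are closed.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Each run contributes two transitions: its start and one past its end.
    return np.where(absdiff == 1)[0].reshape(-1, 2)

# Hypothetical projection with three runs of zeros.
projection = np.array([0, 0, 7, 3, 0, 0, 0, 9, 0], dtype=np.float32)
gaps = zero_runs(projection)
print(gaps)
```

Each row of the result is a half-open range `[start, end)`: here the gaps cover indices 0 to 1, 4 to 6, and 8.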

Finally, compute the midpoints of the gaps, which we will use to cut up the image.

row_cutpoints = (row_gaps[:,0] + row_gaps[:,1] - 1) // 2  # integer division, so the results can be used as indices
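
A small worked example of the midpoint arithmetic, assuming gaps in the half-open `[start, end)` form that `zero_runs` produces. The gap values are made up, and integer division is used so that the cut points can serve as slice indices.

```python
import numpy as np

# Hypothetical gaps as half-open [start, end) row ranges.
row_gaps = np.array([[0, 2], [4, 7], [8, 9]])

# The last blank row of a gap is end - 1, so the midpoint is (start + end - 1) // 2.
row_cutpoints = (row_gaps[:, 0] + row_gaps[:, 1] - 1) // 2

# Each pair of consecutive cut points delimits one line of content.
line_ranges = list(zip(row_cutpoints.tolist(), row_cutpoints[1:].tolist()))
print(row_cutpoints)
print(line_ranges)
```

Note that a line is only produced between two consecutive cut points, so this relies on the page having a blank margin above the first line and below the last one.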

You end up with something like this (gaps are pink, cut points are red): [image]

The next step is to process each identified line.

bounding_boxes = [] 
for n,(start,end) in enumerate(zip(row_cutpoints, row_cutpoints[1:])): 
    line = img[start:end] 
    line_gray_inverted = img_gray_inverted[start:end]

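Compute the vertical projection (the mean intensity of each column), and find the gaps and cut points. Additionally, compute the gap sizes, so that the small gaps between individual letters can be filtered out.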

column_means = cv2.reduce(line_gray_inverted, 0, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten() 
column_gaps = zero_runs(column_means) 
column_gap_sizes = column_gaps[:,1] - column_gaps[:,0] 
column_cutpoints = (column_gaps[:,0] + column_gaps[:,1] - 1) // 2

Filter the cut points.

filtered_cutpoints = column_cutpoints[column_gap_sizes > 5]
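
The effect of the size filter is easiest to see on numbers. A minimal sketch with made-up gaps for one text line; `min_gap` is a hypothetical name for the threshold of 5 used above, which would likely need tuning to the font size and rendering resolution.

```python
import numpy as np

# Hypothetical gaps in one text line, as half-open [start, end) column ranges:
# left margin, a gap between two letters, a gap between two words, right margin.
column_gaps = np.array([[0, 10], [14, 16], [20, 28], [40, 50]])

column_gap_sizes = column_gaps[:, 1] - column_gaps[:, 0]
column_cutpoints = (column_gaps[:, 0] + column_gaps[:, 1] - 1) // 2

min_gap = 5  # assumed threshold in pixels
filtered_cutpoints = column_cutpoints[column_gap_sizes > min_gap]
print(column_gap_sizes)
print(filtered_cutpoints)
```

The 2-pixel gap between letters is dropped, so only the margins and the word gap remain as cut points.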

And create a list of bounding boxes, one for each segment.

for xstart,xend in zip(filtered_cutpoints, filtered_cutpoints[1:]): 
    # Convert NumPy integers to plain Python ints for cv2.rectangle.
    bounding_boxes.append(((int(xstart), int(start)), (int(xend), int(end))))

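Now you end up with something like this (again, gaps are pink, cut points are red): [image]

Now you can cut up the image. I'll just visualize the bounding boxes: [image]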
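
To actually produce one image per word or graph, each bounding box can be used as a NumPy slice. This is a minimal sketch, assuming boxes in the `((x0, y0), (x1, y1))` form built above; `crop_boxes` is a hypothetical helper name and the toy image stands in for the real page.

```python
import numpy as np

def crop_boxes(img, bounding_boxes):
    """Return one sub-image per ((x0, y0), (x1, y1)) box."""
    crops = []
    for (x0, y0), (x1, y1) in bounding_boxes:
        # NumPy images are indexed [row, column], i.e. [y, x].
        crops.append(img[int(y0):int(y1), int(x0):int(x1)].copy())
    return crops

# Toy 10x20 BGR image and two made-up boxes.
img = np.zeros((10, 20, 3), dtype=np.uint8)
boxes = [((2, 1), (8, 5)), ((9, 1), (18, 5))]
pieces = crop_boxes(img, boxes)
print([p.shape for p in pieces])
```

Each element of `pieces` could then be written to its own file, for example with `cv2.imwrite`.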

Complete script:

import cv2 
import numpy as np 
import matplotlib.pyplot as plt 
from matplotlib import gridspec 


def plot_horizontal_projection(file_name, img, projection): 
    fig = plt.figure(1, figsize=(12,16)) 
    gs = gridspec.GridSpec(1, 2, width_ratios=[3,1]) 

    ax = plt.subplot(gs[0]) 
    im = ax.imshow(img, interpolation='nearest', aspect='auto') 
    ax.grid(which='major', alpha=0.5) 

    ax = plt.subplot(gs[1]) 
    ax.plot(projection, np.arange(img.shape[0]), 'm') 
    ax.grid(which='major', alpha=0.5) 
    plt.xlim([0.0, 255.0]) 
    plt.ylim([-0.5, img.shape[0] - 0.5]) 
    ax.invert_yaxis() 

    fig.suptitle("FOO", fontsize=16) 
    gs.tight_layout(fig, rect=[0, 0.03, 1, 0.97]) 

    fig.set_dpi(200) 

    fig.savefig(file_name, bbox_inches='tight', dpi=fig.dpi) 
    plt.clf() 

def plot_vertical_projection(file_name, img, projection): 
    fig = plt.figure(2, figsize=(12, 4)) 
    gs = gridspec.GridSpec(2, 1, height_ratios=[1,5]) 

    ax = plt.subplot(gs[0]) 
    im = ax.imshow(img, interpolation='nearest', aspect='auto') 
    ax.grid(which='major', alpha=0.5) 

    ax = plt.subplot(gs[1]) 
    ax.plot(np.arange(img.shape[1]), projection, 'm') 
    ax.grid(which='major', alpha=0.5) 
    plt.xlim([-0.5, img.shape[1] - 0.5]) 
    plt.ylim([0.0, 255.0]) 

    fig.suptitle("FOO", fontsize=16) 
    gs.tight_layout(fig, rect=[0, 0.03, 1, 0.97]) 

    fig.set_dpi(200) 

    fig.savefig(file_name, bbox_inches='tight', dpi=fig.dpi) 
    plt.clf() 

def visualize_hp(file_name, img, row_means, row_cutpoints): 
    row_highlight = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 
    row_highlight[row_means == 0, :, :] = [255,191,191] 
    row_highlight[row_cutpoints, :, :] = [255,0,0] 
    plot_horizontal_projection(file_name, row_highlight, row_means) 

def visualize_vp(file_name, img, column_means, column_cutpoints): 
    col_highlight = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 
    col_highlight[:, column_means == 0, :] = [255,191,191] 
    col_highlight[:, column_cutpoints, :] = [255,0,0] 
    plot_vertical_projection(file_name, col_highlight, column_means) 


# From https://stackoverflow.com/a/24892274/3962537 
def zero_runs(a): 
    # Create an array that is 1 where a is 0, and pad each end with an extra 0. 
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0])) 
    absdiff = np.abs(np.diff(iszero)) 
    # Runs start and end where absdiff is 1. 
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2) 
    return ranges 


# Load the page and invert it so that the background is 0 and ink is non-zero.
img = cv2.imread('article.png', cv2.IMREAD_COLOR)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_gray_inverted = 255 - img_gray

# Horizontal projection: mean intensity of each row.
row_means = cv2.reduce(img_gray_inverted, 1, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
row_gaps = zero_runs(row_means)
# Integer division keeps the cut points usable as array indices.
row_cutpoints = (row_gaps[:,0] + row_gaps[:,1] - 1) // 2

visualize_hp("article_hp.png", img, row_means, row_cutpoints)

bounding_boxes = []
for n,(start,end) in enumerate(zip(row_cutpoints, row_cutpoints[1:])):
    line = img[start:end]
    line_gray_inverted = img_gray_inverted[start:end]

    # Vertical projection: mean intensity of each column of this line.
    column_means = cv2.reduce(line_gray_inverted, 0, cv2.REDUCE_AVG, dtype=cv2.CV_32F).flatten()
    column_gaps = zero_runs(column_means)
    column_gap_sizes = column_gaps[:,1] - column_gaps[:,0]
    column_cutpoints = (column_gaps[:,0] + column_gaps[:,1] - 1) // 2

    # Keep only gaps wide enough to separate words rather than letters.
    filtered_cutpoints = column_cutpoints[column_gap_sizes > 5]

    for xstart,xend in zip(filtered_cutpoints, filtered_cutpoints[1:]):
        # Convert NumPy integers to plain Python ints for cv2.rectangle.
        bounding_boxes.append(((int(xstart), int(start)), (int(xend), int(end))))

    visualize_vp("article_vp_%02d.png" % n, line, column_means, filtered_cutpoints)

result = img.copy()

# Draw every bounding box on a copy of the page.
for bounding_box in bounding_boxes:
    cv2.rectangle(result, bounding_box[0], bounding_box[1], (255,0,0), 2)

cv2.imwrite("article_boxes.png", result)

Source

2017-08-19 16:50:40

Thanks Dan, this is more than I expected! – enzolito

OpenCV can't load and write .svg files, if I understand correctly? SVG displays perfectly at any scale. Is there a vector image format that OpenCV handles? – enzolito

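As far as I know, [it can't](https://github.com/opencv/opencv/tree/master/modules/imgcodecs/src). When you think about it, it isn't a raster image unless you render it, so the approach would probably need to be different (TBH, I'd need to do some research to give you a good answer). One possibility, though, and it's just a quick idea: render it, find the bounding boxes using the current approach, and then use the coordinates to find the corresponding SVG fragments. –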
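
The coordinate step of that idea could look roughly like the sketch below. It assumes the SVG was rendered so that its viewBox fills the raster exactly (no letterboxing), and `pixel_box_to_svg` is a hypothetical helper name; matching the resulting region to actual SVG elements is not covered here.

```python
def pixel_box_to_svg(box, png_size, viewbox):
    """Map a pixel box ((x0, y0), (x1, y1)) to SVG user units.

    png_size is (width, height) of the rendered raster in pixels;
    viewbox is (min_x, min_y, width, height) from the SVG's viewBox attribute.
    """
    (x0, y0), (x1, y1) = box
    png_w, png_h = png_size
    vb_x, vb_y, vb_w, vb_h = viewbox
    sx = vb_w / png_w  # user units per pixel, horizontally
    sy = vb_h / png_h  # user units per pixel, vertically
    return ((vb_x + x0 * sx, vb_y + y0 * sy),
            (vb_x + x1 * sx, vb_y + y1 * sy))

# Example: a 1200x1600 px render of an SVG whose viewBox is "0 0 600 800".
svg_box = pixel_box_to_svg(((100, 200), (300, 260)), (1200, 1600), (0, 0, 600, 800))
print(svg_box)
```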

The images are high quality, perfectly clean, unskewed, and have well-separated characters. A dream!

First perform binarization and blob detection (standard in OpenCV).
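
OpenCV covers this step with functions such as `cv2.threshold` and `cv2.connectedComponentsWithStats`. To show the idea without depending on them, here is a plain NumPy/Python stand-in: a fixed threshold followed by a flood fill over 8-connected pixels. The helper name `blob_boxes`, the threshold of 128, and the toy page are all assumptions.

```python
from collections import deque
import numpy as np

def blob_boxes(gray, threshold=128):
    """Bounding boxes (x0, y0, x1, y1), inclusive, of dark 8-connected blobs."""
    ink = gray < threshold              # binarization: True where there is ink
    seen = np.zeros_like(ink, dtype=bool)
    h, w = ink.shape
    boxes = []
    for y in range(h):
        for x in range(w):
            if not ink[y, x] or seen[y, x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(y, x)])
            seen[y, x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                        if ink[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes

# Toy page: white background (255) with two separate dark marks.
page = np.full((6, 8), 255, dtype=np.uint8)
page[1:4, 1:3] = 0      # first blob
page[2:5, 5:7] = 0      # second blob
found = blob_boxes(page)
print(found)
```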

Then group the characters whose ordinates overlap (i.e. that face each other within a line). This will naturally isolate the individual lines.
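
One way to sketch this grouping, assuming each blob is given as an `(x0, y0, x1, y1)` bounding box: sort the boxes by their top edge and merge every box whose vertical extent overlaps the span of the current line. The name `group_into_lines` and the sample boxes are hypothetical.

```python
def group_into_lines(boxes):
    """Group (x0, y0, x1, y1) boxes whose vertical extents overlap."""
    lines = []  # each entry: [top, bottom, list_of_boxes]
    for box in sorted(boxes, key=lambda b: b[1]):   # top to bottom
        _, y0, _, y1 = box
        if lines and y0 <= lines[-1][1]:
            # Overlaps the current line's vertical span: extend that line.
            lines[-1][1] = max(lines[-1][1], y1)
            lines[-1][2].append(box)
        else:
            lines.append([y0, y1, [box]])
    return [line[2] for line in lines]

boxes = [(30, 12, 38, 20), (0, 10, 8, 20), (10, 14, 18, 20),   # first line
         (0, 40, 8, 50), (12, 42, 20, 50)]                      # second line
grouped = group_into_lines(boxes)
for line in grouped:
    print(sorted(line))
```

Tall elements that span several text lines would merge those lines into one group, so this simple version fits cleanly separated lines best.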

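Now, within each line, sort the blobs from left to right and cluster them by proximity to isolate the words. This will be a delicate step, because the spacing between characters within a word is close to the spacing between distinct words. Don't expect perfect results. This should work better than the projections.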
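
A minimal sketch of the proximity clustering, assuming the blobs of one line are given as `(x0, y0, x1, y1)` boxes. The function name `cluster_words` and the `max_gap` threshold are assumptions; picking a good threshold is the delicate part described above.

```python
def cluster_words(line_boxes, max_gap):
    """Split one line's (x0, y0, x1, y1) boxes into words by horizontal gap."""
    words = []
    for box in sorted(line_boxes, key=lambda b: b[0]):   # left to right
        # Compare against the right-most edge seen so far in the current word.
        if words and box[0] - max(b[2] for b in words[-1]) <= max_gap:
            words[-1].append(box)
        else:
            words.append([box])
    return words

line = [(0, 0, 4, 10), (6, 0, 10, 10), (12, 3, 16, 10),      # three letters
        (24, 0, 28, 10), (30, 0, 34, 10)]                    # two letters
words = cluster_words(line, max_gap=3)
print([len(w) for w in words])
```

Here the 2-pixel gaps keep letters together and the 8-pixel gap starts a new word.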

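The situation is worse with italics, as the horizontal spacing is even narrower. You may also have to look at the "slanted distance", i.e. find the lines tangent to the characters in the direction of the italics. This can be achieved by applying an inverse shear transform.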
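
The inverse shear can be illustrated on a tiny binary array. This sketch assumes a known slant (pixels of horizontal shift per row) and shifts whole rows by integer amounts; `unshear` is a hypothetical helper, and a real implementation might instead use an affine warp such as `cv2.warpAffine`.

```python
import numpy as np

def unshear(ink, slant):
    """Undo a rightward italic slant by shifting each row left.

    ink is a 2-D array (non-zero = ink); slant is the horizontal shift in
    pixels per row of height, i.e. the tangent of the italic angle.
    """
    h, w = ink.shape
    out = np.zeros_like(ink)
    for y in range(h):
        shift = int(round(slant * (h - 1 - y)))   # top rows move the most
        if shift < w:
            out[y, :w - shift] = ink[y, shift:]
    return out

# Two strokes that each lean right by one column per row.
ink = np.zeros((4, 9), dtype=np.uint8)
for y in range(4):
    lean = 3 - y
    ink[y, 1 + lean] = 1      # first stroke
    ink[y, 3 + lean] = 1      # second stroke

slanted_cols = ink.any(axis=0).astype(int)
upright_cols = unshear(ink, 1.0).any(axis=0).astype(int)
print(slanted_cols)   # no empty column between the strokes
print(upright_cols)   # an empty column appears between them
```

Before the shear is undone, the two strokes overlap in every column between them; afterwards there is a clear vertical gap.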

Thanks to the grid, the graphs will appear as big blobs.

Source

2017-08-19 15:52:53

Thanks Yves, I'll look into this – enzolito
