2015-10-29 35 views
1

SQL: How to query a set of rows and count how many strings match a given list of strings

I built a simple question-answering system.

In my database, there are three tables:

CREATE TABLE question (
    id        int,
    question  varchar(200),
    answer_id int  /* foreign key mapping to answer.id */
);

CREATE TABLE answer (
    id     int,
    answer varchar(500)
);

CREATE TABLE question_elements (
    id          int,
    seq         int,  /* position of the word within the question */
    question_id int,  /* foreign key mapping to question.id */
    vocabulary  varchar(40)
);

Now I have this question:

What approach should a company adopt when its debt ratio is higher than 50% and wanna continue to get funding ? 
in the table question.

So the record is:

question { 
    id: 1, 
    question:"What approach should a company adopt when its debt ratio is higher than 50% and wanna continue to get funding ?", 
    answer_id:1 
} 

And in the table question_elements:

question_elements [ 
    { 
    id: 1, 
    seq: 1, 
    question_id: 1, 
    vocabulary: "what" 
    }, 
    { 
    id: 2, 
    seq: 2, 
    question_id: 1, 
    vocabulary: "approach" 
    }, 
    { 
    id: 3, 
    seq: 3, 
    question_id: 1, 
    vocabulary: "should" 
    }, 
    { 
    id: 4, 
    seq: 4, 
    question_id: 1, 
    vocabulary: "a" 
    }, 
    { 
    id: 5, 
    seq: 5, 
    question_id: 1, 
    vocabulary: "company" 
    }, 
    { 
    id: 6, 
    seq: 6, 
    question_id: 1, 
    vocabulary: "adopt" 
    }, 
    { 
    id: 7, 
    seq: 7, 
    question_id: 1, 
    vocabulary: "when" 
    }, 
    .... 
    .... 
    { 
    id: 19, 
    seq: 19, 
    question_id: 1, 
    vocabulary: "get" 
    }, 
    { 
    id: 20, 
    seq: 20, 
    question_id: 1, 
    vocabulary: "funding" 
    } 
] 

Now, when the user enters:

What action does a company should do when it wanna get more funding with high debt ratio 

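My idea is to split the statement above into a list of strings, then run a SQL query that counts how many strings in that list match rows in the table question_elements.

What would the SQL statement be in PostgreSQL?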

+0

Are you using a json field, or is that just how you are showing us the data?

+0

It looks like you have two questions. One is how to split on ' ', and the other is how to see how many words match.

+0

Possible duplicate of [Split column into multiple rows in Postgres](http://stackoverflow.com/questions/29419993/split-column-into-multiple-rows-in-postgres)

Answers

0

If I understood correctly, you want something like this:

WITH answer AS (
    SELECT 
     'What action does a company should do when it wanna get more funding' AS a 
), 
question AS (
    SELECT 'what' AS q 
    UNION ALL SELECT 'should' 
    UNION ALL SELECT 'a' 
    UNION ALL SELECT 'company' 
    UNION ALL SELECT 'do' 
    UNION ALL SELECT 'when' 
) 
SELECT COUNT(result) 
FROM (
    SELECT unnest(string_to_array(CAST(a AS VARCHAR),' ')) AS result 
    FROM answer 
) AS tbaux 
WHERE result IN (select CAST(q AS VARCHAR) FROM question); 

Case-insensitive, with some explanation:

-- reuses the answer and question CTEs from the query above
SELECT COUNT(result)                -- count the rows that survive the WHERE filter
FROM (
    SELECT unnest(string_to_array(CAST(a AS VARCHAR),' ')) AS result  -- split the user input into one word per row, using ' ' as the delimiter
    FROM answer
) AS tbaux                          -- alias of the subquery
WHERE upper(result) IN (select upper(CAST(q AS VARCHAR)) FROM question); -- upper() makes the comparison case-insensitive; only matching rows are counted

This counts how many words from the user input appear in the question table (question_elements in your case).
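Applied to the question_elements table from the question rather than a hard-coded word list, the same idea might look like the sketch below. It assumes that vocabulary holds one word per row as shown in the question, that matching should be case-insensitive, and that each distinct word should count at most once per question. The CTE names user_input and input_words are made up for this illustration.

```sql
WITH user_input(txt) AS (
    VALUES ('What action does a company should do when it wanna get more funding with high debt ratio')
),
input_words AS (
    -- one row per distinct lower-cased word of the user input
    SELECT DISTINCT lower(t.w) AS word
    FROM user_input,
         regexp_split_to_table(txt, '\s+') AS t(w)
)
SELECT qe.question_id,
       -- DISTINCT so a word repeated inside one question is counted once
       COUNT(DISTINCT lower(qe.vocabulary)) AS matches
FROM question_elements AS qe
JOIN input_words AS iw
  ON lower(qe.vocabulary) = iw.word
GROUP BY qe.question_id
ORDER BY matches DESC;
```

Grouping by question_id returns one row per stored question that shares at least one word with the input, so the best candidate comes first.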

http://sqlfiddle.com/#!15/9eecb7db59d16c80417c72d1e1f4fbf1/4095/0

0

The question_elements table is not necessary.

with ui(ui) as (
    values ('What action does a company should do when it wanna get more funding with high debt ratio') 
) 
select id, count(*) as matches, question 
from 
    (
     select id, question, regexp_split_to_table(question, '\s+') as word 
     from question 
    ) q 
    inner join 
    regexp_split_to_table((select ui from ui), '\s+') ui(word) using (word) 
group by 1, 3 
order by matches desc 
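The join above compares words exactly, so "What" and "what" would not match. A possible case-insensitive variant is sketched below, assuming that lower-casing both sides is acceptable and that each distinct word should count once per question. The subquery alias u is introduced only for this example.

```sql
WITH ui(ui) AS (
    VALUES ('What action does a company should do when it wanna get more funding with high debt ratio')
)
SELECT q.id, COUNT(*) AS matches, q.question
FROM (
    -- one row per distinct lower-cased word of each stored question
    SELECT DISTINCT id, question,
           lower(regexp_split_to_table(question, '\s+')) AS word
    FROM question
) AS q
JOIN (
    -- one row per distinct lower-cased word of the user input
    SELECT DISTINCT lower(regexp_split_to_table(ui.ui, '\s+')) AS word
    FROM ui
) AS u USING (word)
GROUP BY q.id, q.question
ORDER BY matches DESC;
```

Punctuation attached to a word (for example "funding?") is still treated as part of that word by this split.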