Can non-text search benefit from a search engine?

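I have a search website that runs on a MySQL database. I wonder whether it would benefit, performance-wise, from a search engine (Sphinx, Lucene, etc.), and if so, how? Could I make use of faceted search? I know a search engine helps when there is text search, but would it help if most of the queries look like the following one?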

select SQL_CALC_FOUND_ROWS distinct A.id
    from tableA as A
     join tableB as B1 on A.id=B1.tablea_id
     join tableB as B2 on A.id=B2.tablea_id
     join tableB as B3 on A.id=B3.tablea_id
where
    B1.value in ([list of ints here])
and
    B2.value in ([another list of ints here])
and
    B3.value in ([one more list of ints here])
order by ~A.updated_at
limit <from>,<amount>;

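The idea is to find the rows of tableA that have a value in tableB from the first list, then filter them, keeping only those that also have a value in tableB from the second list, and so on; then sort them, count everything found, and apply the limit.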

tableA and tableB look like this:

create table tableA (
    id int(11) not null auto_increment,
    ... 
    updated_at timestamp not null, 
    primary key (`id`), 
    key `ix_tablea_updated_at` (`updated_at`) 
) engine=InnoDB; 

create table tableB (
    tablea_id int(11) not null, 
    value int(11) not null, 
    key `ix_tableb_tablea_id` (`tablea_id`), 
    key `ix_tableb_value` (`value`) 
) engine=InnoDB;

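tableA contains ~200k rows. tableB contains ~120M rows. The number of B.value in ([list of ints]) conditions varies from query to query, and so do the lists of ints.

If there is no way I can benefit from a search engine, can I improve performance in some other way?

As far as I can tell, the problem is the order by ~A.updated_at and the counting of found rows. Is there a way to speed up the sorting and counting using MySQL itself?

PS. Please excuse my English. I hope you can understand me.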


2013-07-21 zaquest

Why are you joining table B three times on the same ID? You can get the same effect with a single join:

select SQL_CALC_FOUND_ROWS distinct A.id
from tableA A join 
    tableB B 
    on A.id = B.tablea_id 
where B.value in ([list of ints here]) and 
     B.value in ([another list of ints here]) and 
     B.value in ([one more list of ints here]) 
order by A.updated_at 
limit <from>, <amount>;

Having three lists is redundant, so you could also do this:

select SQL_CALC_FOUND_ROWS distinct A.id
from tableA A join 
    tableB B 
    on A.id = B.tablea_id 
where B.value in ([big big combined list of ints here]) 
order by A.updated_at 
limit <from>, <amount>;

If you have an index on B(value), or even on B(value, tablea_id), then performance would be even better.
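
As a rough illustration of the composite-index suggestion, the sketch below builds an index on (value, tablea_id) and asks the planner how it would run a lookup by value. It uses Python's sqlite3 module as a stand-in for MySQL, and the index name ix_tableb_value_tablea_id is made up for the example; MySQL's optimizer and its EXPLAIN output differ, so treat this only as a demonstration of the idea that a lookup by value can be answered from such an index.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; planner behaviour is not identical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tableB (tablea_id integer not null, value integer not null);
    -- Hypothetical index name; the column order (value, tablea_id) is what matters.
    create index ix_tableb_value_tablea_id on tableB (value, tablea_id);
""")

# Ask SQLite how it would execute a lookup of tablea_id by value.
plan_rows = conn.execute("""
    explain query plan
    select distinct tablea_id from tableB where value in (10, 20, 30)
""").fetchall()

# The fourth column of each plan row holds the human-readable step description.
plan_text = " | ".join(row[3] for row in plan_rows)
print(plan_text)
```

In MySQL the corresponding check would be to run EXPLAIN on the real query after adding the index and see which key is chosen.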

EDIT:

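No, your query does not work the way you think it does. Each time you join to the table, you multiply the number of rows. Say that the QQQ value in table A has 10 corresponding rows in table B. The first join produces 10 rows, the second multiplies that to 100, and the third to 1,000. This is probably the root of your performance problem.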
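
The 10, 100, 1,000 arithmetic above can be reproduced on a toy dataset. This is a minimal sketch using Python's sqlite3 module in place of MySQL, with one tableA row and ten tableB rows; the helper joined_row_count is hypothetical. It counts the rows of the unfiltered joins, so it shows the worst case; a real optimizer may apply the where filters earlier and never materialize that many rows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the join semantics are standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tableA (id integer primary key);
    create table tableB (tablea_id integer not null, value integer not null);
    insert into tableA values (1);
""")
# One row in tableA with 10 corresponding rows in tableB (values 0..9).
conn.executemany("insert into tableB values (1, ?)", [(v,) for v in range(10)])


def joined_row_count(n_joins):
    """Count the rows produced by joining tableB to tableA n_joins times, unfiltered."""
    joins = " ".join(
        f"join tableB as B{i} on A.id = B{i}.tablea_id"
        for i in range(1, n_joins + 1)
    )
    sql = f"select count(*) from tableA as A {joins}"
    return conn.execute(sql).fetchone()[0]


counts = [joined_row_count(n) for n in (1, 2, 3)]
print(counts)  # [10, 100, 1000]
```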

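You are just doing successive filtering on the same column. Actually, I suspect that what you really want to know is whether, for a given A id, there is a B value in each of the three lists. If so, this is a "set-within-sets" query, and it is easily done using group by: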

select SQL_CALC_FOUND_ROWS A.id
from tableA A join 
    tableB B 
    on A.id = B.tablea_id 
group by A.id
having sum(B.value in ([list of ints here])) > 0 and 
     sum(B.value in ([another list of ints here])) > 0 and 
     sum(B.value in ([one more list of ints here])) > 0 
order by A.updated_at 
limit <from>, <amount>;

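Your original approach may well work, which is interesting. It is generally very inefficient (unless one of the values never appears in the data, in which case the joins end up returning no rows).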
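
To check that the group by form and the triple join select the same ids, here is a small sketch on made-up data, again using Python's sqlite3 module as a stand-in for MySQL. The three lists (10, 11), (20, 21) and (30, 31) are arbitrary examples, and sorting, counting and the limit are left out so that only the filtering logic is compared. It says nothing about relative speed on 120M rows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the data and lists are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tableA (id integer primary key, updated_at integer not null);
    create table tableB (tablea_id integer not null, value integer not null);
    insert into tableA values (1, 100), (2, 200), (3, 300), (4, 400);
    insert into tableB values
        (1, 10), (1, 20), (1, 30),   -- hits all three lists
        (2, 10), (2, 20),            -- misses the third list
        (3, 11), (3, 21), (3, 31),   -- hits all three lists
        (4, 30), (4, 31);            -- only hits the third list
""")

# The question's approach: one join of tableB per list.
triple_join = """
    select distinct A.id
    from tableA as A
    join tableB as B1 on A.id = B1.tablea_id
    join tableB as B2 on A.id = B2.tablea_id
    join tableB as B3 on A.id = B3.tablea_id
    where B1.value in (10, 11)
      and B2.value in (20, 21)
      and B3.value in (30, 31)
"""

# The answer's approach: one join, then one "at least one hit" test per list.
group_by = """
    select A.id
    from tableA as A
    join tableB as B on A.id = B.tablea_id
    group by A.id
    having sum(B.value in (10, 11)) > 0
       and sum(B.value in (20, 21)) > 0
       and sum(B.value in (30, 31)) > 0
"""

ids_from_joins = sorted(row[0] for row in conn.execute(triple_join))
ids_from_group_by = sorted(row[0] for row in conn.execute(group_by))
print(ids_from_joins, ids_from_group_by)  # [1, 3] [1, 3]
```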


2013-07-21 19:36:53

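If I join 'tableB' only once, then 'B.value' would have to be in all 3 lists at the same time (wouldn't it?). If I join it several times, I can filter A by B.values from the first list, from the second list, and so on, separately. Am I wrong? It seems to work the way I described. I do have the 'ix_tableb_value' index. – zaquest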
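
The difference described in this comment can be seen on a three-row example. The sketch below, which uses Python's sqlite3 module as a stand-in for MySQL and invented, non-overlapping lists, runs the single-join form and the triple-join form against the same data: with a single join every in condition is tested against the same tableB row, so non-overlapping lists can never all match.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the data and lists are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tableA (id integer primary key);
    create table tableB (tablea_id integer not null, value integer not null);
    insert into tableA values (1);
    insert into tableB values (1, 10), (1, 20), (1, 30);
""")

# Single join: all three conditions apply to the same row of tableB.
single_join = """
    select distinct A.id
    from tableA as A
    join tableB as B on A.id = B.tablea_id
    where B.value in (10, 11)
      and B.value in (20, 21)
      and B.value in (30, 31)
"""

# Triple join: each condition applies to its own copy of tableB.
triple_join = """
    select distinct A.id
    from tableA as A
    join tableB as B1 on A.id = B1.tablea_id
    join tableB as B2 on A.id = B2.tablea_id
    join tableB as B3 on A.id = B3.tablea_id
    where B1.value in (10, 11)
      and B2.value in (20, 21)
      and B3.value in (30, 31)
"""

single_ids = [row[0] for row in conn.execute(single_join)]
triple_ids = [row[0] for row in conn.execute(triple_join)]
print(single_ids, triple_ids)  # [] [1]
```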

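I am not filtering on the same column. If some row of A has the values (1, 2, 3) in B, then joining B 3 times gives [(1,1,1), (1,1,2), (1,1,3), (1,2,1), ..., (3,3,3)], and then I can find the A rows where B1.value = 1, B2.value = 2, B3.value = 3. Is that right? – zaquest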

Using 'group by' gives the same result, but the execution time is about 2 times longer. Thanks for trying anyway. – zaquest
