2011-10-25 78 views
SQL: extracting data

I have the following data in a table, in a column named title:

Acqua Di Parma Blu Mediterraneo Arancia Di Capri Scented Water EDT 
Acqua Di Parma Blu Mediterraneo Arancia 
Acqua Di Parma Blu Mediterraneo Bergamotto Di Calabria 
Acqua Di Parma Blu Mediterraneo Cipresso Di Toscana Scented Water EDT 
Acqua di Parma Blu Mediterraneo fico di amalfi 
Acqua Di Parma Blu Mediterraneo Fico di Amalfi Scented Water EDT 
Acqua Di Parma Blu Mediterraneo Mirto di Panarea 
Acqua Di Parma Blu Mediterraneo Mirto di Panarea Scented Water EDT 
Acqua Di Parma Blu Meditteraneo Cipresso 
Acqua Di Parma Colonia Assoluta Bath 
Acqua Di Parma Colonia Assoluta 
Acqua Di Parma Colonia Body Cream 
Acqua Di Parma Colonia Body Cream Tube 
Adidas Deep Energy 
Adidas Dynamic Pulse 
Adidas Fair Play 

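As you can see, these are all variations of Acqua Di Parma Blu Mediterraneo and Adidas products.

Is there a way to read the data letter by letter, so that when a letter no longer appears more than, say, 3 times, whatever came before that letter changed is returned?

Basically, I want to read this list and return only: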

Acqua Di Parma Blu Meditteraneo 
Acqua Di Parma Colonia 
Adidas Deep Energy 
Adidas Dynamic Pulse 
Adidas Fair Play 

The whole table has about 70,000 rows, all of similar data.

The table consists of row_id, title, category.

Is this possible?

Many thanks

Darren

Do you have access to a scripting language, or does this need to be done in "pure" SQL? – Jens

Also, why does your result contain all three "Adidas" lines rather than just one? – Jens

@Jens PHP. You're right, I need to rethink how to ask this question; I haven't explained it very well. –

Answers

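OK, this isn't pretty, and I'm not sure it's entirely correct, but it's the closest I could get.

I created a separate table containing each set of leading substrings, like this: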

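create table subs as
select title,
       -- substring_index(title, ' ', n) keeps the first n space-separated words
       substring_index(title, ' ', 1) as one,
       substring_index(title, ' ', 2) as two,
       substring_index(title, ' ', 3) as three,
       substring_index(title, ' ', 4) as four,
       substring_index(title, ' ', 5) as five,
       substring_index(title, ' ', 6) as six,
       substring_index(title, ' ', 7) as seven
from title;  -- assumes the source table is itself named "title"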
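
For anyone who wants to see what those seven columns contain without running MySQL, here is a minimal Python sketch. The helper `substring_index` below is my own stand-in that mimics MySQL's function for a positive count only, and the two titles are just taken from the question's list.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Mimic MySQL SUBSTRING_INDEX(text, delim, count) for a positive count."""
    if count <= 0:
        raise ValueError("this sketch only handles a positive count")
    # Keep everything before the count-th delimiter; if there are fewer
    # delimiters than that, the whole string comes back unchanged.
    return delim.join(text.split(delim)[:count])


COLUMNS = ["one", "two", "three", "four", "five", "six", "seven"]

titles = [
    "Acqua Di Parma Colonia Assoluta Bath",
    "Adidas Fair Play",
]

# One dict per row, shaped like a row of the "subs" table.
subs = [
    {"title": t,
     **{name: substring_index(t, " ", i) for i, name in enumerate(COLUMNS, start=1)}}
    for t in titles
]

for row in subs:
    print(row)
```

Note that a short title such as "Adidas Fair Play" simply repeats itself in every column from the third onwards, which matters for the counts in the query that follows.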

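Then I created a query that checks whether the GROUP BY count on one column is greater than 1 (i.e. not unique), the GROUP BY count on the next column is 1 (i.e. unique), and the earlier column's value is a substring of the next one. I then unioned together the results for each pair of columns and finally did a SELECT DISTINCT over the whole lot: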

select distinct brand from (
    select * from 
    (select one brand, count(*) bcount 
    from subs 
    group by one) one, 
    (select two prod, count(*) pcount 
    from subs 
    group by two) two 
    where bcount > 1 
    and pcount=1 
    and locate(one.brand, two.prod)>0 
    union all 
    select * from 
    (select two brand, count(*) bcount 
    from subs 
    group by two) two, 
    (select three prod, count(*) pcount 
    from subs 
    group by three) three 
    where two.bcount > 1 
    and three.pcount=1 
    and locate(two.brand, three.prod)>0 
    union all 
    select * from 
    (select three brand, count(*) bcount 
    from subs 
    group by three) three, 
    (select four prod, count(*) pcount 
    from subs 
    group by four) four 
    where three.bcount > 1 
    and four.pcount=1 
    and locate(three.brand, four.prod)>0 
    union all 
    select * from 
    (select four brand, count(*) bcount 
    from subs 
    group by four) four, 
    (select five prod, count(*) pcount 
    from subs 
    group by five) five 
    where four.bcount > 1 
    and five.pcount=1 
    and locate(four.brand, five.prod)>0 
    union all 
    select * from 
    (select five brand, count(*) bcount 
    from subs 
    group by five) five, 
    (select six prod, count(*) pcount 
    from subs 
    group by six) six 
    where five.bcount > 1 
    and six.pcount=1 
    and locate(five.brand, six.prod)>0 
    union all 
    select * from 
    (select six brand, count(*) bcount 
    from subs 
    group by six) six, 
    (select seven prod, count(*) pcount 
    from subs 
    group by seven) seven 
    where six.bcount > 1 
    and seven.pcount=1 
    and locate(six.brand, seven.prod)>0) x 

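This gives the following result:

[image: query result]

But it still has some problems: it shows both Acqua Di Parma Blu and Acqua Di Parma Medit.. on two rows rather than just once, so it isn't correct.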
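
Since a scripting language is available, the same pairwise check can be tried outside the database. This is only a rough Python sketch of the rule in the query, not a drop-in replacement: `word_prefix` and `candidate_brands` are names I made up, the comparison is case-sensitive (MySQL's default collations usually are not), and titles are assumed to have no trailing spaces.

```python
from collections import Counter


def word_prefix(title: str, n: int) -> str:
    """First n space-separated words of title (the whole title if shorter)."""
    return " ".join(title.split(" ")[:n])


def candidate_brands(titles, max_words=7):
    """Prefixes shared by several titles that contain-match a unique longer prefix."""
    found = set()
    for n in range(1, max_words):
        brand_counts = Counter(word_prefix(t, n) for t in titles)
        prod_counts = Counter(word_prefix(t, n + 1) for t in titles)
        shared = [b for b, c in brand_counts.items() if c > 1]   # bcount > 1
        unique = [p for p, c in prod_counts.items() if c == 1]   # pcount = 1
        for brand in shared:
            # Same test as locate(brand, prod) > 0 in the query.
            if any(brand in prod for prod in unique):
                found.add(brand)
    return sorted(found)


sample = [
    "Adidas Deep Energy",
    "Adidas Dynamic Pulse",
    "Adidas Fair Play",
    "Acqua Di Parma Colonia Assoluta",
    "Acqua Di Parma Colonia Body Cream",
]
print(candidate_brands(sample))
```

On this small sample the rule reports "Adidas" once rather than the three Adidas lines, which is the point raised in the comments on the question. On larger inputs it can return overlapping prefixes of different lengths, so the result still needs a clean-up pass.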

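I guess the only way to remove the duplicates is to repeat this again on the new set. Then, once you have a unique set, any product that has one of these as its leading string belongs to that brand. Hope that helps. –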
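
The second half of that suggestion, matching each product to a brand by its leading string, might look something like this in Python. It is a sketch under my own assumptions: the function names are invented, the brand list is a hand-written example rather than real query output, and picking the longest matching brand is just one way to resolve overlapping entries.

```python
def starts_with_words(title: str, brand: str) -> bool:
    """True if brand matches the leading whole words of title."""
    return title == brand or title.startswith(brand + " ")


def assign_brand(title: str, brands):
    """Return the longest brand that is a leading string of title, or None."""
    matches = [b for b in brands if starts_with_words(title, b)]
    return max(matches, key=len) if matches else None


# Example brand set (assumed, for illustration only).
brands = ["Acqua Di Parma", "Acqua Di Parma Colonia", "Adidas"]

for title in ["Acqua Di Parma Colonia Body Cream", "Adidas Fair Play"]:
    print(title, "->", assign_brand(title, brands))
```

Matching on whole words avoids a brand such as "Adidas" accidentally claiming a title that merely begins with the same letters.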