You could use itertools product to find every possible pair of attributes, and then match the rows on each pair:
import pandas as pd
from itertools import product

# 1) creating the pandas dataframe
df = [["1234", "blond"],
      ["1235", "brunette"],
      ["1236", "blond"],
      ["1234", "tall"],
      ["1235", "tall"],
      ["1236", "short"]]
df = pd.DataFrame(df)
df.columns = ["id", "attribute"]

# 2) creating all the possible attribute pairs
attributs = set(df.attribute)
for attribut1, attribut2 in product(attributs, attributs):
    if attribut1 != attribut2:
        # 3) selecting the rows for each attribute
        df1 = df[df.attribute == attribut1]["id"]
        df2 = df[df.attribute == attribut2]["id"]
        # 4) finding the ids that match both attributes
        intersection = len(set(df1).intersection(set(df2)))
        if intersection:
            # 5) displaying the number of matches
            print(attribut1, attribut2, intersection)
This gives:
tall brunette 1
tall blond 1
brunette tall 1
blond tall 1
blond short 1
short blond 1
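Each pair shows up twice above (for example "tall blond" and "blond tall") because product yields ordered pairs. If only unordered pairs are wanted, a minimal sketch along the following lines might help. It swaps in itertools.combinations and builds the id sets once per attribute; the names ids_by_attribute and pair_counts are made up for this illustration and are not part of the code above.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame(
    [["1234", "blond"], ["1235", "brunette"], ["1236", "blond"],
     ["1234", "tall"], ["1235", "tall"], ["1236", "short"]],
    columns=["id", "attribute"],
)

# Build the set of ids for each attribute once (hypothetical helper name),
# instead of filtering the dataframe again for every pair.
ids_by_attribute = {
    attribute: set(ids) for attribute, ids in df.groupby("attribute")["id"]
}

# combinations() yields each unordered pair exactly once.
pair_counts = {}
for a, b in combinations(sorted(ids_by_attribute), 2):
    shared = len(ids_by_attribute[a] & ids_by_attribute[b])
    if shared:
        pair_counts[(a, b)] = shared

for (a, b), shared in pair_counts.items():
    print(a, b, shared)
```

Sorting the attribute names first makes the order of the printed pairs deterministic, which a plain set does not guarantee.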
EDIT
It is then easy to refine this to get the output you want:
import pandas as pd
from itertools import product

# 1) creating the pandas dataframe
df = [["1234", "blond"],
      ["1235", "brunette"],
      ["1236", "blond"],
      ["1234", "tall"],
      ["1235", "tall"],
      ["1236", "short"]]
df = pd.DataFrame(df)
df.columns = ["id", "attribute"]
wanted_attribute_1 = ["blond", "brunette"]

# 2) creating all the possible attribute pairs
attributs = set(df.attribute)
for attribut1, attribut2 in product(attributs, attributs):
    # keep only pairs whose first attribute is wanted and whose second is not
    if attribut1 in wanted_attribute_1 and attribut2 not in wanted_attribute_1:
        if attribut1 != attribut2:
            # 3) selecting the rows for each attribute
            df1 = df[df.attribute == attribut1]["id"]
            df2 = df[df.attribute == attribut2]["id"]
            # 4) finding the ids that match both attributes
            intersection = len(set(df1).intersection(set(df2)))
            # 5) displaying the number of matches (zero counts included)
            print(attribut1, attribut2, intersection)
This gives:
brunette tall 1
brunette short 0
blond tall 1
blond short 1
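The same counts can probably also be obtained without an explicit loop. The sketch below is one possible alternative, not the approach used above: it splits the rows into the wanted attributes and the remaining ones, joins the two halves on id with DataFrame.merge, and tabulates the result with pd.crosstab. The suffixes "_1" and "_2" and the names left, right, merged and counts are choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["1234", "blond"], ["1235", "brunette"], ["1236", "blond"],
     ["1234", "tall"], ["1235", "tall"], ["1236", "short"]],
    columns=["id", "attribute"],
)
wanted_attribute_1 = ["blond", "brunette"]

# Split the rows: wanted attributes on one side, everything else on the other.
is_wanted = df["attribute"].isin(wanted_attribute_1)
left = df[is_wanted]
right = df[~is_wanted]

# Join on id so each row pairs a wanted attribute with another attribute
# held by the same id; overlapping columns get the suffixes _1 and _2.
merged = left.merge(right, on="id", suffixes=("_1", "_2"))

# Count how many ids share each (attribute_1, attribute_2) combination.
counts = pd.crosstab(merged["attribute_1"], merged["attribute_2"])
print(counts)
```

A pair appears with a count of 0 only if both of its attributes occur somewhere in the merged rows, so an attribute that never co-occurs with anything would be missing from the table rather than listed with zeros.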
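I think your first step should be to put this into a better relational form. There is no logical separation of these attributes into hair color / height attributes. – brianpck
Indeed! I tried an answer, but it cannot make those distinctions. –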