0
在進行市場購物籃分析,提取規則之後,...我也想計算項目的常見出現 - 作爲元組 - 以便在Tableau中對其進行可視化。您可以在下面找到每個ID /籃子成員的項目。PySpark計數常見事件
df = sqlContext.createDataFrame([("ID_1", "Butter"),
("ID_1", "Toast"),
("ID_1","Ham"),
("ID_2", "Ham"),
("ID_2", "Toast"),
("ID_2","Egg"),],
["ID","VAL"])
df.show()
+----+------+
| ID| VAL|
+----+------+
|ID_1|Butter|
|ID_1| Toast|
|ID_1| Ham|
|ID_2| Ham|
|ID_2| Toast|
|ID_2| Egg|
+----+------+
這是我想達到的效果:
res = sqlContext.createDataFrame([("Butter", "Butter", 0),
("Butter", "Toast", 1),
("Butter", "Ham", 1),
("Butter", "Egg", 0),
("Toast", "Toast", 0),
("Toast", "Ham", 2),
("Toast", "Egg", 1),
("Ham", "Ham", 0),
("Ham", "Egg", 0),
("Egg", "Egg", 0),],
["VAL_1","VAL_2", "COUNT"])
res.show()
+------+------+-----+
| VAL_1| VAL_2|COUNT|
+------+------+-----+
|Butter|Butter| 0|
|Butter| Toast| 1|
|Butter| Ham| 1|
|Butter| Egg| 0|
| Toast| Toast| 0|
| Toast| Ham| 2|
| Toast| Egg| 1|
| Ham| Ham| 0|
| Ham| Egg| 0|
| Egg| Egg| 0|
+------+------+-----+
你的幫助是非常感謝!謝謝!