Pig: counting the frequency of distinct tuples in a file
I have a file whose entries look like JSON objects:
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "case", "child_dep": "nn", "parent_dep": "nsubj", "child": "martin"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "case", "child_dep": "nn", "parent_dep": "nsubj", "child": "martin"}
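I want to count how often each distinct JSON object occurs in the file. I have seen other answers that use GROUP BY and the COUNT() function in Pig. I am not sure whether I am using them correctly, but I am not getting the result I need. My output should look like this: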
{"child_pos": "NN", "parent_pos": "NN", "parent": "fighter", "child_dep": "nn", "parent_dep": "nsubj", "child": "virtua", "count": "3"}
{"child_pos": "NN", "parent_pos": "NN", "parent": "case", "child_dep": "nn", "parent_dep": "nsubj", "child": "martin", "count": "2"}
The order does not matter. Can someone give me some pointers?
Please share what you have already tried and why you think it does not work. – Mzf