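Clustering JSON documents using machine learning

I am trying to perform document clustering. The input is a set of JSON strings, each containing various keys whose values are strings or numbers. Based on which keys are present and on their types and values, I should be able to group each document with other documents of a similar kind.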
For example, the JSON documents are:
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Jeans"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Shirt"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Jeans"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Jeans"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Top"},
{"title":0, "Bname":"Brand1", "weight":"100", "type":"Top"},
{"title":0, "Bname":"Lee", "height":"2864", "type":"refrigerator"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Top"},
{"title":0, "Time":"Casio", "Price":"2000", "type":"watch"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Top"},
{"title":0, "brand":"Levis", "length":"28,30,32,34,36", "type":"Shirt"}
I want to cluster the documents based on their matching parameters.
I would like to know which approach to take and which Java machine learning libraries could be used to do this.
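So far I understand that K-means and DBSCAN are clustering algorithms, but I don't know how to reduce a JSON string to a vector, or how to run the clustering on the result.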
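
One possible way to turn such documents into vectors is one-hot encoding: every key becomes a presence feature (e.g. "brand") and every key/value pair becomes a value feature (e.g. "brand=Levis"). The sketch below is only an illustration under several assumptions: the JSON has already been parsed into flat Map<String, String> objects (for instance with a JSON library such as Jackson or Gson), numeric values are treated as plain categories, and the class and method names (JsonClusteringSketch, doc, buildVocabulary, toVector, kMeans) are hypothetical, not part of any library. The k-means here is a small hand-written Lloyd's iteration with a deterministic farthest-first start, included only to show the vectors being clustered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch class; none of these names come from a real library.
class JsonClusteringSketch {

    // Builds a flat document from alternating key/value arguments.
    static Map<String, String> doc(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) m.put(kv[i], kv[i + 1]);
        return m;
    }

    // Maps every feature to a column index. Each key yields a presence
    // feature ("brand"), each key/value pair a value feature ("brand=Levis").
    static Map<String, Integer> buildVocabulary(List<Map<String, String>> docs) {
        TreeSet<String> features = new TreeSet<>();
        for (Map<String, String> d : docs) {
            for (Map.Entry<String, String> e : d.entrySet()) {
                features.add(e.getKey());
                features.add(e.getKey() + "=" + e.getValue());
            }
        }
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String f : features) index.put(f, index.size());
        return index;
    }

    // One-hot encodes a document against the vocabulary.
    static double[] toVector(Map<String, String> d, Map<String, Integer> vocab) {
        double[] v = new double[vocab.size()];
        for (Map.Entry<String, String> e : d.entrySet()) {
            Integer key = vocab.get(e.getKey());
            Integer keyValue = vocab.get(e.getKey() + "=" + e.getValue());
            if (key != null) v[key] = 1.0;
            if (keyValue != null) v[keyValue] = 1.0;
        }
        return v;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return sum;
    }

    // Index of the centroid closest to p (ties go to the lower index).
    static int nearest(double[] p, List<double[]> centroids) {
        int best = 0;
        for (int c = 1; c < centroids.size(); c++) {
            double candidate = squaredDistance(p, centroids.get(c));
            if (candidate < squaredDistance(p, centroids.get(best))) best = c;
        }
        return best;
    }

    // Lloyd's k-means with deterministic farthest-first initialisation.
    // Assumes 1 <= k <= points.size(). Returns one cluster index per point.
    static int[] kMeans(List<double[]> points, int k, int maxIterations) {
        List<double[]> centroids = new ArrayList<>();
        centroids.add(points.get(0).clone());
        while (centroids.size() < k) {
            double[] farthest = points.get(0);
            double farthestDist = -1.0;
            for (double[] p : points) {
                double d = squaredDistance(p, centroids.get(nearest(p, centroids)));
                if (d > farthestDist) { farthestDist = d; farthest = p; }
            }
            centroids.add(farthest.clone());
        }
        int[] assignment = new int[points.size()];
        int dim = points.get(0).length;
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            for (int i = 0; i < points.size(); i++) {
                int c = nearest(points.get(i), centroids);
                if (c != assignment[i]) { assignment[i] = c; changed = true; }
            }
            if (!changed && iter > 0) break;
            // Recompute each centroid as the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.size(); i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) sums[assignment[i]][j] += points.get(i)[j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its old centroid
                for (int j = 0; j < dim; j++) sums[c][j] /= counts[c];
                centroids.set(c, sums[c]);
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = Arrays.asList(
            doc("title", "0", "brand", "Levis", "length", "28,30,32,34,36", "type", "Jeans"),
            doc("title", "0", "brand", "Levis", "length", "28,30,32,34,36", "type", "Shirt"),
            doc("title", "0", "Bname", "Lee", "height", "2864", "type", "refrigerator"),
            doc("title", "0", "Time", "Casio", "Price", "2000", "type", "watch"));
        Map<String, Integer> vocab = buildVocabulary(docs);
        List<double[]> vectors = new ArrayList<>();
        for (Map<String, String> d : docs) vectors.add(toVector(d, vocab));
        System.out.println(Arrays.toString(kMeans(vectors, 3, 20))); // prints [0, 0, 1, 2]
    }
}
```

With this encoding, two documents that share keys and values end up close together (the two Levis documents differ in only two columns), while documents with different key sets end up far apart. The same double[] vectors could presumably be handed to a library implementation instead of the hand-written loop; Apache Commons Math, for example, ships KMeansPlusPlusClusterer and DBSCANClusterer, and Weka and Smile are other Java options worth checking. DBSCAN may suit this data better when the number of clusters is not known in advance.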