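How can I import a CSV file with timestamps into Mahout to compute similarity and similar functions?

I want to import my data for machine-learning purposes. The data has three columns, as shown below (the first is the time, the second is a code, the third is a number):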
2016-06-05 00:00:00 fd04:bd3:80e8:2:215:8d00:35:ca4b 0
2016-06-05 00:00:00 fd04:bd3:80e8:2:215:8d00:35:f2be 0.12549
2016-06-05 00:00:00 fd04:bd3:80e8:2:215:8d00:35:c8a1 0.14091
2016-06-05 00:00:01 fd04:bd3:80e8:2:215:8d00:35:ca4b 0
2016-06-05 00:00:01 fd04:bd3:80e8:2:215:8d00:35:f2be 0.25098
2016-06-05 00:00:01 fd04:bd3:80e8:2:215:8d00:35:c8a1 0
2016-06-05 00:00:02 fd04:bd3:80e8:2:215:8d00:35:ca4b 0
2016-06-05 00:00:02 fd04:bd3:80e8:2:215:8d00:35:f2be 0.25098
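Before importing, note that the timestamp column itself contains a space, so each row splits into four whitespace-separated tokens even though it has three logical columns. The minimal sketch below (plain Java, no Mahout) parses one raw row. The class name `SensorReading` and its fields are hypothetical, and it assumes the columns are whitespace-separated as in the sample above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: parses one raw row of the form
// "yyyy-MM-dd HH:mm:ss <address> <reading>".
public class SensorReading {
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    public final LocalDateTime timestamp;
    public final String address;
    public final double value;

    public SensorReading(LocalDateTime timestamp, String address, double value) {
        this.timestamp = timestamp;
        this.address = address;
        this.value = value;
    }

    public static SensorReading parse(String line) {
        // The date and time are separate tokens, so a valid row has 4 tokens
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 4) {
            throw new IllegalArgumentException(
                    "Expected 4 tokens, got " + tokens.length + ": " + line);
        }
        LocalDateTime ts = LocalDateTime.parse(tokens[0] + " " + tokens[1], TS);
        return new SensorReading(ts, tokens[2], Double.parseDouble(tokens[3]));
    }

    public static void main(String[] args) {
        SensorReading r = parse(
                "2016-06-05 00:00:01 fd04:bd3:80e8:2:215:8d00:35:f2be 0.25098");
        System.out.println(r.timestamp + " | " + r.address + " | " + r.value);
    }
}
```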
Here is the code I use to import the data into Mahout:
import java.util.List;
import java.io.File;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;

public class RecommenderIntro {
    public static void main(String[] args) throws Exception {
        // FileDataModel expects one "userID,itemID,preference" record per line
        // (comma- or tab-separated, numeric IDs), not the raw rows shown above
        DataModel model = new FileDataModel(new File("/home/leo/csv_dump11.csv"));
        // Pearson correlation between users over the items they have in common
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Neighborhood of the 2 most similar users
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Top 1 recommendation for user ID 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 1);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
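For reference, `PearsonCorrelationSimilarity` scores a pair of users by the Pearson correlation of the preference values they share. If, as an assumption, each sensor address is treated as a user and each timestamp as an item, the similarity between two sensors is conceptually the correlation of their readings at the timestamps both reported. The standalone sketch below computes that plain correlation in standard Java for illustration. The class `PearsonSketch` is hypothetical, and this is not Mahout's own implementation, which may differ in details.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: Pearson correlation of two sensors' readings,
// computed over the timestamps (map keys) that both sensors reported.
public class PearsonSketch {

    public static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double y = b.get(e.getKey());
            if (y == null) {
                continue; // only shared timestamps contribute
            }
            double x = e.getValue();
            n++;
            sumA += x;
            sumB += y;
            sumAA += x * x;
            sumBB += y * y;
            sumAB += x * y;
        }
        if (n < 2) {
            return Double.NaN; // not enough overlap
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        if (varA == 0 || varB == 0) {
            return Double.NaN; // a constant series has no defined correlation
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        // Keys: seconds after 2016-06-05 00:00:00, values from the sample rows
        Map<Long, Double> f2be = new TreeMap<>();
        f2be.put(0L, 0.12549);
        f2be.put(1L, 0.25098);
        f2be.put(2L, 0.25098);
        Map<Long, Double> c8a1 = new TreeMap<>();
        c8a1.put(0L, 0.14091);
        c8a1.put(1L, 0.0);
        Map<Long, Double> ca4b = new TreeMap<>();
        ca4b.put(0L, 0.0);
        ca4b.put(1L, 0.0);
        ca4b.put(2L, 0.0);
        System.out.println("f2be vs c8a1: " + pearson(f2be, c8a1)); // about -1
        System.out.println("f2be vs ca4b: " + pearson(f2be, ca4b)); // NaN
    }
}
```

With the sample rows, sensor `...ca4b` reads 0 at every timestamp, so its correlation with any other sensor is undefined (this sketch returns `NaN`), while `...f2be` and `...c8a1` share only two timestamps, which always gives a correlation of ±1 (here -1, up to rounding). A meaningful similarity score therefore needs many overlapping readings per sensor.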
How can I implement functions such as classification? Please let me know. Thank you very much!
Thanks a lot. Since I'm new to Java, I don't think I can reformat the data. If possible, could you show me how to extend it and override it?
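You can use anything to reformat it: Python, Excel, or bash if it's small. I'm assuming the dataset isn't too large; the documentation does say that 'FileDataModel' shouldn't be used for anything very large.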
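The reformatting can also be done in Java itself. The sketch below turns raw rows into the `userID,itemID,preference` lines that `FileDataModel` reads. The mapping is an assumption, not something Mahout prescribes: each sensor address becomes a numeric user ID (assigned in order of first appearance), each timestamp becomes an item ID (seconds since the epoch, interpreting the timestamps as UTC), and the reading becomes the preference value. `RawToMahoutCsv` and `convert` are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical converter: raw "date time address value" rows become
// "userID,itemID,preference" lines in the layout FileDataModel reads.
// Assumed mapping: sensor address -> user, timestamp -> item, reading -> preference.
public class RawToMahoutCsv {
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Numeric user ID per address, assigned 1, 2, 3... in order of first appearance
    public final Map<String, Long> addressIds = new LinkedHashMap<>();

    public List<String> convert(List<String> rawLines) {
        List<String> out = new ArrayList<>();
        for (String line : rawLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] t = trimmed.split("\\s+");
            if (t.length != 4) {
                throw new IllegalArgumentException("Unexpected row: " + line);
            }
            // Item ID: epoch seconds, treating the timestamp as UTC (assumption)
            long itemId = LocalDateTime.parse(t[0] + " " + t[1], TS)
                    .toEpochSecond(ZoneOffset.UTC);
            Long userId = addressIds.get(t[2]);
            if (userId == null) {
                userId = (long) addressIds.size() + 1;
                addressIds.put(t[2], userId);
            }
            Double.parseDouble(t[3]); // fail fast on a malformed reading
            out.add(userId + "," + itemId + "," + t[3]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "2016-06-05 00:00:00 fd04:bd3:80e8:2:215:8d00:35:ca4b 0",
                "2016-06-05 00:00:00 fd04:bd3:80e8:2:215:8d00:35:f2be 0.12549",
                "2016-06-05 00:00:01 fd04:bd3:80e8:2:215:8d00:35:ca4b 0");
        RawToMahoutCsv converter = new RawToMahoutCsv();
        // prints 1,1465084800,0 then 2,1465084800,0.12549 then 1,1465084801,0
        converter.convert(raw).forEach(System.out::println);
        // Keep the address -> ID table to translate results back to sensors
        System.out.println(converter.addressIds);
    }
}
```

Because the converter only produces new lines, the raw sensor file can stay exactly as it is: the output could be written to a separate file with `java.nio.file.Files.write(path, lines)` and that file passed to `new FileDataModel(...)`.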
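But I'd like to keep the data as it is, because it is time-series current data from a sensor. If you don't mind, could you tell me how to extend it so that it can import the raw data directly?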