2013-08-28 12 views
1

Using Mahout in Java code, not the CLI

I would like to be able to build a model from Java. I can already do this with the CLI, as follows:

./mahout trainlogistic --input Candy-Crush.twtr.csv \ 
     --output ./model \ 
     --target hd_click --categories 2 \ 
     --predictors click_frequency country_code ctr  device_price_range hd_conversion time_of_day num_clicks phone_type twitter is_weekend app_entertainment app_wallpaper app_widgets arcade books_and_reference brain business cards casual comics communication education entertainment finance game_wallpaper game_widgets health_and_fitness health_fitness libraries_and_demo libraries_demo lifestyle media_and_video media_video medical music_and_audio news_and_magazines news_magazines personalization photography productivity racing shopping social sports sports_apps sports_games tools transportation travel_and_local weather app_entertainment_percentage app_wallpaper_percentage app_widgets_percentage arcade_percentage books_and_reference_percentage brain_percentage business_percentage cards_percentage casual_percentage comics_percentage communication_percentage education_percentage entertainment_percentage finance_percentage game_wallpaper_percentage game_widgets_percentage health_and_fitness_percentage health_fitness_percentage libraries_and_demo_percentage libraries_demo_percentage lifestyle_percentage media_and_video_percentage media_video_percentage medical_percentage music_and_audio_percentage news_and_magazines_percentage news_magazines_percentage personalization_percentage photography_percentage productivity_percentage racing_percentage shopping_percentage social_percentage sports_apps_percentage sports_games_percentage sports_percentage tools_percentage transportation_percentage travel_and_local_percentage weather_percentage reads_magazine_sum reads_magazine_count interested_in_gardening_sum interested_in_gardening_count kids_birthday_coming_sum kids_birthday_coming_count job_seeker_sum job_seeker_count friends_sum friends_count married_sum married_count charity_donor_sum charity_donor_count student_sum student_count interested_in_real_estate_sum interested_in_real_estate_count sports_fan_sum sports_fan_count bascketball_sum bascketball_count 
interested_in_politics_sum interested_in_politics_count gamer_sum gamer_count activist_sum activist_count traveler_sum traveler_count likes_soccer_sum likes_soccer_count interested_in_celebs_sum interested_in_celebs_count auto_racing_sum auto_racing_count age_group_sum age_group_count healthy_lifestyle_sum healthy_lifestyle_count interested_in_finance_sum interested_in_finance_count sports_teams_usa_sum sports_teams_usa_count interested_in_deals_sum interested_in_deals_count business_oriented_sum business_oriented_count interested_in_cooking_sum interested_in_cooking_count music_lover_sum music_lover_count beauty_sum beauty_count follows_fashion_sum follows_fashion_count likes_wrestling_sum likes_wrestling_count name_sum name_count shopper_sum shopper_count golf_sum golf_count vegetarian_sum vegetarian_count dating_sum dating_count interested_in_fashion_sum interested_in_fashion_count interested_in_news_sum interested_in_news_count likes_tennis_sum likes_tennis_count male_sum male_count interested_in_cars_sum interested_in_cars_count follows_bloggers_sum follows_bloggers_count entertainment_sum entertainment_count interested_in_books_sum interested_in_books_count has_kids_sum has_kids_count interested_in_movies_sum interested_in_movies_count musicians_sum musicians_count tech_oriented_sum tech_oriented_count female_sum female_count has_pet_sum has_pet_count practicing_sports_sum practicing_sports_count \ 
     --types  numeric   word   numeric word    word   word  numeric word  word word  numeric  \ 
     --features 100 --passes 1 --rate 50 

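I can't follow the 20 newsgroups example because it is too large to learn from. Can anyone give me code that does the same thing as the CLI command?

Clarification:

I need something like this: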

model.train(1,0,"monday",6,44,1,7,4,6,78,7,3,4,6,........,"good");
model.train(1,0,"sunday",6,44,5,7,9,2,4,6,78,7,3,4,6,........,"bad");
model.train(1,0,"monday",4,99,2,4,6,3,4,6,........,"good");

model.writeTofile("myModel.model");

Please don't answer if you are not familiar with classification and only want to tell me how to execute the CLI command from Java.

+0

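I don't understand your question. What do you mean by _I would like to be able to build a model_? You say you can do it with the CLI, but in the last sentence you ask for code that does what the CLI command does. –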

+0

Yes... I need code to do that. – Dima

+0

But you said you are able to do it with the CLI. So why should we write your code for you? –

Answers

-2

You can use Runtime.exec to execute the same command line from Java.

The simple way is:

Process p = Runtime.getRuntime().exec("/usr/bin/bash -ic \"<path_to_mahout>/mahout trainlogistic --input Candy-Crush.twtr.csv " + "--output ./model " + "--target hd_click --categories 2 " + "--predictors click_frequency country_code ctr device_price_range hd_conversion time_of_day num_clicks phone_type twitter is_weekend app_entertainment app_wallpaper app_widgets arcade books_and_reference brain business cards casual comics communication education entertainment finance game_wallpaper game_widgets health_and_fitness health_fitness libraries_and_demo libraries_demo lifestyle media_and_video media_video medical music_and_audio news_and_magazines news_magazines personalization photography productivity racing shopping social sports sports_apps sports_games tools transportation travel_and_local weather app_entertainment_percentage app_wallpaper_percentage app_widgets_percentage arcade_percentage books_and_reference_percentage brain_percentage business_percentage cards_percentage casual_percentage comics_percentage communication_percentage education_percentage entertainment_percentage finance_percentage game_wallpaper_percentage game_widgets_percentage health_and_fitness_percentage health_fitness_percentage libraries_and_demo_percentage libraries_demo_percentage lifestyle_percentage media_and_video_percentage media_video_percentage medical_percentage music_and_audio_percentage news_and_magazines_percentage news_magazines_percentage personalization_percentage photography_percentage productivity_percentage racing_percentage shopping_percentage social_percentage sports_apps_percentage sports_games_percentage sports_percentage tools_percentage transportation_percentage travel_and_local_percentage weather_percentage reads_magazine_sum reads_magazine_count interested_in_gardening_sum interested_in_gardening_count kids_birthday_coming_sum kids_birthday_coming_count job_seeker_sum job_seeker_count friends_sum friends_count married_sum married_count charity_donor_sum 
charity_donor_count student_sum student_count interested_in_real_estate_sum interested_in_real_estate_count sports_fan_sum sports_fan_count bascketball_sum bascketball_count interested_in_politics_sum interested_in_politics_count gamer_sum gamer_count activist_sum activist_count traveler_sum traveler_count likes_soccer_sum likes_soccer_count interested_in_celebs_sum interested_in_celebs_count auto_racing_sum auto_racing_count age_group_sum age_group_count healthy_lifestyle_sum healthy_lifestyle_count interested_in_finance_sum interested_in_finance_count sports_teams_usa_sum sports_teams_usa_count interested_in_deals_sum interested_in_deals_count business_oriented_sum business_oriented_count interested_in_cooking_sum interested_in_cooking_count music_lover_sum music_lover_count beauty_sum beauty_count follows_fashion_sum follows_fashion_count likes_wrestling_sum likes_wrestling_count name_sum name_count shopper_sum shopper_count golf_sum golf_count vegetarian_sum vegetarian_count dating_sum dating_count interested_in_fashion_sum interested_in_fashion_count interested_in_news_sum interested_in_news_count likes_tennis_sum likes_tennis_count male_sum male_count interested_in_cars_sum interested_in_cars_count follows_bloggers_sum follows_bloggers_count entertainment_sum entertainment_count interested_in_books_sum interested_in_books_count has_kids_sum has_kids_count interested_in_movies_sum interested_in_movies_count musicians_sum musicians_count tech_oriented_sum tech_oriented_count female_sum female_count has_pet_sum has_pet_count practicing_sports_sum practicing_sports_count " + "--types numeric word numeric word word word numeric word word word numeric " + "--features 100 --passes 1 --rate 50\"");

If you go this way, I recommend reading this first: When Runtime.exec() won't

Note that the command will then run in a different process.
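One thing to watch with this approach: Runtime.exec(String) splits the command on whitespace and does not interpret shell quoting, so a quoted `bash -ic "..."` string may not be passed the way it looks. A minimal sketch that avoids the shell by handing each argument to ProcessBuilder separately is shown here. The class name `MahoutCli`, the helper `buildCommand`, the assumption that `mahout` is on the PATH, and the shortened predictor list are illustrative only and not part of the original answer.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MahoutCli {
    // Builds the argument list for `mahout trainlogistic`.
    // Every token is its own list element, so no shell quoting is involved.
    public static List<String> buildCommand(String mahoutPath, String input, String output,
                                            String target, List<String> predictors,
                                            List<String> types) {
        List<String> cmd = new ArrayList<>();
        cmd.add(mahoutPath);
        cmd.add("trainlogistic");
        cmd.addAll(Arrays.asList("--input", input, "--output", output,
                                 "--target", target, "--categories", "2"));
        cmd.add("--predictors");
        cmd.addAll(predictors);
        cmd.add("--types");
        cmd.addAll(types);
        cmd.addAll(Arrays.asList("--features", "100", "--passes", "1", "--rate", "50"));
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: only the first three predictors are listed here for brevity.
        List<String> cmd = buildCommand("mahout", "Candy-Crush.twtr.csv", "./model", "hd_click",
                Arrays.asList("click_frequency", "country_code", "ctr"),
                Arrays.asList("numeric", "word", "numeric"));

        // inheritIO() forwards the child's stdout/stderr so its buffers cannot fill up and block.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.out.println("mahout exited with code " + p.waitFor());
    }
}
```

This still starts a separate process; it only makes the argument handling explicit.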

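Alternatively, you can follow the "Integrating with your application" section on the following site: Recommender Documentation

And this is a good reference on writing a recommender: Introducing Apache Mahout

Hope this helps. Cheers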

+0

I know how to use the Java Runtime. That is not what I asked for. – Dima

+0

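@DimaGoltsman Sorry, I was only trying to help. I thought you wanted to automate it rather than run it manually. You didn't mention that you are not allowed to create new processes. In any case, the fact that your question was unclear does not mean my answer is wrong. – Claudiu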

6

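I'm not 100% familiar with the Mahout API (and I agree that the documentation is quite sparse), so I can only give pointers, but I hope this helps: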

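The Java source code for the trainlogistic example can be found in the mahout-examples library, which is available on Maven [0] (the class is org.apache.mahout.classifier.sgd.TrainLogistic). I suppose you could use exactly the same source code if you wanted, but it depends on several utility classes in the mahout-examples library (and it is not very clean either).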

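The class that performs the training in this example is org.apache.mahout.classifier.sgd.OnlineLogisticRegression [1], although given the large number of predictor variables you have, you may want to use AdaptiveLogisticRegression [2] (same package), which uses a number of OnlineLogisticRegression instances internally. You will have to see for yourself which one works best with your data.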

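The API is fairly straightforward: there is a train method that takes a Vector of your input data, a classify method to test your model, and learningRate and other methods to change the model's parameters.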

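To save the model to disk as the command-line tool does, use org.apache.mahout.classifier.sgd.ModelSerializer, which has a straightforward API for writing and reading models. (There are also write and readFields methods in the OLR class itself, but frankly I don't know what they do or whether they differ from ModelSerializer; they are not documented either.)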
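The pointers above can be put together roughly as follows. This is only a sketch, assuming Mahout 0.8 (mahout-core plus its Hadoop dependency) is on the classpath. The class name `TrainFromJava`, the helper `encode`, the sample values, and the choice of hashed feature encoders from org.apache.mahout.vectorizer.encoders are assumptions for illustration; only two of the predictors are encoded. It does not check whether the file written here is interchangeable with the one the CLI produces.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.ModelSerializer;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.ConstantValueEncoder;
import org.apache.mahout.vectorizer.encoders.ContinuousValueEncoder;
import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

class TrainFromJava {
    // Mirrors --features 100 from the CLI call.
    private static final int NUM_FEATURES = 100;

    // One encoder per predictor; a numeric and a word predictor are shown as examples.
    private static final ConstantValueEncoder BIAS = new ConstantValueEncoder("intercept");
    private static final ContinuousValueEncoder CLICK_FREQUENCY =
            new ContinuousValueEncoder("click_frequency");
    private static final StaticWordValueEncoder COUNTRY_CODE =
            new StaticWordValueEncoder("country_code");

    // Hashes one input row into a fixed-size feature vector (hypothetical helper).
    static Vector encode(double clickFrequency, String countryCode) {
        Vector v = new RandomAccessSparseVector(NUM_FEATURES);
        BIAS.addToVector("", 1.0, v);
        CLICK_FREQUENCY.addToVector((byte[]) null, clickFrequency, v);
        COUNTRY_CODE.addToVector(countryCode, 1.0, v);
        return v;
    }

    public static void main(String[] args) throws IOException {
        // Mirrors --categories 2 and --rate 50.
        OnlineLogisticRegression model =
                new OnlineLogisticRegression(2, NUM_FEATURES, new L1()).learningRate(50);

        // The first argument is the target category (0 or 1); the values are made up.
        model.train(1, encode(0.4, "US"));
        model.train(0, encode(0.1, "DE"));

        ModelSerializer.writeBinary("myModel.model", model);

        // Read the model back and score one row; for a two-category model,
        // classifyScalar returns the score for category 1.
        try (InputStream in = new FileInputStream("myModel.model")) {
            OnlineLogisticRegression loaded =
                    ModelSerializer.readBinary(in, OnlineLogisticRegression.class);
            System.out.println(loaded.classifyScalar(encode(0.4, "US")));
        }
    }
}
```

In practice the rows would come from parsing the CSV file, with one encoder per predictor column, rather than from hard-coded values.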

Finally, apart from the source code in mahout-examples, here are two other examples that use the Mahout API directly and may be useful [3,4].

Sources:

[0] http://repo1.maven.org/maven2/org/apache/mahout/mahout-examples/0.8/

[1] http://archive.cloudera.com/cdh4/cdh/4/mahout/mahout-core/org/apache/mahout/classifier/sgd/OnlineLogisticRegression.html

[2] http://archive.cloudera.com/cdh4/cdh/4/mahout/mahout-core/org/apache/mahout/classifier/sgd/AdaptiveLogisticRegression.html

[3] http://mail-archives.apache.org/mod_mbox/mahout-user/201206.mbox/%[email protected].com%3E

[4] http://skife.org/mahout/2013/02/14/first_steps_with_mahout.html