2016-03-01

Code: 123.rb

require 'rubygems' 
require 'mechanize' 

agent = Mechanize.new 
page = agent.get('https://www.similarweb.com/website/mail.ru') 

puts weekly = JSON.parse(page.body.match(/Sw.preloadedData = (.*);\n/)[1]) 

The result is:

[email protected]:~/rocketsience/simweb/app/controllers$ ruby 123.rb 
{ 
    "overview" => { 
    "TotalLastMonthVisits" => 1828491727, 
    "Country" => 643, 
    "AdNetworks" => { 
     "Count" => 6, "Data" => [ 
     ["sh.st", "", 0.41199451429508266, 0], 
     ["Direct/ADVERT", "", 0.27447485705506364, 0], 
     ["Adf.ly", "", 0.12228322532873495, 0], 
     ["Adcash", "", 0.03801283480321146, 0], 
     ["Wi Get Media", "", 0.023698808366319796, 0], 
     ["Other", "", 0.12953576015158788, 0] 
     ] 
    }, 
    "TrafficSources" => { 
     "Search" => 0.1297336800471612, 
     "Social" => 0.06889915318430725, 
     "Mail" => 0.005584724601318816, 
     "Paid Referrals" => 0.004138570612794292, 
     "Direct" => 0.5943352221920588, 
     "Referrals" => 0.19730864936235964, 
     "Appstore Internals" => 0.0 
    }, 
    "WeeklyTrafficNumbers" => { 
     "2015-08-01" => 1822718149, 
     "2015-09-01" => 1827876529, 
     "2015-10-01" => 1891870292, 
     "2015-11-01" => 1864311156, 
     "2015-12-01" => 1858168664, 
     "2016-01-01" => 1828491727 
    }, 
    "IsVerifiedData" => false, 
    "icon" => "https://site-images.similarcdn.com/image?url=mail.ru&t=2&s=1&h=5289608460719658395", 
    "TopCountryShares" => [ 
     [643.0, 0.6053812049559651], 
     [804.0, 0.1192692037909182], 
     [398.0, 0.07218054976996176], 
     [112.0, 0.049286365478780576], 
     [276.0, 0.021733483951015802] 
    ], 
    "RedirectUrl" => "Mail.ru", 
    "Category" => "Internet_and_Telecom/Email", 
    "GlobalRank" => [13, 0, -1, 0], 
    "CategoryRank" => [2, 0, 0, 0], 
    "Engagments" => [ 
     { 
     "Month" => 1, 
     "Year" => 2016, 
     "Visits" => [1828491727.8068438], 
     "Time" => [490.75107831736796], 
     "PPV" => [6.446836508199864], 
     "Bounce" => [0.3665403434426484] 
     } 
    ], 
    "CountryRanks" => { 
     "643" => [4, 0, 0, 0] 
    }, 
    "Referrals" => { 
     "destination" => [ 
     { 
      "Site" => "ok.ru", "Value" => 0.8810696028190225 
     }, 
     { 
      "Site" => "vk.com", "Value" => 0.018907885286900482 
     }, 
     { 
      "Site" => "youtube.com", "Value" => 0.00951820300173552 
     }, 
     { 
      "Site" => "yandex.ru", "Value" => 0.0073123507688652506 
     }, 
     { 
      "Site" => "instagram.com", "Value" => 0.006770291457912379 
     } 
     ], 
     "referrals" => [ 
     { 
      "Site" => "searchtds.ru", "Value" => 0.09210155557927252 
     }, 
     { 
      "Site" => "smartinf.ru", "Value" => 0.047463117719884956 
     }, 
     { 
      "Site" => "yandex.ru", "Value" => 0.044547048264450495 
     }, 
     { 
      "Site" => "4pda.ru", "Value" => 0.031459568013025206 
     }, 
     { 
      "Site" => "wotsite.net", "Value" => 0.023849799612281394 
     } 
     ] 
    } 
    } 
} 

As a result, I need to get:

{ 
    "WeeklyTrafficNumbers" => 
    { 
     "2015-08-01" => 1822718149, 
     "2015-09-01" => 1827876529, 
     "2015-10-01" => 1891870292, 
     "2015-11-01" => 1864311156, 
     "2015-12-01" => 1858168664, 
     "2016-01-01" => 1828491727 
    } 
} 

and write it to SCV. How do I parse just one key of the JSON, that is, the hash stored under that key?

How can I do this? Please help...

+0

Assuming WeeklyTrafficNumbers is a key in the returned JSON, you just need to change your line to: puts weekly = JSON.parse(page.body.match(/Sw.preloadedData = (.*);\n/)[1])['WeeklyTrafficNumbers'] – ABrowne

+0

Your example doesn't work. – user3426145

+0

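Without seeing the full JSON dump it's impossible to tell you what the key is. However, as an example, this is how it is done: json = {'h': 9, 'g': {'y': 5}}. If you want h, you type json['h'], but if you want y, you type json['g']['y'], because it is two levels down, inside g. – ABrowne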
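
The comment's example, written out as a small Ruby sketch. The hash is the toy one from the comment (not SimilarWeb data), with string keys because that is what JSON.parse returns by default:

```ruby
# Toy hash from the comment above, with string keys as JSON.parse would produce.
json = { 'h' => 9, 'g' => { 'y' => 5 } }

top_level = json['h']       # one level deep
nested    = json['g']['y']  # two levels deep: 'y' lives inside 'g'

puts top_level
puts nested
```

The same chaining applies to the parsed page data: each extra pair of brackets goes one level further down.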

Answers

0

This should work:

require 'csv' 
require 'json' 

# `page` is the Mechanize page fetched by the script in the question
data = JSON.parse(page.body.match(/Sw.preloadedData = (.*);\n/)[1]) 
weekly = { 'WeeklyTrafficNumbers' => data['overview']['WeeklyTrafficNumbers'] } 


CSV.open("your_csv.csv", "w") do |csv| # open a new file for writing 
    weekly['WeeklyTrafficNumbers'].each do |date, visits| # one pair per month 
    csv << [date, visits] # write the date and its visit count as a row 
    end 
end 
+0

puts weekly = JSON.parse(page.body.match(/Sw.preloadedData = (.*);\n/)[1])['overview']['TrafficSources']['WeeklyTrafficNumbers'] This displays nothing. I run it and the console shows an empty string. That's all =( – user3426145
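
A likely explanation for the empty line, sketched on a trimmed copy of the structure printed in the question: 'WeeklyTrafficNumbers' sits next to 'TrafficSources' under 'overview', not inside it, so the extra lookup returns nil and puts prints a blank line. The variable names below are only illustrative:

```ruby
# Trimmed copy of the hash printed in the question (only a few keys kept).
data = {
  'overview' => {
    'TrafficSources' => { 'Search' => 0.1297336800471612, 'Direct' => 0.5943352221920588 },
    'WeeklyTrafficNumbers' => { '2015-08-01' => 1822718149 }
  }
}

# 'TrafficSources' has no 'WeeklyTrafficNumbers' key, so this lookup yields nil.
wrong = data['overview']['TrafficSources']['WeeklyTrafficNumbers']
# The key is a direct child of 'overview'.
right = data['overview']['WeeklyTrafficNumbers']

p wrong
p right

# Hash#fetch raises KeyError on a missing key instead of silently returning nil.
begin
  data.fetch('overview').fetch('TrafficSources').fetch('WeeklyTrafficNumbers')
rescue KeyError => e
  puts "missing key: #{e.message}"
end
```

Using fetch while exploring an unfamiliar structure makes a wrong path fail loudly rather than print nothing.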

+0

Updated the answer and added the CSV part (I guess you meant CSV rather than SCV). – SickLickWill

0

This is the correct path to the data you want:

irb> weekly = JSON.parse(page.body.match(/Sw.preloadedData = (.*);\n/)[1])['overview']['WeeklyTrafficNumbers'] 
=> {"2015-08-01"=>1822718149, "2015-09-01"=>1827876529, "2015-10-01"=>1891870292, "2015-11-01"=>1864311156, "2015-12-01"=>1858168664, "2016-01-01"=>1828491727} 

That is already the value stored under WeeklyTrafficNumbers, so the wrapper is missing. Add it back with:

weekly = { 'WeeklyTrafficNumbers' => weekly }
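
Putting the pieces together, here is a minimal end-to-end sketch that runs without a network call. The `body` string is a hypothetical, heavily trimmed stand-in for `page.body`, and the `date`/`visits` header names are my own choice, not something the page provides:

```ruby
require 'csv'
require 'json'

# Hypothetical stand-in for page.body: a trimmed page embedding the same kind
# of `Sw.preloadedData = {...};` assignment as in the question.
body = <<~HTML
  <script>
  Sw.preloadedData = {"overview":{"WeeklyTrafficNumbers":{"2015-12-01":1858168664,"2016-01-01":1828491727}}};
  </script>
HTML

# Capture the JSON text between "Sw.preloadedData = " and the trailing ";".
match = body.match(/Sw\.preloadedData = (.*);\n/)
raise 'preloadedData not found' unless match

weekly  = JSON.parse(match[1])['overview']['WeeklyTrafficNumbers']
wrapped = { 'WeeklyTrafficNumbers' => weekly }

# One CSV row per date; CSV.generate returns a string instead of writing a file.
csv_text = CSV.generate do |csv|
  csv << %w[date visits]
  wrapped['WeeklyTrafficNumbers'].each { |date, visits| csv << [date, visits] }
end

puts csv_text
```

To write a file instead, CSV.open("your_csv.csv", "w") takes the same block. Checking `match` before indexing avoids a NoMethodError on nil if the page layout changes.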

+0

Yes, you're right about the wrapper. – SickLickWill