2012-10-10 208 views
4

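How to remove Google tracking parameters (UTM) from a URL?

I have a bunch of URLs that I want to clean up. They all contain UTM parameters, which are unnecessary, or even harmful, in this case. For example: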

http://houseofbuttons.tumblr.com/post/22326009438?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+HouseOfButtons+%28House+of+Buttons%29 

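All of the potential parameters start with utm_. How can I easily remove them with a Ruby script/construct without breaking other, potentially "good", URL parameters?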

Answers

8

You can apply a regular expression to the URLs to clean them up. Something like this should do the trick:

url = 'http://houseofbuttons.tumblr.com/post/22326009438?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+HouseOfButtons+%28House+of+Buttons%29&normal_param=1' 
url.gsub(/&?utm_.+?(&|$)/, '') # => "http://houseofbuttons.tumblr.com/post/22326009438?normal_param=1"
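
One caveat with the pattern above: when a utm_ parameter sits between two ordinary parameters, the match swallows the separators on both sides (`?a=1&utm_source=x&b=2` becomes `?a=1b=2`), and a URL that has only utm_ parameters is left with a dangling `?`. A minimal sketch of a stricter variant follows. The helper name `strip_utm_regex` and the example.com URLs are made up for illustration, and the sketch assumes the only `?` in the URL is the one that starts the query string.

```ruby
# Hypothetical helper (not part of the original answer).
# Assumes the query string starts at the only '?' in the URL.
def strip_utm_regex(url)
  url
    .gsub(/(?<=[?&])utm_[^&#]*&?/, '') # drop each utm_ pair and the '&' that follows it
    .sub(/[?&](?=#|\z)/, '')           # drop a '?' or '&' left dangling before '#' or the end
end

puts strip_utm_regex('http://example.com/?a=1&utm_source=x&b=2')
# prints http://example.com/?a=1&b=2
puts strip_utm_regex('http://example.com/?utm_source=x')
# prints http://example.com/
```

The lookbehind only accepts `utm_` directly after `?` or `&`, so a value or path segment that merely contains `utm_` is left alone.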
9

This uses the URI library to take the query string apart and change it (no regex):

require 'uri' 
str = 'http://houseofbuttons.tumblr.com/post/22326009438?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+HouseOfButtons+%28House+of+Buttons%29&normal_param=1'

uri = URI.parse(str) 
clean_key_vals = URI.decode_www_form(uri.query).reject{|k, _| k.start_with?('utm_')} 
uri.query = URI.encode_www_form(clean_key_vals) 
p uri.to_s #=> "http://houseofbuttons.tumblr.com/post/22326009438?normal_param=1" 
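
For a whole batch of URLs it may help to wrap the same idea in a method. The sketch below is one possible shape, not the answerer's code: the name `strip_utm` and the example.com URLs are hypothetical. It also covers two cases the snippet above does not: a URL with no query string at all (where `uri.query` is `nil`) and a URL whose parameters are all utm_ ones.

```ruby
require 'uri'

# Hypothetical helper wrapping the URI-based approach.
def strip_utm(url)
  uri = URI.parse(url)
  return url if uri.query.nil? # no query string, nothing to clean

  kept = URI.decode_www_form(uri.query).reject { |key, _| key.start_with?('utm_') }
  # A nil query drops the '?' entirely when no parameters remain.
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
end

puts strip_utm('http://example.com/page?a=1&utm_source=x&b=2')
# prints http://example.com/page?a=1&b=2
puts strip_utm('http://example.com/page?utm_source=x')
# prints http://example.com/page
```

Note that the remaining parameters are decoded and re-encoded, so their escaping may come out normalized rather than byte-for-byte identical to the input.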
+1

+1 for taking advantage of a tool designed for the job. –

+0

@steenslag Mine is a small standalone script. It gives me "undefined method 'decode_www_form' for URI:Module (NoMethodError)". What could be the reason? – myhd

+0

@myhd It looks like that method was added in Ruby 1.9.2 – steenslag