Complex Hive query

Hi, I have the following table:

ID | time
===========
5  | 200101
3  | 200102
2  | 200103
12 | 200101
16 | 200103
18 | 200106

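Now I want to know how often each month of the year appears. I can't use a plain group by, because that only counts the months that actually occur in the table. I also want to get 0 for any month of the year that does not appear at all. So the output should look like this: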

time   | count
===============
200101 | 2
200102 | 1
200103 | 2
200104 | 0
200105 | 0
200106 | 1

Sorry for the poor table formatting; I hope it is still clear what I mean. I would appreciate any help.


2013-07-03 user2523848

You can provide a calendar table that contains every year and month. I wrote a script, year_month.sh, that generates such a CSV file:

#!/bin/bash

# year_month.sh

start_year=1970
end_year=2015

for year in $(seq ${start_year} ${end_year}); do
    for month in $(seq 1 12); do
        echo ${year}$(echo ${month} | awk '{printf("%02d\n", $1)}')
    done
done > year_month.csv

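Save it and run it. You will get a file year_month.csv with one YYYYMM value per line, covering every month from 1970 through 2015. You can change start_year and end_year to set the range of years.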
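If you only need the months of a single year, as in the question, a variant along these lines might be handier. This is just a sketch: gen_year_months is a name I made up, it takes the first and last year as arguments, and it lets printf do the zero padding instead of awk.

```shell
#!/bin/sh

# Hypothetical helper (not part of the original script):
# print every YYYYMM value from year $1 through year $2, inclusive.
gen_year_months() {
    year=$1
    while [ "$year" -le "$2" ]; do
        month=1
        while [ "$month" -le 12 ]; do
            # %02d pads the month to two digits, e.g. 1 -> 01
            printf '%d%02d\n' "$year" "$month"
            month=$((month + 1))
        done
        year=$((year + 1))
    done
}

# Print the twelve months of 2001 only.
gen_year_months 2001 2001
```

Redirect the output to year_month.csv if you want to use it in place of the file above. Keep in mind that the join will return one row for every month in this file, so a narrower range gives a shorter result.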

Then upload year_month.csv to HDFS. For example:

hadoop fs -mkdir /user/joe/year_month 
hadoop fs -put year_month.csv /user/joe/year_month/

After that, you can expose year_month.csv to Hive as an external table. For example:

create external table if not exists 
year_month (time int) 
location '/user/joe/year_month';

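Finally, you can left outer join the new table with your own table to get the final result. Months with no matching row get a NULL id, and count(id) ignores NULLs, so those months come out as 0. For example, assuming your table is named id_time: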

from (select year_month.time as time, id_time.id as id
      from year_month
      left outer join id_time
      on year_month.time = id_time.time) temp
select time, count(id) as count
group by time;
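If you want to sanity-check the expected counts before running anything on the cluster, the same "calendar left join data, then count" logic can be imitated locally with awk. This is only a rough sketch and not Hive itself: the file names, the comma-separated layout, and the count_by_month helper are assumptions of mine, and the data is the sample from the question.

```shell
#!/bin/sh

# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)

# Sample rows from the question, one "id,time" pair per line (assumed layout).
cat > "$workdir/id_time.csv" <<'EOF'
5,200101
3,200102
2,200103
12,200101
16,200103
18,200106
EOF

# Calendar of months to report on, including months with no rows.
printf '%s\n' 200101 200102 200103 200104 200105 200106 > "$workdir/months.csv"

# Hypothetical helper: $1 is the data file, $2 is the calendar file.
# While reading the first file, tally rows per month; while reading the
# second file, print each calendar month with its tally, defaulting to 0.
count_by_month() {
    awk -F',' 'NR == FNR { n[$2]++; next } { print $1 "," (n[$1] + 0) }' "$1" "$2"
}

count_by_month "$workdir/id_time.csv" "$workdir/months.csv"
```

With the sample data this prints one line per calendar month: 200101,2 then 200102,1 then 200103,2 then 200104,0 then 200105,0 then 200106,1. Note that 200103 comes out as 2 because IDs 2 and 16 both fall in that month.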

Note: you will need to make small changes (such as paths and column types) to the statements above to fit your setup.


2013-07-04 03:13:49 zsxwing
