How do I join maps in Apache Pig? (stored in HBase)

I have an Apache Pig problem that I don't know how to solve, or whether it is even possible. I'm using HBase as the storage layer. The table looks like this:

row key | (b1, c1) | (b2, c2) | ... | (bn, cn)
a1      | empty    | empty    | ... | empty
a2      | ...
an      | ...

The row keys are a1 ... an, and each row has a different set of columns, named in the form (bn, cn). The value of every cell is empty.

My Pig program looks like this:

/* Loading the data */ 
mydata = load 'hbase://mytable' ... as (a:chararray, b_c:map[]); 

/* finding the right elements */ 
sub1 = FILTER mydata BY a == 'a1'; 
sub2 = FILTER mydata BY a == 'a2';

Now I want to join sub1 and sub2, that is, I want to find the columns that exist in both sub1 and sub2. How can I do that?
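To make the goal concrete, here is a plain-Python sketch (not Pig) of the result being asked for: each row is modelled as a dict from column name to its (empty) value, the way Pig exposes the HBase columns as the map b_c, and the shared columns are simply the intersection of the two key sets. The row contents and the helper name shared_columns are hypothetical sample data for illustration only.

```python
# Hypothetical sample data: each row key maps to a dict of column -> value,
# mirroring the map[] that Pig builds from the HBase columns of a row.
table = {
    'a1': {'(b1, c1)': '', '(b2, c2)': '', '(b3, c3)': ''},
    'a2': {'(b2, c2)': '', '(b3, c3)': '', '(b4, c4)': ''},
}


def shared_columns(row1, row2):
    """Return the sorted column names that appear in both rows."""
    return sorted(set(row1) & set(row2))


common = shared_columns(table['a1'], table['a2'])
print(common)  # ['(b2, c2)', '(b3, c3)']
```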


2013-08-05 t k

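This can't be done with maps in pure Pig, so you will need a UDF. I'm not sure exactly what you want as the output of the join, but it should be fairly easy to adapt this Python UDF to your needs.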

myudf.py

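@outputSchema('cols: {(col:chararray)}')
def join_maps(M1, M2):
    # Returns every column name (map key) that exists in both maps with a non-null value.
    # A Pig bag holds tuples, so each key is wrapped in a 1-tuple.
    if M1 is None or M2 is None:
        return []
    out = []
    for k, v in M1.iteritems():
        if k in M2 and v is not None and M2[k] is not None:
            out.append((k,))
    return out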

You can use it like this:

register 'myudf.py' using jython as myudf ; 

-- sub2 can be referenced inside the FOREACH over sub1 because it has only one row (sub2.b_c is used as a scalar)
D = FOREACH sub1 GENERATE myudf.join_maps(b_c, sub2.b_c) ;
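If you want to check the UDF logic outside Pig first, a minimal variant like the one below may help. It assumes that Pig supplies the outputSchema decorator when the file is registered using jython, and falls back to a hypothetical no-op stand-in otherwise, so the same file also imports under plain CPython. It uses items() instead of iteritems(), since iteritems() does not exist in Python 3. The fallback decorator is not part of Pig's API.

```python
try:
    outputSchema  # assumed to be provided by Pig when registered "using jython"
except NameError:
    def outputSchema(schema):
        # Hypothetical no-op stand-in so the module also imports under CPython.
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema('cols: {(col:chararray)}')
def join_maps(M1, M2):
    # Keys present in both maps with non-null values, each wrapped in a
    # 1-tuple because a Pig bag is a collection of tuples.
    if M1 is None or M2 is None:
        return []
    return [(k,) for k, v in M1.items()
            if v is not None and M2.get(k) is not None]
```

Under CPython, join_maps({'(b1, c1)': '', '(b2, c2)': ''}, {'(b2, c2)': '', '(b3, c3)': ''}) returns [('(b2, c2)',)].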


2013-08-13 15:54:08 mr2ert

Thanks! I ended up building a MapToBag UDF in Java, which worked well for me.
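The Java MapToBag UDF mentioned above is not shown. A rough Python analogue, assuming only the column names matter (every cell value is empty in this table), could turn the map into a bag of 1-tuples. The function name map_to_bag and the fallback decorator are hypothetical.

```python
try:
    outputSchema  # assumed to be provided by Pig when registered "using jython"
except NameError:
    def outputSchema(schema):
        # Hypothetical no-op stand-in so the module also imports under CPython.
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema('cols: {(col:chararray)}')
def map_to_bag(m):
    # One 1-tuple per map key (column name); values are dropped because
    # every cell is empty here. Sorted only to make the output deterministic.
    if m is None:
        return []
    return [(k,) for k in sorted(m)]
```

In Pig you would then FLATTEN the returned bag for sub1 and for sub2 and JOIN the two flattened relations on col. An ordinary equi-join then yields exactly the columns present in both rows, and unlike the scalar reference sub2.b_c it does not require sub2 to contain a single row.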
