2015-07-20 54 views
5
Class ProdsTransformer: 

    def __init__(self): 
     self.products_lookup_hmap = {} 
     self.broadcast_products_lookup_map = None 

    def create_broadcast_variables(self): 
     self.broadcast_products_lookup_map = sc.broadcast(self.products_lookup_hmap) 

    def create_lookup_maps(self): 
    // The code here builds the hashmap that maps Prod_ID to another space. 

pt = ProdsTransformer() 
pt.create_broadcast_variables() 

pairs = distinct_users_projected.map(lambda x: (x.user_id,  
         pt.broadcast_products_lookup_map.value[x.Prod_ID])) 

我收到以下錯誤引用SparkContext:星火:廣播變量:看來,你正試圖從廣播變量,動作,或transforamtion

"Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."

着如何對付任何幫助廣播變量會很棒!

+0

這些代碼和/或樣本數據不足以讓某人嘗試複製錯誤和/或修復錯誤。另外,如果你沒有注意到,所有的縮進都會被剝離出來。 – Paul

+0

我已添加更多代碼。 – user3803714

+0

我想知道如果將'products_lookup_map'移出ProdsTransformer'實例的屬性並將其變爲全局屬性,錯誤是否會消失。你需要多個地圖嗎? – Paul

回答

8

通過引用包含在你的map拉姆達的廣播變量中的對象,星火將嘗試序列化整個對象,並將其運送到工人。由於該對象包含對SparkContext的引用,因此會出現該錯誤。取而代之的是:

pairs = distinct_users_projected.map(lambda x: (x.user_id, pt.broadcast_products_lookup_map.value[x.Prod_ID])) 

嘗試這種情況:

bcast = pt.broadcast_products_lookup_map 
pairs = distinct_users_projected.map(lambda x: (x.user_id, bcast.value[x.Prod_ID])) 

後者避免了參照物體(pt),使得火花僅需要運送廣播變量。