LeaseExpiredException with a custom UDF in Hive

I have a Hive UDF that is supposed to extract the device from a user-agent (UA) string. It uses the ua-parser library: https://github.com/tobie/ua-parser

The UDF is fairly simple:

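import java.io.IOException;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import ua_parser.Client;
import ua_parser.Parser;

public class DeviceTypeExtractTest extends UDF {
    // Reused across calls to avoid allocating a new Text per row.
    private final Text result = new Text();

    // The parser is expensive to build, so it is created once per JVM.
    private static final Parser uaParser;

    static {
        try {
            uaParser = new Parser();
        } catch (IOException e) {
            // Keep the original exception as the cause for easier debugging.
            throw new RuntimeException("Could not instantiate User-Agent parser.", e);
        }
    }

    public Text evaluate(Text uaField) {
        if (uaField == null) {
            return null;
        }

        try {
            String uaString = uaField.toString();
            Client client = uaParser.parse(uaString);
            result.set(client.device.family);
            return result;
        } catch (Exception e) {
            // Any parsing failure yields NULL for this row.
            return null;
        }
    }
}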

It works fine when run on a small dataset.

create table categories(
        cat string); 
insert overwrite table categories select DEVICE_TYPE_EXTRACT(user_agent) from raw_logs;

However, when I test this on a larger dataset of more than 100,000 rows, all I get is this LeaseExpiredException: http://pastebin.com/yK6Qmx6r

My map and reduce tasks stay stuck at 0% for hours. Note that if I take out this UDF and test with one of Hive's built-in UDFs instead, this behavior does not occur.

I am running this on an Amazon EMR cluster with AMI version 2.4.5 (Hive 0.11.0.2 and Hadoop 1.0.3).

I tried to improve the cluster's performance by deploying better hardware, but I ran into the same problem regardless of the hardware.

Any ideas?


2014-09-11 Ana Todor

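OK, scratch that. It seems that after upgrading the instances things did start moving, but I just did not wait long enough for the map phase to make progress. Also, the LeaseExpiredException was actually thrown when I killed the process, because I was impatient.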

Still, the parsing takes a lot of time, and I would like to optimize the UDF further.
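
One possible direction for that optimization, offered only as a sketch: user-agent strings in web logs tend to repeat heavily, so the result of each parse could be memoized in a small bounded LRU cache. The class below is hypothetical (the names `LruMemoizer` and `Loader` are not part of Hive or ua-parser) and relies only on `java.util.LinkedHashMap`. It assumes single-threaded use, and whether it helps depends on how often user agents actually repeat in the data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: memoizes an expensive String -> String lookup in a bounded LRU cache. */
public class LruMemoizer {

    /** The expensive computation to cache, e.g. a user-agent parse. */
    public interface Loader {
        String load(String key);
    }

    private final Loader loader;
    private final Map<String, String> cache;
    private long misses = 0;

    public LruMemoizer(final int maxEntries, Loader loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts in LRU order once the cache is full.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value for key, computing and storing it on a miss. */
    public String get(String key) {
        String value = cache.get(key);
        // containsKey distinguishes "cached null" from "not cached yet".
        if (value == null && !cache.containsKey(key)) {
            value = loader.load(key);
            misses++;
            cache.put(key, value);
        }
        return value;
    }

    /** Number of times the loader was actually invoked. */
    public long misses() {
        return misses;
    }
}
```

Inside the UDF, a static `LruMemoizer` could wrap the parser call with a `Loader` whose `load` method returns `uaParser.parse(key).device.family`, and `evaluate` would then call `get(uaString)` instead of parsing every row. The cache size is a tuning assumption and would need to be chosen against the memory available to each task.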


2014-09-11 12:48:26

