2016-09-14 81 views

ERROR 2998: Unhandled internal error. null - Apache Pig

I have written a Pig script in which I need to match a column against multiple strings. For example:

A = FOREACH A1 GENERATE 
    c1, c2, c3, 

--i have substituted junk values-- 

CASE 
WHEN (
     column_name matches '.*abc.*' 
    OR column_name matches '.*sdf.*' 
    OR column_name matches '.*bcd.*' 
    OR column_name MATCHES '.*def.*' 
    OR column_name MATCHES '.*efg.*' 
    OR column_name MATCHES '.*ggg.*' 
    OR column_name MATCHES '.*ghi.*' 
    OR column_name MATCHES '.*hij.*' 
    OR column_name MATCHES '.*ijk.*' 
    OR column_name MATCHES '.*jkl.*' 
    OR column_name MATCHES '.*klm.*' 
    OR column_name MATCHES '.*lmn.*' 
    or column_name matches '.*mno.*' 
    or column_name matches '.*mnb.*' 
    or column_name matches '.*opq.*' 
    or column_name matches '.*pqr.*' 
    or column_name matches '.*qrs.*' 
    or column_name matches '.*stuv.*' 
    or column_name matches '.*tuvw.*' 
    or column_name matches '.*wxy.*' 
    or column_name matches '.*tuvwx.*' 
    or column_name matches '.*xyz.*' 
    . 
    . 
    . 
    . 
    . 
    ) THEN 1 
      ELSE 0 END AS c4; 

I observed that once the number of `OR column_name MATCHES '...'` clauses exceeds 672, the Pig script fails at run time with this error:

Pig Stack Trace 
--------------- 
ERROR 2998: Unhandled internal error. null 

java.lang.StackOverflowError 
     at java.util.zip.Deflater.ensureOpen(Deflater.java:543) 
     at java.util.zip.Deflater.deflate(Deflater.java:426) 
     at java.util.zip.Deflater.deflate(Deflater.java:352) 
     at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:251) 
     at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211) 
     at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876) 
     at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1840) 
     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533) 
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) 
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) 
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 
     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) 
     at java.util.ArrayList.writeObject(ArrayList.java:742) 
     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988) 
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495) 
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) 
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 
     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) 
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) 
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) 
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 
     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) 
     at java.util.ArrayList.writeObject(ArrayList.java:742) 
     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988) 
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495) 
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) 
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 
     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) 
     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) 
     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) 
     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 
     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) 
     at java.util.ArrayList.writeObject(ArrayList.java:742) 
     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 

Please suggest a solution or an alternative approach that meets this requirement.

Answers


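You might consider writing a custom filter function [1], which gives you better control over RAM consumption. Most likely you do not need a regex at all and can use a plain substring search instead.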


UPD: you are generating a value, not filtering, so it should be an eval function: https://pig.apache.org/docs/r0.16.0/udf.html#eval-functions – patrungel


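So basically I need to write a UDF and search for the column value (substring) in the required set of values, such as ('abc | def | ghi | jkl | mno'). Is that the correct understanding, @patrungel? – Suyog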


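Writing a UDF is correct, but it is not about looking up the column value in a list; at least my impression is that you are looking for patterns _in_ the value. – patrungel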
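
To make the suggestion above concrete, here is a minimal sketch of the matching logic such an eval function could delegate to. It is plain Java with no Pig dependency; the class name `SubstringFlag` and the method `flag` are hypothetical, and the needles are the placeholder values from the question. In an actual UDF this logic would presumably be called from `exec(Tuple)` of a class extending `org.apache.pig.EvalFunc<Integer>`, which would keep the long list out of the Pig expression itself. Note that `contains` is only equivalent to `MATCHES '.*abc.*'` for single-line values, and mapping null to 0 is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the names are illustrative and not part of Pig.
final class SubstringFlag {
    // Substrings to look for inside the column value. In practice this list
    // could be long and loaded from a file (an assumption, not from the post).
    private final List<String> needles;

    SubstringFlag(List<String> needles) {
        this.needles = needles;
    }

    // Returns 1 if the value contains any needle as a substring, else 0,
    // mirroring the CASE ... THEN 1 ELSE 0 END logic of the question.
    int flag(String value) {
        if (value == null) {
            return 0; // assumption: treat null like "no match"
        }
        for (String needle : needles) {
            if (value.contains(needle)) {
                return 1; // stop at the first hit
            }
        }
        return 0;
    }
}
```

For example, `new SubstringFlag(Arrays.asList("abc", "sdf", "bcd")).flag("xxabcxx")` yields 1. The loop is iterative, so the number of needles does not affect stack depth the way a long chain of OR clauses appears to in the trace above.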
