2013-03-29

Correct input/output for Python code in a Pig UDF?

I have this short Python script:

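import langid
import sys

# Pig's STREAM operator writes one input record per line to stdin.
for pig_tuple in sys.stdin:
    cols = pig_tuple.split()

    # Expect "<id> <text...>"; stop on anything shorter.
    if len(cols) < 2:
        sys.exit(0)

    try:
        id = int(cols[0])
        text = " ".join(cols[1:])
    except ValueError:
        # The first column was not an integer id.
        sys.exit(0)

    (lang, prob) = langid.classify(text)
    # Emit "<id>\t<lang>" on stdout for Pig to read back.
    print "%s\t%s" % (id, lang)

sys.exit(0)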
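One way to check the script's logic without Pig or langid is to move the parsing into functions and pass the classifier in as an argument. This is a hypothetical refactor (parse_line and process are names made up here, not part of langid or Pig). Unlike the original, it skips malformed lines instead of calling sys.exit(0), because exiting stops reading stdin and leaves every remaining record unprocessed.

```python
def parse_line(line):
    """Split an input line into (id, text); return None if it is malformed."""
    cols = line.split()
    if len(cols) < 2:
        return None
    try:
        record_id = int(cols[0])
    except ValueError:
        return None
    return record_id, " ".join(cols[1:])


def process(lines, classify):
    """Yield one "id<TAB>lang" string per well-formed input line.

    classify is any callable returning (lang, prob), e.g. langid.classify.
    Malformed lines are skipped rather than ending the whole stream.
    """
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        record_id, text = parsed
        lang, _prob = classify(text)
        yield "%s\t%s" % (record_id, lang)
```

Under STREAM, the script body would then reduce to importing langid and writing each string from process(sys.stdin, langid.classify) to stdout followed by a newline. Locally, a stub classifier is enough to exercise the parsing.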

I want to run this script inside Pig. I tried:

define langid_cmd `python2.6 /data/test/compiled_python/langid_command_line.py` ship('/data/test/compiled_python/langid_command_line.py'); 

text = LOAD '$PIG_INPUT' USING PigStorage() as (text:chararray); 

pythonDetect1 = STREAM text through langid_cmd AS (pid:chararray,planguage:chararray); 

but I get:

2013-03-29 15:53:22,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 
2013-03-29 15:53:22,303 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple 
Details at logfile: /home/isl/ryan/src/main/pigScripts/pig_1364597410350.log 
2013-03-29 15:53:22,306 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple 
Details at logfile: /home/isl/ryan/src/main/pigScripts//src/main/pigScripts/pig_1364597410350.log 
2013-03-29 15:53:22,308 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple 
Details at logfile: /home/isl/ryan/src/main/pigScripts//src/main/pigScripts/pig_1364597410350.log 
2013-03-29 15:53:22,311 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple 
Details at logfile: /home/isl/ryan/src/main/pigScripts/src/main/pigScripts/pig_1364597410350.log 
2013-03-29 15:53:22,313 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal error. java.lang.String cannot be cast to org.apache.pig.data.Tuple 
Details at logfile: /home/isl/ryan/src/main/pigScripts/src/main/pigScripts/pig_1364597410350.log 

The directory /data/test/compiled_python has been chmod'd to 777, and when I run this from the shell:

-bash-3.2$ echo 14353 I can haz pigscriptz? | python /data/test/compiled_python/langid_command_line.py 
14353 eu 

??

Answer


AS (pid:chararray,planguage:chararray) tells Pig to expect the output to be a tuple of strings, but you are returning a tab-separated string. You should print the result as

print "(%s,%s)" %(id,lang) 
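Following that suggestion only changes how each record is printed. A small hypothetical helper (to_pig_tuple is not part of Pig or langid) that renders any number of fields in that parenthesized form:

```python
def to_pig_tuple(*fields):
    """Render fields as a parenthesized, comma-separated record, e.g. (14353,en)."""
    return "(" + ",".join(str(f) for f in fields) + ")"
```

In the script, the last line of the loop would become print to_pig_tuple(id, lang). Integer ids and langid's two-letter language codes contain no commas or parentheses, so the helper does no escaping.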

Or use Pig's Python UDF integration instead.
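A minimal sketch of the UDF route, assuming Pig's Jython UDF support: the file would be registered with something like REGISTER 'langid_udf.py' USING jython AS langid_udf; and called as FOREACH text GENERATE FLATTEN(langid_udf.detect_language(text)). Pig supplies the outputSchema decorator when it registers the file; the fallback definition below only exists so the module can be imported and tested outside Pig. The file name, detect_language, and the CLASSIFY hook are hypothetical, and the stub classifier stands in for langid: langid depends on NumPy, a C extension that Jython cannot load, so in practice you would need a Jython-compatible classifier or a CPython-based UDF mechanism if your Pig version provides one.

```python
# Fallback so the module can be imported and tested outside Pig; when Pig
# registers this file with "USING jython" it provides outputSchema itself.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema  # record the schema string for inspection
            return func
        return decorator


def _classify_stub(text):
    """Hypothetical placeholder for a real classifier such as langid.classify."""
    return ("en", 1.0)


CLASSIFY = _classify_stub  # hypothetical hook: swap in a real classifier


@outputSchema("t:tuple(pid:chararray,planguage:chararray)")
def detect_language(line):
    """Parse "<id> <text...>" and return (id, lang), or None if malformed."""
    if line is None:
        return None
    cols = line.split()
    if len(cols) < 2:
        return None
    try:
        int(cols[0])  # same integer-id check as the streaming script
    except ValueError:
        return None
    lang, _prob = CLASSIFY(" ".join(cols[1:]))
    return (cols[0], lang)
```

Returning None for malformed input gives Pig a null for that record instead of aborting the whole stream.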