How do I import/load a .csv file in Pig?


Let's assume I have a tab-delimited text file (datetemp.txt). I want to load this text file into Pig for processing, but when I type the following lines it gives me an error:

grunt> inputfile = load '/training/pig/datetemp.txt' using PigStorage() As (EventID: chararray, eventdate: chararray, count: int);

grunt> dump inputfile;

2014-09-06 08:41:23,527 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-09-06 08:41:23,544 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-09-06 08:41:23,548 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-09-06 08:41:23,548 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-09-06 08:41:23,551 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-09-06 08:41:23,551 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-09-06 08:41:23,552 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job2739171785773930333.jar
2014-09-06 08:42:39,608 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job2739171785773930333.jar created
2014-09-06 08:42:39,612 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-09-06 08:42:39,619 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2014-09-06 08:42:39,630 WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-09-06 08:42:39,891 [Thread-12] INFO  org.apache.hadoop.mapred.JobClient - Cleaning up the staging area hdfs://192.168.195.130:8020/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/training/.staging/job_201408292336_0009
2014-09-06 08:42:39,891 [Thread-12] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
2014-09-06 08:42:40,119 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2014-09-06 08:42:40,125 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2014-09-06 08:42:40,125 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
    ... 15 more

2014-09-06 08:42:40,131 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-09-06 08:42:40,135 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion    PigVersion         UserId     StartedAt              FinishedAt             Features
2.0.0-cdh4.1.1   0.10.0-cdh4.1.1    training   2014-09-06 08:41:23    2014-09-06 08:42:40    UNKNOWN

Failed!

Failed Jobs:
JobId    Alias    Feature    Message    Outputs
N/A    inputfile    MAP_ONLY    Message: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:285)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1014)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1031)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:531)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:318)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:238)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:269)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:273)
    ... 15 more
    hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785,

Input(s):
Failed to read data from "/training/pig/datetemp.txt"

Output(s):
Failed to produce result in "hdfs://192.168.195.130:8020/tmp/temp-1004538676/tmp1582688785"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
null

2014-09-06 08:42:40,135 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2014-09-06 08:42:40,142 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias inputfile
Details at logfile: /home/training/pig_1410006833865.log

Please help me here..!!


2014-09-01 Prix

For people who found this post while searching for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution), here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – 2015-12-28 15:06:39

PigStorage is case-sensitive. Use PigStorage, not pigstorage.


2014-09-01 04:06:10

@Prix, if my answer solved your problem, please mark it as answered. – 2014-09-03 05:49:09

It's not working... now I'm getting the error below – Prix 2014-09-06 08:39:28

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator – Prix 2014-09-06 08:39:47

Your question's headline says you are trying to load a CSV file. For that, I used "using org.apache.pig.piggybank.storage.CSVExcelStorage()" in my LOAD statement, as shown at https://martin.atlassian.net/wiki/x/WYBmAQ. Good luck.
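A dedicated CSV loader matters because real CSV files can contain quoted fields with embedded commas, and a plain delimiter split (which is what PigStorage(',') does) cuts such fields in two. The Python sketch below illustrates the difference with the standard csv module as a stand-in for a CSV-aware loader; it is an analogy, not Pig code, and the sample row is made up.

```python
import csv
import io

# Hypothetical sample row: the second field is quoted and contains a comma.
line = 'E1,"2014-09-01, Monday",42'

# A plain delimiter split, similar in spirit to PigStorage(','):
# it ignores quotes, so the quoted field is broken into two pieces.
naive_fields = line.split(",")

# A CSV-aware parser (analogous to what CSVExcelStorage provides)
# keeps the quoted field intact and strips the surrounding quotes.
csv_fields = next(csv.reader(io.StringIO(line)))

print(naive_fields)  # ['E1', '"2014-09-01', ' Monday"', '42']
print(csv_fields)    # ['E1', '2014-09-01, Monday', '42']
```

If the file has no quoted fields, PigStorage(',') is usually enough; the CSV-aware loader only pays off when quoting is present.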


2015-04-16 21:03:40

Why don't you write PigStorage('\t') instead of PigStorage(), since you already mentioned that your file is tab-delimited?

The code you posted:
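For reference, PigStorage() with no argument already defaults to a tab delimiter, so for a genuinely tab-delimited file PigStorage() and PigStorage('\t') should behave the same; the delimiter matters when it does not match the file. The Python sketch below is a simplified, hypothetical model (not Pig's implementation) of loading one line with the schema (EventID: chararray, eventdate: chararray, count: int): it splits on the delimiter, pads missing fields with None as a stand-in for Pig's null, and turns an unparsable count into None.

```python
from typing import Optional, Tuple

# Hypothetical model of one loaded tuple: (EventID, eventdate, count).
Row = Tuple[Optional[str], Optional[str], Optional[int]]


def load_line(line: str, delimiter: str = "\t") -> Row:
    # Split on the delimiter only; there is no quote handling.
    fields = list(line.rstrip("\n").split(delimiter))
    # Pad missing trailing fields with None (stand-in for Pig's null).
    # Fields beyond the third are simply ignored in this sketch.
    fields += [None] * (3 - len(fields))
    event_id, event_date, raw_count = fields[0], fields[1], fields[2]
    # Cast count to int; a value that cannot be parsed becomes None.
    try:
        count = int(raw_count) if raw_count is not None else None
    except ValueError:
        count = None
    return (event_id, event_date, count)


print(load_line("E1\t2014-09-01\t42"))       # ('E1', '2014-09-01', 42)
print(load_line("E1,2014-09-01,42"))         # ('E1,2014-09-01,42', None, None)
print(load_line("E1,2014-09-01,42", ","))    # ('E1', '2014-09-01', 42)
```

The second call shows the typical symptom of a delimiter mismatch: the whole line lands in the first field and the remaining fields come back empty.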

grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage() As (EventID: chararray,eventdate: chararray,count:int);

Maybe this will solve your problem.

Let me know if it's something else.


2015-06-17 19:01:32

hdfs://192.168.195.130:8020/training/pig/datetemp.txt

The file was not found in your HDFS! Make sure the input file is placed at the location above.


2015-06-18 04:04:07 karthik

Have you checked whether the input path exists?

Try:

fs -ls /training/pig/ in the Grunt shell

If datetemp.txt shows up in the listing, it will work; otherwise give


2015-09-01 20:23:27 Naga

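The log clearly states that the ERROR is about the input path.

org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input path does not exist: hdfs://192.168.195.130:8020/training/pig/datetemp.txt

Can you check whether the file exists in HDFS? You can also check whether your Pig is running in mapreduce mode or local mode.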
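The hdfs://192.168.195.130:8020 prefix in the error comes from path resolution: in mapreduce mode, a path without a scheme such as /training/pig/datetemp.txt is resolved against the cluster's default filesystem (fs.defaultFS, fs.default.name in older configs), so a file that exists only on the local disk is not found. In local mode (pig -x local) the same path refers to the local filesystem. The Python sketch below is a hypothetical model of that rule; the function name and the default values (taken from the log above, with /home/training assumed as the local working directory) are illustrative assumptions, and relative paths are resolved against /user/<name>, Hadoop's default HDFS home directory.

```python
import posixpath
from urllib.parse import urlparse


def resolve_input_path(
    path: str,
    mode: str,
    default_fs: str = "hdfs://192.168.195.130:8020",  # from the log above
    user: str = "training",                           # from the log above
    local_cwd: str = "/home/training",                # assumed working dir
) -> str:
    """Hypothetical model of how a LOAD path gets fully qualified."""
    # A path that already has a scheme (hdfs://, file://) is used as-is.
    if urlparse(path).scheme:
        return path
    if mode == "local":
        # Local mode: the path refers to the local filesystem.
        base, home = "file://", local_cwd
    elif mode == "mapreduce":
        # Mapreduce mode: the path refers to the default filesystem (HDFS).
        base, home = default_fs.rstrip("/"), f"/user/{user}"
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Relative paths are taken relative to the working/home directory.
    if not path.startswith("/"):
        path = posixpath.join(home, path)
    return base + path


print(resolve_input_path("/training/pig/datetemp.txt", "mapreduce"))
# hdfs://192.168.195.130:8020/training/pig/datetemp.txt
print(resolve_input_path("/training/pig/datetemp.txt", "local"))
# file:///training/pig/datetemp.txt
```

In other words, the same LOAD statement looks in two different places depending on the mode Pig was started in.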


2015-09-02 13:34:26 Narasimha

You can specify ',' in the PigStorage class to read a CSV file.

The query would look like:

grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage(',') As (EventID: chararray,eventdate: chararray,count:int); 

grunt> dump inputfile;

And make sure you have the file '/training/pig/datetemp.txt' in HDFS. To test, run: hadoop fs -ls /training/pig/datetemp.txt
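If the file is not there, it has to be copied into HDFS before the LOAD can succeed. The Python sketch below wraps two standard FsShell commands: hadoop fs -test -e (exit status 0 when the path exists) and hadoop fs -put (copy a local file into HDFS). The helper names are hypothetical, and it assumes the hadoop client is on PATH and that the target directory /training/pig already exists.

```python
import subprocess
from typing import List


def exists_cmd(hdfs_path: str) -> List[str]:
    # `hadoop fs -test -e` exits with status 0 if the path exists.
    return ["hadoop", "fs", "-test", "-e", hdfs_path]


def put_cmd(local_path: str, hdfs_path: str) -> List[str]:
    # `hadoop fs -put` copies a local file into HDFS.
    return ["hadoop", "fs", "-put", local_path, hdfs_path]


def ensure_in_hdfs(local_path: str, hdfs_path: str) -> None:
    """Upload local_path to hdfs_path unless it is already present."""
    if subprocess.run(exists_cmd(hdfs_path)).returncode != 0:
        # check=True raises CalledProcessError if the upload fails.
        subprocess.run(put_cmd(local_path, hdfs_path), check=True)
```

Usage, with a hypothetical local source path: ensure_in_hdfs("/home/training/datetemp.txt", "/training/pig/datetemp.txt"), then rerun the LOAD and DUMP in grunt.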


2015-09-24 15:00:58 pradeep
