2014-02-20 76 views

I am using Pig 0.12.0 and Hadoop 2.2.0. I have successfully run Pig from the Grunt shell and from Pig batch scripts in both local and mapreduce modes. Now I am trying to run Pig embedded in Java, but I am unable to run embedded Pig in MapReduce mode.

That said, I have also successfully run embedded Pig in local mode. However, I am running into problems running embedded Pig in mapreduce mode.

The problem is: the class compiles successfully, but nothing happens when I run

java -cp <classpath> PigMapRedMode 

I later saw a suggestion that I should include a pig.properties file on the classpath, such as:

fs.default.name=hdfs://<namenode-hostname>:<port> 
    mapred.job.tracker=<jobtracker-hostname>:<port> 

However, in Hadoop 2.2.0 the JobTracker no longer exists. Any idea what to do?
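For reference, on Hadoop 2.x the old JobTracker setting is replaced by YARN's ResourceManager. A pig.properties along the following lines should be the Hadoop 2 equivalent (hostnames and ports are placeholders; note that `fs.default.name` is deprecated in favor of `fs.defaultFS`):

```properties
# Hadoop 2 / YARN equivalents of the Hadoop 1 settings above (placeholders, adjust to your cluster)
fs.defaultFS=hdfs://<namenode-hostname>:<port>
yarn.resourcemanager.address=<resourcemanager-hostname>:<port>
mapreduce.framework.name=yarn
```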

I am attaching the Java code of my PigMapRedMode below, in case something is wrong there.

import java.io.IOException;

import org.apache.pig.PigServer;

public class PigMapRedMode {
    public static void main(String[] args) {
        try {
            // "mapreduce" selects MapReduce mode; still need to supply the cluster properties somehow
            PigServer pigServer = new PigServer("mapreduce");
            runIdQuery(pigServer, "5pts.txt");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(',');");
        pigServer.registerQuery("B = foreach A generate $0 as id;");
        pigServer.store("B", "id.out");
    }
}

Update:

A solution has been found! It turns out there is no need to supply a Properties object or put a pig.properties on the classpath. You only need to include the Hadoop configuration directory on the classpath (for my Hadoop 2.2.0 it is /etc/hadoop), and fs.default.name and yarn.resourcemanager.address are picked up from that location.

The modified Java code is attached below:

/**
 * Created by allenlin on 2/19/14.
 */
import java.io.IOException;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigMapRedMode {
    public static void main(String[] args) {
        try {
            // cluster settings are read from the Hadoop conf directory on the classpath
            PigServer pigServer = new PigServer(ExecType.MAPREDUCE);
            runIdQuery(pigServer, "<hdfs input address>");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(',');");
        pigServer.registerQuery("B = foreach A generate $0 as id;");
        pigServer.store("B", "<hdfs output address>");
    }
}

Here is the UNIX command I use to run the Java class. Be careful to include all the dependencies:

java -cp ".:$PIG_HOME/build/pig-0.12.1-SNAPSHOT.jar:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/mapreduce/*:antlr-runtime-3.4.jar:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/hdfs/*:$PIG_HOME/build/ivy/lib/Pig/*:$HADOOP_CONF_DIR" PigMapRedMode 
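As a shorter alternative (an untested sketch, assuming the `hadoop` launcher script is on your PATH), the `hadoop classpath` command can expand the Hadoop jars and the configuration directory for you, so only the Pig jars need to be listed by hand:

```
java -cp ".:$PIG_HOME/build/pig-0.12.1-SNAPSHOT.jar:$PIG_HOME/build/ivy/lib/Pig/*:$(hadoop classpath)" PigMapRedMode
```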

Thanks @zsxwing for the help!


This is how I create a Properties object: `Properties properties = new Properties(); PropertiesUtil.loadDefaultProperties(properties); properties.putAll(ConfigurationUtil.toProperties(conf));` Could you give it a try? – zsxwing


Thanks! I will give it a try and let you know! –


Hi @zsxwing, thanks for the reply, but one question: how do I get the Hadoop Configuration object? –

Answer


Here is how I run embedded Pig:

import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class test1 {
    public static void main(String[] args) {
        try {
            // pass the cluster address to PigServer up front instead of setting it afterwards
            Properties props = new Properties();
            props.setProperty("fs.default.name", "hdfs://localhost:9000");
            PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
            runQuery(pigServer);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void runQuery(PigServer pigServer) {
        try {
            pigServer.registerQuery("input1 = LOAD '/input.data' as (line:chararray);");
            pigServer.registerQuery("words = foreach input1 generate FLATTEN(TOKENIZE(line)) as word;");
            pigServer.registerQuery("word_groups = group words by word;");
            pigServer.registerQuery("word_count = foreach word_groups generate group, COUNT(words);");
            pigServer.registerQuery("ordered_word_count = order word_count by group desc;");
            pigServer.registerQuery("store ordered_word_count into '/wct';");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
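For comparison, the registerQuery calls above amount to this standalone Pig Latin script (a sketch, runnable with `pig -x mapreduce` against the same paths):

```
-- word count over /input.data, equivalent to the embedded queries above
input1 = LOAD '/input.data' AS (line:chararray);
words = FOREACH input1 GENERATE FLATTEN(TOKENIZE(line)) AS word;
word_groups = GROUP words BY word;
word_count = FOREACH word_groups GENERATE group, COUNT(words);
ordered_word_count = ORDER word_count BY group DESC;
STORE ordered_word_count INTO '/wct';
```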

Set HADOOP_HOME in Eclipse:

Run Configurations-->ClassPath-->User Entries-->Advanced-->Add ClassPath Variables-->New-->Name(HADOOP_HOME)-->Path(your Hadoop directory path) 

The Maven dependencies I have added:

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.16</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pig</groupId>
        <artifactId>pig</artifactId>
        <version>0.15.0</version>
    </dependency>
    <dependency>
        <groupId>org.antlr</groupId>
        <artifactId>antlr-runtime</artifactId>
        <version>3.4</version>
    </dependency>
</dependencies>

If you do not set HADOOP_HOME correctly, you will get the following error:

hadoop20.PigJobControl: falling back to default JobControl (not using hadoop 0.20 ?) 