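Pig - load with different schemas
I'm working with Pig, loading multiple files in Hadoop by passing comma-separated files/folders (see this question on how to load multiple files in pig).
The problem is that each folder has a different schema file (located under that folder). Is it possible to supply multiple schema files as well?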
If your schema file is located outside the folder, then you have to declare the schema when you perform the load.
For example:
dataset_A = LOAD '/data/A' using PigStorage('\t') as (id:int, project:chararray, org:chararray);
dataset_B = LOAD '/data/B' using PigStorage(',') as (id:int, beta:chararray, delta:chararray, echo:int);
If the schema is declared in a .pig_schema file inside the directory, you can simply perform the load without declaring the schema:
dataset_A = LOAD '/data/A' using PigStorage('\t');
dataset_B = LOAD '/data/B' using PigStorage(',');
/data/A/.pig_schema:
{"fields":
[{"name":"id","type":10,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"project","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"org","type":55,"description":"autogenerated from Pig Field Schema","schema":null}],
"version":0,"sortKeys":[],"sortKeyOrders":[]}
/data/B/.pig_schema:
{"fields":
[{"name":"id","type":10,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"beta","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"delta","type":55,"description":"autogenerated from Pig Field Schema","schema":null},
{"name":"echo","type":10,"description":"autogenerated from Pig Field Schema","schema":null}],
"version":0,"sortKeys":[],"sortKeyOrders":[]}
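The "autogenerated" descriptions above suggest these files were written by Pig itself rather than by hand. As a minimal sketch of how such a file can be produced, PigStorage accepts a '-schema' option on STORE, which writes a .pig_schema file next to the output data. The output path /data/A_with_schema is a hypothetical name chosen for illustration, and whether the schema file is picked up automatically on load depends on the Pig version in use.

```pig
-- One-time step: load with an explicit schema, then store with '-schema'
-- so that Pig writes a .pig_schema file into the output directory.
-- '/data/A_with_schema' is a hypothetical output path.
dataset_A = LOAD '/data/A' USING PigStorage('\t') AS (id:int, project:chararray, org:chararray);
STORE dataset_A INTO '/data/A_with_schema' USING PigStorage('\t', '-schema');

-- In a later script or session (after the STORE above has completed),
-- the schema can be picked up from the stored .pig_schema file.
reloaded_A = LOAD '/data/A_with_schema' USING PigStorage('\t');
DESCRIBE reloaded_A;
```

The same pattern would apply to /data/B with its own delimiter and field list, so each folder carries its own schema and each LOAD statement stays schema-free.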