2017-06-26 37 views

Using tuples in Pig

I am a beginner in Pig and am trying to understand the tuple data type. I have the following file:

 cat student.csv 
id,name,grade,contact_details 
s1234,Mohan,8,(Delhi,9811830) 
s2345,Nisha,10,(Delhi,257891) 
s3456,Anuj,12,(Delhi,9897212) 
s4567,vishal,14,(Delhi,989175) 

The contact_details field is a tuple of city and phone.

I load it into the following relation:

student = load 'student.csv' using PigStorage(',') as 
(id:chararray, 
    name:chararray, 
    grade:int, 
    contact: tuple(city:chararray,phone:chararray)); 

Now, when I dump the relation, the tuple is missing from the output. Below is the output of dump student:

grunt> dump student; 
(s1234,Mohan,8,) 
(s2345,Nisha,10,) 
(s3456,Anuj,12,) 
(s4567,vishal,14,) 
grunt> 

grunt> describe student; 
student: {id: chararray,name: chararray,grade: int,contact: (city: chararray,phone: chararray)} 

What am I missing?

Answer


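The delimiter you are using, ',', causes the file to be loaded incorrectly because ',' also appears inside the tuple. Either use a different delimiter between the fields, so that ',' appears only inside the tuple, or keep ',' and load each line as 5 separate fields, then strip the '(' and ')' and concat the city and phone fields to get contact_details.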
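To see why the tuple comes back empty, here is a rough illustration in plain Python (not Pig code) of what happens when a line is split on every ','. The helper `parse_tuple` is hypothetical and only stands in for the step that converts a field to a tuple; it is an assumption for illustration, not Pig's actual implementation.

```python
# Illustration only: split one line of student.csv on every ','.
line = "s1234,Mohan,8,(Delhi,9811830)"

fields = line.split(",")
print(len(fields))  # 5 pieces, although the schema declares only 4 fields
print(fields[3])    # "(Delhi" is what lands in the 'contact' position


def parse_tuple(text):
    """Hypothetical helper: return the items of '(a,b,...)', or None if malformed."""
    if text.startswith("(") and text.endswith(")"):
        return tuple(text[1:-1].split(","))
    return None


# The fourth piece is not a well-formed tuple, so the conversion fails.
contact = parse_tuple(fields[3])

# The intact field would have converted without trouble.
contact_ok = parse_tuple("(Delhi,9811830)")
print(contact, contact_ok)
```

The fourth piece, "(Delhi", has no closing parenthesis, so it cannot be read as a tuple, which matches the empty last column in the dump output.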

Option 1: With ' ' as the delimiter (the fields in the file are separated by spaces instead of commas)

id name grade contact_details 

s1234 Mohan 8 (Delhi,9811830) 
s2345 Nisha 10 (Delhi,257891) 
s3456 Anuj 12 (Delhi,9897212) 
s4567 vishal 14 (Delhi,989175) 

student = load 'student.csv' using PigStorage(' ') as (id:chararray, name:chararray, grade:int, contact: tuple(city:chararray,phone:chararray)); 

Option 2: With ',' as the delimiter

id,name,grade,contact_details 

s1234,Mohan,8,(Delhi,9811830) 
s2345,Nisha,10,(Delhi,257891) 
s3456,Anuj,12,(Delhi,9897212) 
s4567,vishal,14,(Delhi,989175) 


student = load 'student.csv' using PigStorage(',') as (id:chararray, name:chararray, grade:int, city:chararray,phone:chararray); 
student_new = FOREACH student GENERATE id,name,grade,CONCAT(REPLACE(CONCAT(city,' '),'\\(',''),REPLACE(phone,'\\)','')) AS contact_details;