2017-05-06 53 views
1

How to filter out the first line from a variable in Pig

I import a CSV file into a variable like this:

basketball_players = load '/usr/data/basketball_players.csv' using PigStorage(','); 
Below is the output of the first 3 lines:

tmp = limit basketball_players 3; 
dump tmp;

("playerID","year","stint","tmID","lgID","GP","GS","minutes","points","oRebounds","dRebounds","rebounds","assists","steals","blocks","turnovers","PF","fgAttempted","fgMade","ftAttempted","ftMade","threeAttempted","threeMade","PostGP","PostGS","PostMinutes","PostPoints","PostoRebounds","PostdRebounds","PostRebounds","PostAssists","PostSteals","PostBlocks","PostTurnovers","PostPF","PostfgAttempted","PostfgMade","PostftAttempted","PostftMade","PostthreeAttempted","PostthreeMade","note") 
("abramjo01","1946","1","PIT","NBA","47","0","0","527","0","0","0","35","0","0","0","161","834","202","178","123","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0",) 
("aubucch01","1946","1","DTF","NBA","30","0","0","65","0","0","0","20","0","0","0","46","91","23","35","19","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0",) 

As you can see, the first line is the table header. I used the command below to filter out the first line, but it doesn't work.

grunt> players_raw = filter basketball_players by $1 > 0; 
2017-05-06 11:03:36,389 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 6 time(s). 

When I dump players_raw, it comes back empty. How can I filter out the first line from the variable?

Answers

0

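Use RANK to generate a new column that adds a row number to each record in the dataset, then use that column to filter out the first row.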

basketball_players = load '/usr/data/basketball_players.csv' using PigStorage(','); 
ranked = rank basketball_players; 
basketball_players_without_header = Filter ranked by (rank_basketball_players > 1); 
DUMP basketball_players_without_header; 

Another way to do this:

basketball_players = load '/usr/data/basketball_players.csv' using PigStorage(','); 
-- Keep only the rows whose first field does NOT contain the header name.
basketball_players_without_header = Filter basketball_players by NOT ($0 matches '.*playerID.*'); 
DUMP basketball_players_without_header;
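
A note on why the attempt in the question most likely returned nothing: PigStorage does not strip the surrounding double quotes, so the second field arrives as the text "1946" with the quotes included. Comparing it with 0 triggers the implicit cast to int reported in the warning, the cast fails and yields null, and a null comparison drops the row, so every row is filtered out. A minimal sketch that compares the raw text instead, assuming the header's second field is exactly "year" as in the dump above (the alias players_no_header is only an illustrative name):

```pig
-- Load without a schema; every field is a bytearray that still contains its quotes.
basketball_players = LOAD '/usr/data/basketball_players.csv' USING PigStorage(',');

-- Compare field $1 as text, quotes included, and keep every row that is not the header.
-- Assumption: only the header row has the literal value "year" in its second field.
players_no_header = FILTER basketball_players BY (chararray)$1 != '"year"';

DUMP players_no_header;
```

The data fields will still carry their quotes after this filter, so they would need to be cleaned before any numeric comparison.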