2013-01-31

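Hive: combine column values based on a condition

I would like to know whether it is possible to combine column values based on a condition. Let me explain.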

Let's say my data looks like this:

Id name offset 
1 Jan 100 
2 Janssen 104 
3 Klaas 150 
4 Jan 160 
5 Janssen 164 

My output should look like this:

Id fullname offsets 
1 Jan Janssen [ 100, 160 ] 

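I want the name values of two rows to be merged wherever the two names are no more than 1 character apart. For example, "Jan" starts at offset 100 and is 3 characters long, so it ends at 103, and "Janssen" starts at 104.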
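As a plain-Java illustration of the rule just described (not Hive), here is a minimal sketch that merges adjacent names and collects the starting offsets per merged name. The class `MergeAdjacentNames`, the `Row` type, and the `merge` method are hypothetical names introduced for this example. It assumes Java 8 or later, that the length of a name equals the length of its string (the small table has no length column), and that unmerged names such as Klaas are kept as their own entries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeAdjacentNames {

    /** One input row: a name and the character offset where it starts (hypothetical type). */
    static final class Row {
        final String name;
        final int offset;

        Row(String name, int offset) {
            this.name = name;
            this.offset = offset;
        }
    }

    /**
     * Merges a row with the following row when the following name starts exactly
     * one character after the current name ends, then groups the starting
     * offsets by (possibly merged) name. Assumes rows are sorted by offset.
     */
    static Map<String, List<Integer>> merge(List<Row> rows) {
        // LinkedHashMap keeps names in order of first appearance.
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int i = 0;
        while (i < rows.size()) {
            Row current = rows.get(i);
            String name = current.name;
            // Assumption: the name's length is the length of its string.
            int end = current.offset + current.name.length();
            if (i + 1 < rows.size() && rows.get(i + 1).offset - end == 1) {
                name = name + " " + rows.get(i + 1).name;
                i++; // the following row has been consumed by the merge
            }
            result.computeIfAbsent(name, k -> new ArrayList<>()).add(current.offset);
            i++;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("Jan", 100));
        rows.add(new Row("Janssen", 104));
        rows.add(new Row("Klaas", 150));
        rows.add(new Row("Jan", 160));
        rows.add(new Row("Janssen", 164));
        // prints {Jan Janssen=[100, 160], Klaas=[150]}
        System.out.println(merge(rows));
    }
}
```

The sketch only ever merges two rows at a time, which matches the example data; a run of three or more adjacent names would need a loop around the merge step.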

My question is whether this kind of data manipulation is possible, and if so, could someone share some code and an explanation?

Please be gentle, but this small piece of code returns something close to what I want:

// Needs: java.io.File, java.util.ArrayList, java.util.List, java.util.Scanner
List<String> persons = new ArrayList<String>();

// Sample input from entities.txt (id, name, category, length, offset):
// USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,10660
// USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,10685
File file = new File("entities.txt");

// try-with-resources closes the Scanner when reading is done.
try (Scanner scanner = new Scanner(file)) {
    // The row waiting to be written out, or null once the input is exhausted.
    String previous = scanner.hasNextLine() ? scanner.nextLine() : null;
    while (previous != null) {
        String current = scanner.hasNextLine() ? scanner.nextLine() : null;
        String[] prev = previous.split(",");
        if (current != null) {
            String[] cur = current.split(",");
            // x is where the previous name ends (length + offset),
            // y is where the current name starts.
            int x = Integer.parseInt(prev[3]) + Integer.parseInt(prev[4]);
            int y = Integer.parseInt(cur[4]);
            if (y - x == 1) {
                // Exactly one character apart: merge the two names.
                persons.add(prev[1] + " " + cur[1]);
                previous = scanner.hasNextLine() ? scanner.nextLine() : null;
                continue;
            }
        }
        // No merge: emit the previous name alone (this also covers the last row).
        persons.add(prev[1]);
        previous = current;
    }
} catch (Exception e) {
    e.printStackTrace();
}

for (String person : persons) {
    System.out.println(person);
}

It works on this sample data:

USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Richard,PERSON,7,2732 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,2740 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,2756 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,3093 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,3195 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,3220 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,10660 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,10685 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,10858 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,11063 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Ken,PERSON,3,11186 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Marottoli,PERSON,9,11234 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Berkowitz,PERSON,9,17073 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Lea,PERSON,3,17095 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Stephanie,PERSON,9,17330 
USER.A-GovDocs-f83c6ca3-9585-4c66-b9b0-f4c3bd57ccf4,Putt,PERSON,4,17340 

which produces this output:

Richard Marottoli 
Marottoli 
Marottoli 
Marottoli 
Berkowitz 
Berkowitz 
Marottoli 
Lea 
Lea 
Ken 
Marottoli 
Berkowitz 
Lea 
Stephanie Putt 

Kind regards

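I am a little confused about how the output is derived, but I think this is very similar to [this question](http://stackoverflow.com/questions/14028796/reduce-a-set-of-rows-hive-to-another-set-of-rows), which I answered using custom map/reduce in Hive. You just need to supply an appropriate reduce script. – libjack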

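I have updated my question with a piece of Java code, sample data, and output. I would like to convert my Java code to Hive code. Any idea whether that is possible? – Tinuz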

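Sorry, your additional code still does not make clear what you are trying to accomplish. The newer code and data look like you want to load a table into Hive and extract columns (which is certainly possible), whereas the earlier example combines rows in some way. – libjack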

Answers

1

Create the table over your data with the statement below:

drop table if exists default.stack;
create external table default.stack (
    junk string,
    name string,
    cat  string,
    len  int,
    off  int
)
ROW FORMAT DELIMITED
FIELDS terminated by ','
STORED AS INPUTFORMAT
    'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
location 'hdfs://nameservice1/....';

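Then use the following query to get your desired output. The subquery b computes, for every row, the offset at which an adjacent following name would start (len + off + 1). The left outer join pairs each row with the row that directly precedes it, if any, and max(name) grouped by the starting offset keeps the combined name whenever one exists.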

select max(name), off
from (
    select case when b.name is not null
                then concat(b.name, " ", a.name)
                else a.name
           end as name,
           case when b.off1 is not null
                then b.off1
                else a.off
           end as off
    from default.stack a
    left outer join (
        select name,
               len + off + 1 as off,
               off as off1
        from default.stack
    ) b
    on a.off = b.off
) a
group by off
order by off;

I have tested it, and it produces the result you want.
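For readers coming from the Java version in the question, the self-join can be mirrored in memory as a hash lookup. This is only a sketch of the idea, not how Hive executes the query. The class `SelfJoinSketch`, the `Entity` type, and the `combine` method are hypothetical names, Java 8 or later is assumed, and the sketch assumes at most one row ends directly before any given offset, so a plain map can stand in for the join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SelfJoinSketch {

    /** Mirrors the name, len and off columns of the table (hypothetical type). */
    static final class Entity {
        final String name;
        final int len;
        final int off;

        Entity(String name, int len, int off) {
            this.name = name;
            this.len = len;
            this.off = off;
        }
    }

    /** Returns starting offset -> combined name, sorted by offset. */
    static Map<Integer, String> combine(List<Entity> rows) {
        // Plays the role of subquery b: index each row by the offset at which
        // an adjacent following name would start (len + off + 1).
        Map<Integer, Entity> byNextStart = new HashMap<>();
        for (Entity b : rows) {
            byNextStart.put(b.len + b.off + 1, b);
        }

        // TreeMap keeps keys sorted, similar to ORDER BY off.
        Map<Integer, String> result = new TreeMap<>();
        for (Entity a : rows) {
            // Left outer join: b is null when no row ends right before a.
            Entity b = byNextStart.get(a.off);
            String name = (b != null) ? b.name + " " + a.name : a.name;
            int off = (b != null) ? b.off : a.off;
            // GROUP BY off with max(name): keep the greater string per offset.
            result.merge(off, name, (x, y) -> x.compareTo(y) >= 0 ? x : y);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entity> rows = new ArrayList<>();
        rows.add(new Entity("Richard", 7, 2732));
        rows.add(new Entity("Marottoli", 9, 2740));
        rows.add(new Entity("Marottoli", 9, 2756));
        rows.add(new Entity("Stephanie", 9, 17330));
        rows.add(new Entity("Putt", 4, 17340));
        // prints {2732=Richard Marottoli, 2756=Marottoli, 17330=Stephanie Putt}
        System.out.println(combine(rows));
    }
}
```

Taking the maximum works because the combined name always begins with the single name at the same offset, so the longer string compares as greater.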