Cascading TextDelimited log file

I am following the Cascading guide on its website. I have the following TSV-formatted input:

doc_id text 
doc01 A rain shadow is a dry area on the lee back side of a mountainous area. 
doc02 This sinking, dry air produces a rain shadow, or area in the lee of a mountain with less rain and cloudcover. 
doc03 A rain shadow is an area of dry land that lies on the leeward (or downwind) side of a mountain. 
doc04 This is known as the rain shadow effect and is the primary cause of leeward deserts of mountain ranges, such as California's Death Valley. 
doc05 Two Women. Secrets. A Broken Land. [DVD Australia]

I process it with the following code:

Tap docTap = new Hfs(new TextDelimited(true, "\t"), inPath); 
... 
Fields token = new Fields("token"); 
Fields text = new Fields("text"); 
RegexSplitGenerator splitter = new RegexSplitGenerator(token, "[ \\[\\]\\(\\),.]"); 
// only returns "token" 
Pipe docPipe = new Each("token", text, splitter, Fields.RESULTS);

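It looks like it only splits the second part of each line (ignoring the doc_id part). How does Cascading skip the doc_id column and process only the second one? Is it because of TextDelimited?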


2013-11-20 user2597504

Look at the pipe declaration:

Pipe docPipe = new Each("token", text, splitter, Fields.RESULTS);

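The second argument selects the only field that is passed to the splitter function. Here you are passing the "text" field, so only the text is sent to the splitter, which returns the tokens.

The documentation for the constructor below makes this clear.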

Each

@ConstructorProperties(value={"name","argumentSelector","function","outputSelector"}) 
public Each(String name, 
            Fields argumentSelector, 
            Function function, 
            Fields outputSelector) 

Only pass argumentFields to the given function, only return fields selected by the outputSelector. 

Parameters: 
    name - name for this branch of Pipes 
    argumentSelector - field selector that selects Function arguments from the input Tuple 
    function - Function to be applied to each input Tuple 
    outputSelector - field selector that selects the output Tuple from the input and Function results Tuples
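
To make the argument selector concrete, here is a minimal sketch in plain Java rather than the Cascading API. The class ArgumentSelectorSketch and its split and each methods are hypothetical stand-ins invented for illustration, and the tuple is modelled as a Map. Dropping empty pieces is a choice made for this sketch, not a claim about what RegexSplitGenerator does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class ArgumentSelectorSketch {
    // Same delimiter pattern as the one passed to RegexSplitGenerator in the question.
    static final Pattern DELIMITERS = Pattern.compile("[ \\[\\]\\(\\),.]");

    // Hypothetical stand-in for the splitter: it receives one value, never the whole tuple.
    static List<String> split(String value) {
        List<String> tokens = new ArrayList<>();
        for (String piece : DELIMITERS.split(value)) {
            if (!piece.isEmpty()) { // assumption of this sketch: skip empty pieces
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Hypothetical stand-in for Each(name, argumentSelector, function, Fields.RESULTS).
    static List<Map<String, String>> each(Map<String, String> tuple, String argumentField) {
        String argument = tuple.get(argumentField); // only the selected field is handed over
        List<Map<String, String>> output = new ArrayList<>();
        for (String token : split(argument)) {
            Map<String, String> result = new LinkedHashMap<>();
            result.put("token", token); // results only: doc_id and text are not carried along
            output.add(result);
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> tuple = new LinkedHashMap<>();
        tuple.put("doc_id", "doc05");
        tuple.put("text", "Two Women. Secrets. A Broken Land. [DVD Australia]");
        System.out.println(each(tuple, "text"));
    }
}
```

The doc_id value is present in the input tuple but is never read, because each only looks up the field named by the argument selector. That mirrors why the splitter in the question never sees doc_id.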


2013-11-29 10:02:49 Naveen

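The answer is in these two lines:

1. When the Tap is created, the program is told that the first line contains the header (the "true" argument).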

Tap docTap = new Hfs(new TextDelimited(true, "\t"), docPath);

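2. Second, this line supplies the column name "text". If you look closely at the input file, "text" is the name of the column that holds the data you are trying to split into words.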

Fields text = new Fields("text");
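
The two points above can be sketched in plain Java, without the Cascading API, to show how a header line turns a column name into a column position. The class HeaderFieldsSketch and its column method are hypothetical helpers invented for this illustration, and the sample lines assume the columns are separated by a single tab.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HeaderFieldsSketch {
    // Hypothetical helper: treats the first line as the header and returns the
    // values of the named column from every remaining line.
    static List<String> column(List<String> lines, String fieldName) {
        List<String> header = Arrays.asList(lines.get(0).split("\t"));
        int index = header.indexOf(fieldName); // the name resolves to a position
        if (index < 0) {
            throw new IllegalArgumentException("no such field: " + fieldName);
        }
        List<String> values = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            values.add(line.split("\t", -1)[index]); // pick only that column
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "doc_id\ttext",
            "doc01\tA rain shadow is a dry area on the lee back side of a mountainous area.",
            "doc05\tTwo Women. Secrets. A Broken Land. [DVD Australia]");
        System.out.println(column(lines, "text"));
    }
}
```

Asking for "text" returns only the second column, and asking for "doc_id" would return only the first. The name alone decides which part of each line is processed.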


2016-03-21 19:53:24 Unit1
