2016-11-29 38 views

Answers

0

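You could create a custom extractor or, more simply, import the data as a single row and then split and clean it using the C# methods available to you within U-SQL, such as Split and IsNullOrWhiteSpace. Something like this: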

[Image: my right-aligned sample data]

// Import the row as one column to be split later; NB use a delimiter that will NOT be in the import file 
@input = 
    EXTRACT rawString string 
    FROM "/input/input.txt" 
    USING Extractors.Text(delimiter : '|'); 


// Add a row number to the line and remove white space elements 
@working = 
    SELECT ROW_NUMBER() OVER() AS rn, new SqlArray<string>(rawString.Split(' ').Where(x => !String.IsNullOrWhiteSpace(x))) AS columns 
    FROM @input; 


// Prepare the output, referencing the column's position in the array 
@output = 
    SELECT rn, 
      columns[0] AS id, 
      columns[1] AS firstName, 
      columns[2] AS lastName 
    FROM @working; 


OUTPUT @output 
TO "/output/output.txt" 
USING Outputters.Tsv(quoting : false); 

My results:

[Image: my results]

HTH
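To see what the C# expression in the @working rowset does to a single line, here is a minimal standalone sketch that runs outside U-SQL. The sample line and the names SplitDemo and SplitColumns are made up for illustration; only Split, Where and String.IsNullOrWhiteSpace come from the script above.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Splits on single spaces and drops the empty entries produced by runs of padding.
    public static string[] SplitColumns(string rawString)
    {
        return rawString
            .Split(' ')
            .Where(x => !String.IsNullOrWhiteSpace(x))
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical right-aligned row; not taken from the original input file.
        string line = "     1    John     Smith";
        string[] columns = SplitColumns(line);
        Console.WriteLine(string.Join("|", columns)); // prints 1|John|Smith
    }
}
```

In the U-SQL script the same result is wrapped in a SqlArray<string> so that it can be indexed as columns[0], columns[1] and so on. Note that this approach treats every space as a separator, so a value that itself contains a space would end up in two array elements.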

1

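@wBob's solution works as long as each row fits into a string (128 kB). Otherwise, write your own custom extractor to do the extraction. Depending on what you know about the format, you can use input.Split() to split the input into rows and then split each row according to your whitespace rules, as shown below (a full example of the extractor pattern is here), or you could write one similar to the one described in this blog post.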

public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow outputrow) 
    { 
     foreach (Stream current in input.Split(this._row_delim)) 
     { 
      using (StreamReader streamReader = new StreamReader(current, this._encoding)) 
      { 
       int num = 0; 
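       // Where() returns IEnumerable<string>, so ToArray() is needed (requires "using System.Linq;")
       string[] array = streamReader.ReadToEnd().Split(new string[]{this._col_delim}, StringSplitOptions.None).Where(x => !String.IsNullOrWhiteSpace(x)).ToArray(); 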
       for (int i = 0; i < array.Length; i++) 
       { 
        // Now write your code to convert array[i] into the extract schema 
       } 
      } 
      yield return outputrow.AsReadOnly(); 
     } 
    } 
} 
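The comment inside the loop leaves the conversion step open. As one possible sketch, assuming the three-column schema (id, firstName, lastName) from the first answer and assuming id is an integer, the tokens could be validated and converted as follows. PersonRow and RowParser.TryParse are hypothetical names that are not part of the U-SQL SDK; inside a real extractor the parsed values would be written to outputrow instead of being returned.

```csharp
using System;
using System.Globalization;

// Hypothetical container for one parsed row; column names follow the first answer.
public sealed class PersonRow
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class RowParser
{
    // Returns false instead of throwing when the token count or the id is not as expected.
    public static bool TryParse(string[] tokens, out PersonRow row)
    {
        row = null;
        if (tokens == null || tokens.Length != 3)
        {
            return false;
        }

        int id;
        if (!int.TryParse(tokens[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out id))
        {
            return false;
        }

        row = new PersonRow { Id = id, FirstName = tokens[1], LastName = tokens[2] };
        return true;
    }
}
```

Returning false for malformed rows lets the caller decide whether to skip the row or fail the job, which is a design choice of this sketch and not something the answer prescribes.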
+0

Great addition, and an important point about the 128 KB limit, thanks @MichaelRys. – wBob
