How to specify inputs when using CSV with Kur

I want to feed a CSV file to Kur, but I don't know how to specify more than one column in the input without the program crashing. Here is a small example:

model:
    - input:
        - SepalWidthCm
        - SepalLengthCm
    - dense: 10
    - activation: tanh
    - dense: 3
    - activation: tanh
      name: Species

train:
    data:
        - csv:
            path: Iris.csv
            header: yes
    epochs: 1000
    weights: best.w
    log: tutorial-log

loss:
    - target: Species
      name: mean_squared_error

Error:

File "/Users/bytter/.pyenv/versions/3.5.2/bin/kur", line 11, in <module> 
    sys.exit(main()) 
    File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/__main__.py", line 269, in main 
    sys.exit(args.func(args) or 0) 
    File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/__main__.py", line 48, in train 
    func = spec.get_training_function() 
    File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/kurfile.py", line 282, in get_training_function 
    model = self.get_model(provider) 
    File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/kurfile.py", line 148, in get_model 
    self.model.build() 
    File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/model/model.py", line 282, in build 
    self.build_graph(input_nodes, output_nodes, network) 
    File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/model/model.py", line 356, in build_graph 
    for layer in node.container.build(self): 
    File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/containers/container.py", line 281, in build 
    self._built = list(self._build(model)) 
    File "/Users/bytter/.pyenv/versions/3.5.2/lib/python3.5/site-packages/kur/containers/layers/placeholder.py", line 122, in _build 
    'Placeholder "{}" requires a shape.'.format(self.name)) 
kur.containers.parsing_error.ParsingError: Placeholder "..input.0" requires a shape.

Using a single - input: SepalWidthCm works as expected.

Source

2017-02-11 Hugo Sereno Ferreira

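The problem with your approach is that Kur doesn't know how you want the inputs combined. Should your input become a 2D tensor of dimensions (2, N), where N is the number of data points in your CSV file, like this?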

[ 
    [SepalWidthCm_0, SepalWidthCm_1, ...], 
    [SepalLengthCm_0, SepalLengthCm_1, ...] 
]

(N.B., that is not a very deep-learning-friendly structure.) Or should it be combined into a tensor of dimensions (N, 2), like this?

[ 
    [SepalWidthCm_0, SepalLengthCm_0], 
    [SepalWidthCm_1, SepalLengthCm_1], 
    ... 
]

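Or maybe you want to apply the same operation to each column in parallel? In any case, the problem gets harder and more ambiguous when your input data is multi-dimensional (e.g., instead of scalars like length or width, you have vectors or even matrices). Kur expects each input to be a single data source, which you can then combine however you see fit.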
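To make the two layouts concrete, here is a small plain-Python sketch. It is not Kur code; the sample values and the `shape` helper are made up for illustration. It builds both candidate tensors from two column lists and reports their shapes.

```python
# Two CSV columns with N = 3 data points each (illustrative values only).
sepal_width = [3.5, 3.0, 3.2]
sepal_length = [5.1, 4.9, 4.7]

def shape(matrix):
    """Return (rows, columns) of a rectangular list of lists."""
    return (len(matrix), len(matrix[0]))

# Layout 1: one row per column, shape (2, N).
by_column = [sepal_width, sepal_length]

# Layout 2: one row per data point, shape (N, 2).
by_row = [list(pair) for pair in zip(sepal_width, sepal_length)]

print(shape(by_column))  # (2, 3)
print(shape(by_row))     # (3, 2)
```

The second layout is the one where each row is a single training example with two features.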

Here are a couple of ways you might want your data combined, and how to do each in Kur.

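Row-wise combination. This is the second example above, where we want to combine the "rows" of CSV data into tuples, so that the input tensor has dimensions (batch size, 2). Your Kur model would then look like this: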

model:

    # Define the model inputs.
    - input: SepalWidthCm
    - input: SepalLengthCm

    # Concatenate the inputs.
    - merge: concat
      inputs: [SepalWidthCm, SepalLengthCm]

    # Do processing on these "vectorized" inputs.
    - dense: 10
    - activation: tanh
    - dense: 1
    - activation: tanh

    # Output
    - output: Species
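As a rough illustration of what the concatenation produces for each batch, the plain-Python sketch below reads two columns from an in-memory CSV and joins them row by row. This is not how Kur works internally; the sample rows (standing in for Iris.csv) and the `concat_columns` helper are hypothetical.

```python
import csv
import io

# In-memory stand-in for Iris.csv with a header row (made-up sample rows).
SAMPLE_CSV = """SepalLengthCm,SepalWidthCm,Species
5.1,3.5,Iris-setosa
4.9,3.0,Iris-setosa
6.3,3.3,Iris-virginica
"""

def concat_columns(csv_text, column_names):
    """Return one list per CSV row holding the requested columns as floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[float(row[name]) for name in column_names] for row in reader]

batch = concat_columns(SAMPLE_CSV, ["SepalWidthCm", "SepalLengthCm"])
print(batch)                         # [[3.5, 5.1], [3.0, 4.9], [3.3, 6.3]]
print((len(batch), len(batch[0])))   # (3, 2), i.e. (batch size, 2)
```

The column order in each row follows the order given in `column_names`, mirroring the order of the names in the `inputs` list of the merge layer.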

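Independent processing, then combination. This is the setup where you perform some operations on each input column independently and then merge them together (possibly followed by further operations). In ASCII art, this might look like: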

INPUT_1 --> dense, activation --\
                                 +---> dense, activation --> OUTPUT
INPUT_2 --> dense, activation --/

In this case, you would have a Kur model that looks like this:

model:

    # First "branch" of processing.
    - input: SepalWidthCm
    - dense: 10
    - activation: tanh
      name: WidthBranch

    # Second "branch" of processing.
    - input: SepalLengthCm
    - dense: 10
    - activation: tanh
      name: LengthBranch

    # Fuse things together.
    - merge:
      inputs: [WidthBranch, LengthBranch]

    # Continue some processing
    - dense: 1
    - activation: tanh

    # Output
    - output: Species
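The data flow of this two-branch model can be sketched in plain Python. This is only an illustration of the shapes involved, not Kur's implementation: the weights are random, the helper names are hypothetical, and it assumes the merge concatenates the two branches as in the concat example.

```python
import math
import random

def dense(vector, weights, biases):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def tanh(vector):
    """Element-wise tanh activation."""
    return [math.tanh(x) for x in vector]

def random_layer(n_in, n_out, rng):
    """Random weights and zero biases for an n_in -> n_out dense layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
width_layer = random_layer(1, 10, rng)    # WidthBranch: dense 10
length_layer = random_layer(1, 10, rng)   # LengthBranch: dense 10
output_layer = random_layer(20, 1, rng)   # after the merge: dense 1

def forward(sepal_width, sepal_length):
    """Run one data point through both branches, the merge, and the final layers."""
    width_branch = tanh(dense([sepal_width], *width_layer))
    length_branch = tanh(dense([sepal_length], *length_layer))
    merged = width_branch + length_branch   # concatenation: 10 + 10 = 20 values
    return tanh(dense(merged, *output_layer))

result = forward(3.5, 5.1)
print(len(result))  # 1
```

Each branch sees only its own column, and the layers after the merge are the first place where the two features interact.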

Keep in mind that the merge layer has only been available since Kur 0.3, so make sure you are using a recent version.
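One way to check a version string against that 0.3 threshold is sketched below. The `supports_merge` helper is hypothetical and only compares the major and minor numbers; it is not part of Kur.

```python
import re

def supports_merge(version_string):
    """Return True if the version is at least 0.3, per the note above."""
    numbers = [int(part) for part in re.findall(r"\d+", version_string)[:2]]
    while len(numbers) < 2:
        numbers.append(0)   # treat "1" as "1.0"
    return tuple(numbers) >= (0, 3)

print(supports_merge("0.2.1"))  # False
print(supports_merge("0.3.0"))  # True
```

To obtain the installed version, one option on Python 3.8 or later is `importlib.metadata.version("kur")`, assuming the package is installed under the distribution name `kur`.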

(Disclaimer: I am the core maintainer of Kur.)

Source

2017-02-16 20:28:39

Thanks for the answer, Adam. May I suggest adding these examples to the documentation? That would be really useful :)

That's a great point! I'll add them to the documentation soon.

By the way, @adam, I tried to contact you privately through the Deepgram web chat. Could you get back to me once you find the time? Thanks.
