Creating many feature columns in TensorFlow

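I am starting a TensorFlow project and am in the middle of defining and creating my feature columns. However, I have hundreds and hundreds of features; it is a fairly wide dataset. Even after preprocessing and scrubbing, I still have a lot of columns.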

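The traditional way of creating a feature_column is defined in the TensorFlow tutorial and even in this StackOverflow post. You essentially declare and initialize a TensorFlow object for each feature column: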

gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["Female", "Male"])

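This works fine if your dataset has only a few columns, but in my case I certainly do not want hundreds of lines of code initializing different feature_column objects.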

What is the best way to solve this? I notice that in the tutorial, all the columns are collected into a list:

base_columns = [ 
    gender, native_country, education, occupation, workclass, relationship, 
    age_buckets, 
]

which is ultimately passed into your estimator:

m = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=base_columns)

So would the ideal way of handling feature_column creation for hundreds of columns be to append them directly to a list? Something like this?

my_columns = []

for col in df.columns:
    if is_string_dtype(df[col]):  # is_string_dtype is a pandas function
        my_column.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(df[col].unique())))

    elif is_numeric_dtype(df[col]):  # is_numeric_dtype is a pandas function
        my_column.append(tf.feature_column.numeric_column(col))

Is this the best way of creating these feature columns? Or am I missing some TensorFlow functionality that addresses this step?

Source

2017-10-19 Yu Chen

What you have makes sense to me. :) – greeness

Could you post this as an answer, @greeness? Thanks! :) – dga

Well, it doesn't add anything beyond the OP's question. – greeness

What you have makes sense to me. :) Copying from your own code:

import tensorflow as tf
from pandas.api.types import is_numeric_dtype, is_string_dtype

my_columns = []

for col in df.columns:  # df is the pandas DataFrame holding the dataset
    if is_string_dtype(df[col]):  # string column: hash its values into buckets
        my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(df[col].unique())))

    elif is_numeric_dtype(df[col]):  # numeric column: use the values directly
        my_columns.append(tf.feature_column.numeric_column(col))
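
One thing worth keeping in mind about `hash_bucket_size=len(df[col].unique())`: hashing n distinct values into n buckets does not give every value its own bucket. Assuming an idealized uniform hash (an approximation, not TensorFlow's actual hash function), the expected number of occupied buckets is n(1 - (1 - 1/n)^n), roughly 63% of n for large n. The helper name below is hypothetical; this is only a rough sketch for estimating how many buckets end up in use:

```python
def expected_occupied_buckets(num_values, num_buckets):
    """Expected number of non-empty buckets when num_values distinct keys
    are hashed uniformly at random into num_buckets buckets."""
    return num_buckets * (1.0 - (1.0 - 1.0 / num_buckets) ** num_values)


n = 1000  # made-up number of distinct values in a column

# Bucket count equal to the number of distinct values, as in the loop above.
same_size = expected_occupied_buckets(n, n)

# Ten times as many buckets as distinct values.
larger = expected_occupied_buckets(n, 10 * n)

print(f"{n} values in {n} buckets: ~{same_size:.0f} buckets occupied")
print(f"{n} values in {10 * n} buckets: ~{larger:.0f} buckets occupied")
```

If collisions matter for a given column, a larger bucket count is one option to consider; whether it helps depends on the data and the model.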

Source

2017-11-20 04:12:45 greeness

I used your own answer. I just edited it a bit (it should be my_columns rather than my_column inside the for loop) and am posting it the way it worked for me.

import pandas.api.types as ptypes
import tensorflow as tf

my_columns = []

for col in df.columns:  # df is the pandas DataFrame holding the dataset
    if ptypes.is_string_dtype(df[col]):  # string column: hash its values into buckets
        my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(df[col].unique())))

    elif ptypes.is_numeric_dtype(df[col]):  # numeric column: use the values directly
        my_columns.append(tf.feature_column.numeric_column(col))
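
The loop above could also be wrapped in a small helper so the dtype logic can be checked separately from TensorFlow. This is a minimal sketch, not a definitive implementation: `split_columns_by_dtype` and `build_feature_columns` are hypothetical names, the sample DataFrame is made up, and the code assumes a TensorFlow version that still provides `tf.feature_column`. TensorFlow is imported lazily, so the dtype split runs with pandas alone.

```python
import pandas as pd
import pandas.api.types as ptypes


def split_columns_by_dtype(df):
    """Return (string_cols, numeric_cols) as lists of column names."""
    string_cols, numeric_cols = [], []
    for col in df.columns:
        if ptypes.is_string_dtype(df[col]):
            string_cols.append(col)
        elif ptypes.is_numeric_dtype(df[col]):
            numeric_cols.append(col)
        # Columns of any other dtype (e.g. datetimes) are skipped.
    return string_cols, numeric_cols


def build_feature_columns(df):
    """Build one feature column per string or numeric DataFrame column."""
    import tensorflow as tf  # imported lazily; only this function needs it

    string_cols, numeric_cols = split_columns_by_dtype(df)
    columns = [
        tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(df[col].unique()))
        for col in string_cols
    ]
    columns += [tf.feature_column.numeric_column(col) for col in numeric_cols]
    return columns


# Made-up sample data, for illustration only.
sample_df = pd.DataFrame({
    "gender": ["Female", "Male", "Female"],
    "age": [31, 45, 27],
    "income": [52000.0, 61000.5, 48000.0],
})
string_cols, numeric_cols = split_columns_by_dtype(sample_df)
print("string columns:", string_cols)
print("numeric columns:", numeric_cols)
```

Calling `build_feature_columns(df)` would then return a list that can be passed as `feature_columns` to an estimator, in the same way as `base_columns` in the question. Note that this helper groups the string columns before the numeric ones rather than preserving the DataFrame's column order.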

Source

2018-02-18 03:04:05
