TypeError: argument of type 'float' is not iterable

I am new to Python and TensorFlow. Recently I started working through the TensorFlow examples and came across this one: https://www.tensorflow.org/versions/r0.10/tutorials/wide_and_deep/index.html

I get the error TypeError: argument of type 'float' is not iterable, and I believe the problem is in the following line of code:

df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)

(income_bracket is the label column of the census dataset, where '>50K' is one of the possible label values and the other is '<=50K'. The dataset is read into df_train. The rationale given in the documentation is: "Since the task is a binary classification problem, we'll construct a label column named 'label' whose value is 1 if the income is over 50K, and 0 otherwise.")

It would be great if someone could explain what exactly is happening and how to fix it. I tried both Python 2.7 and Python 3.4, and I don't think the problem is with the language version. Also, if anyone knows of good tutorials for TensorFlow and pandas beginners, please share the links.

Full program:

import pandas as pd 
import urllib 
import tempfile 
import tensorflow as tf 

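# Sparse base columns for the categorical features (explicit keys or hash buckets).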
gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender", keys=["female", "male"]) 
race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=["Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"]) 
education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000) 
marital_status = tf.contrib.layers.sparse_column_with_hash_bucket("marital_status", hash_bucket_size=100) 
relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100) 
workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100) 
occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000) 
native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000) 


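# Continuous base columns, with age also bucketized into ranges.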
age = tf.contrib.layers.real_valued_column("age") 
age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65]) 
education_num = tf.contrib.layers.real_valued_column("education_num") 
capital_gain = tf.contrib.layers.real_valued_column("capital_gain") 
capital_loss = tf.contrib.layers.real_valued_column("capital_loss") 
hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week") 

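# Wide columns: sparse and crossed features for the linear part of the model.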
wide_columns = [gender, native_country, education, occupation, workclass, marital_status, relationship, age_buckets, tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)), tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)), tf.contrib.layers.crossed_column([age_buckets, race, occupation], hash_bucket_size=int(1e6))] 

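# Deep columns: 8-dimensional embeddings of the sparse features plus the continuous features.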
deep_columns = [ 
    tf.contrib.layers.embedding_column(workclass, dimension=8), 
    tf.contrib.layers.embedding_column(education, dimension=8), 
    tf.contrib.layers.embedding_column(marital_status, dimension=8), 
    tf.contrib.layers.embedding_column(gender, dimension=8), 
    tf.contrib.layers.embedding_column(relationship, dimension=8), 
    tf.contrib.layers.embedding_column(race, dimension=8), 
    tf.contrib.layers.embedding_column(native_country, dimension=8), 
    tf.contrib.layers.embedding_column(occupation, dimension=8), 
    age, education_num, capital_gain, capital_loss, hours_per_week] 

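# Build the combined wide & deep classifier, checkpointing to a temporary directory.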
model_dir = tempfile.mkdtemp() 
m = tf.contrib.learn.DNNLinearCombinedClassifier(
    model_dir=model_dir, 
    linear_feature_columns=wide_columns, 
    dnn_feature_columns=deep_columns, 
    dnn_hidden_units=[100, 50]) 


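# Column names and types for the census (Adult) dataset.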
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num", 
    "marital_status", "occupation", "relationship", "race", "gender", 
    "capital_gain", "capital_loss", "hours_per_week", "native_country", "income_bracket"] 
LABEL_COLUMN = 'label' 
CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation", "relationship", "race", "gender", "native_country"] 
CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss", "hours_per_week"] 


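# Download the training and test data to temporary files.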
train_file = tempfile.NamedTemporaryFile() 
test_file = tempfile.NamedTemporaryFile() 
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", train_file.name) 
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", test_file.name) 


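# Load the data and derive the binary label from income_bracket.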
df_train = pd.read_csv(train_file, names=COLUMNS, skipinitialspace=True) 
df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1) 
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int) 
df_test[LABEL_COLUMN] = (df_test['income_bracket'].apply(lambda x: '>50K' in x)).astype(int) 


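# Convert a DataFrame into a dict of feature tensors and a label tensor.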
def input_fn(df): 

    continuous_cols = {k: tf.constant(df[k].values) 
        for k in CONTINUOUS_COLUMNS} 

    categorical_cols = {k: tf.SparseTensor(
     indices=[[i, 0] for i in range(df[k].size)], 
     values=df[k].values, 
     shape=[df[k].size, 1]) 
         for k in CATEGORICAL_COLUMNS} 

    feature_cols = dict(continuous_cols.items() + categorical_cols.items()) 
    label = tf.constant(df[LABEL_COLUMN].values) 
    return feature_cols, label 


def train_input_fn(): 
    return input_fn(df_train) 


def eval_input_fn(): 
    return input_fn(df_test) 

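# Train the model, then evaluate it on the test set.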
m.fit(input_fn=train_input_fn, steps=200) 
results = m.evaluate(input_fn=eval_input_fn, steps=1) 
for key in sorted(results): 
    print("%s: %s" % (key, results[key])) 

Thanks!

P.S. The full stack trace of the error:

Traceback (most recent call last): 

File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <module> 
    df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int) 

File "/usr/lib/python2.7/dist-packages/pandas/core/series.py", line 2023, in apply 
    mapped = lib.map_infer(values, f, convert=convert_dtype) 

File "inference.pyx", line 920, in pandas.lib.map_infer (pandas/lib.c:44780) 

File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <lambda> 
    df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int) 

TypeError: argument of type 'float' is not iterable 

ib = df_test["income_bracket"]
t = type('12')
for idx, i in enumerate(ib):
    if type(i) != t:
        print idx, type(i)

RESULT: 0 <type 'float'>

So you could probably just skip that line.

Can you post the full stack trace of the error? – mrry


I just updated the question with the full stack trace @mrry –


It looks like pandas has parsed your "income_bracket" field as a float rather than a string. Could you try adding print df_train['income_bracket'].dtype to your code and let us know the result? – mrry

Answers


The program works verbatim with the latest version of pandas, i.e. 0.18.1.
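For reference, a quick way to check which pandas version is installed (a minimal sketch):

import pandas as pd
print(pd.__version__)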


As you can see when you inspect test.data, the first row of data has "NaN" in the income_bracket field.

I checked further, and this is the only row containing "NaN", so it can simply be skipped:

df_test = pd.read_csv(file_test , names=COLUMNS, skipinitialspace=True, skiprows=1)
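Alternatively, the label construction can be made robust to non-string values, so a stray NaN (which pandas represents as a float) cannot break the apply. A minimal sketch, assuming df_test and LABEL_COLUMN are defined as in the question:

# Drop rows whose income_bracket is missing (NaN), then coerce the rest
# to str before testing for the '>50K' substring.
df_test = df_test.dropna(subset=['income_bracket'])
df_test[LABEL_COLUMN] = df_test['income_bracket'].astype(str).apply(lambda x: '>50K' in x).astype(int)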
