2016-11-05 77 views
2

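I'm currently taking the Coursera Machine Learning course offered by the University of Washington, and I've run into a problem with graphlab and numpy.

The course requires a graphlab version higher than 1.7. Mine is higher, as you can see below, but when I run the following script I get an error: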

    [INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started.

    def get_numpy_data(data_sframe, features, output):
        data_sframe['constant'] = 1
        features = ['constant'] + features # this is how you combine two lists
        # the following line will convert the features_SFrame into a numpy matrix:
        feature_matrix = features_sframe.to_numpy()
        # assign the column of data_sframe associated with the output to the SArray output_sarray

        # the following will convert the SArray into a numpy array by first converting it to a list
        output_array = output_sarray.to_numpy()
        return(feature_matrix, output_array)

    (example_features, example_output) = get_numpy_data(sales,['sqft_living'], 'price') # the [] around 'sqft_living' makes it a list
    print example_features[0,:] # this accesses the first row of the data the ':' indicates 'all columns'
    print example_output[0] # and the corresponding output

    ----> 8  feature_matrix = features_sframe.to_numpy()
    NameError: global name 'features_sframe' is not defined

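The script above was written by the course authors, so I believe I'm the one doing something wrong.

Any help would be greatly appreciated.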

+0

Your parameter is called `data_sframe`, but you are trying to convert `features_sframe` into a `numpy` matrix. Where does `features_sframe` come from? Could that be the problem? – Abdou

+0

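Thanks for the reply @Abdou. However, I don't think that's the problem, because I've seen other people online post work in which this script ran perfectly. I believe it must be something to do with my `graphlab` or `numpy` version. The feature ('sqft_living') is a column in my data frame "sales".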

Answers

2

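You need to finish writing the function get_numpy_data before you run it; that is why you get the error. With its original instructions, the function template actually looks like this (the blank lines after two of the comments are where your own code goes):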

def get_numpy_data(data_sframe, features, output): 
    data_sframe['constant'] = 1 # this is how you add a constant column to an SFrame 
    # add the column 'constant' to the front of the features list so that we can extract it along with the others: 
    features = ['constant'] + features # this is how you combine two lists 
    # select the columns of data_SFrame given by the features list into the SFrame features_sframe (now including constant): 

    # the following line will convert the features_SFrame into a numpy matrix: 
    feature_matrix = features_sframe.to_numpy() 
    # assign the column of data_sframe associated with the output to the SArray output_sarray 

    # the following will convert the SArray into a numpy array by first converting it to a list 
    output_array = output_sarray.to_numpy() 
    return(feature_matrix, output_array) 
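
To show that the error has nothing to do with the graphlab or numpy version, here is a minimal sketch that uses no libraries at all. The function name `broken` is made up for illustration; the point is only that reading a name that was never assigned raises a NameError, which is what happens when the template's blank lines are left empty.

```python
def broken(data):
    # 'features_sframe' is never assigned anywhere, just as in the
    # unfinished template, so Python cannot resolve the name.
    return features_sframe


try:
    broken([1, 2, 3])
    error_message = None
except NameError as exc:
    # Capture the message so it can be inspected or printed.
    error_message = str(exc)

print(error_message)
```

Once each blank line is replaced with a statement that assigns `features_sframe` and `output_sarray`, the names exist and the error goes away.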
0

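The graphlab assignment instructions have you convert from graphlab to pandas and then to numpy. You can skip the graphlab part entirely and use pandas directly. (This is explicitly allowed in the assignment description.)

First, read in the data files: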

import pandas as pd 

dtype_dict = {'bathrooms':float, 'waterfront':int, 'sqft_above':int, 'sqft_living15':float, 'grade':int, 'yr_renovated':int, 'price':float, 'bedrooms':float, 'zipcode':str, 'long':float, 'sqft_lot15':float, 'sqft_living':float, 'floors':str, 'condition':int, 'lat':float, 'date':str, 'sqft_basement':int, 'yr_built':int, 'id':str, 'sqft_lot':int, 'view':int} 
sales = pd.read_csv('data//kc_house_data.csv', dtype=dtype_dict) 
train_data = pd.read_csv('data//kc_house_train_data.csv', dtype=dtype_dict) 
test_data = pd.read_csv('data//kc_house_test_data.csv', dtype=dtype_dict) 

The function that converts to numpy then becomes:

def get_numpy_data(df, features, output): 
    df['constant'] = 1 

    # add the column 'constant' to the front of the features list so that we can extract it along with the others 
    features = ['constant'] + features 

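    # select the columns of df given by the features list into the DataFrame features_df
    features_df = pd.DataFrame(**FILL IN THE BLANK HERE WITH YOUR CODE**)

    # cast features_df into a numpy matrix
    # (to_numpy() needs pandas >= 0.24; older versions used as_matrix(), removed in pandas 1.0)
    feature_matrix = features_df.to_numpy()

    # etc.: extract the output column and return both arrays, as in the original template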

The rest of the code should be the same, since the remainder of the assignment works only with the numpy version.
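
For reference, here is one possible way to complete the pandas version, as a sketch rather than the course's official solution. The three-row `toy_sales` frame is invented so that the example runs without the course CSV files, and the `df.copy()` call is my own addition to avoid modifying the caller's DataFrame.

```python
import pandas as pd


def get_numpy_data(df, features, output):
    # Work on a copy so the caller's DataFrame is left untouched
    # (an assumption of this sketch; the template adds the column in place).
    df = df.copy()
    df['constant'] = 1

    # Put 'constant' at the front so it is extracted with the other features.
    features = ['constant'] + features

    # Select the feature columns, then convert them to a 2-D numpy array.
    features_df = df[features]
    feature_matrix = features_df.to_numpy()

    # Select the output column and convert it to a 1-D numpy array.
    output_array = df[output].to_numpy()
    return feature_matrix, output_array


# Made-up data standing in for the real sales file.
toy_sales = pd.DataFrame({
    'sqft_living': [1180.0, 2570.0, 770.0],
    'price': [221900.0, 538000.0, 180000.0],
})

example_features, example_output = get_numpy_data(toy_sales, ['sqft_living'], 'price')
print(example_features[0, :])  # first row, all columns
print(example_output[0])       # the corresponding output
```

With the real data, pass `sales` (as read by `pd.read_csv` above) instead of `toy_sales`.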