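Forming a sparse feature matrix DataFrame in pandas

I want to explode the "features" column of this data frame to create a new data frame in which those features become the column names.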
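For example, go from this,
to this,
My solution works, but I don't think it is very good because it relies on several for loops. Is there a better way that takes advantage of what the pandas.DataFrame class already provides?
The code that generates the feature matrix is as follows,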
import pandas as pd

def feature_data_frame_by_exploding_column(input_df, col_name):
    # Create data frame with same columns minus the column you want to explode
    df = input_df.copy()
    del df[col_name]
    # The items that you want to become new features
    all_new_features = []
    new_feature_list = input_df[col_name].values
    for ingred_list in new_feature_list:
        all_new_features.extend(ingred_list)  # Extend vs append!
    # Add new features as columns of zeros
    for feature in all_new_features:
        df[feature] = 0
    # For each row in data frame set values that need to be 1.
    # Pair each index label with its list so a non-default index also works.
    for index, ingreds_arr in zip(df.index, new_feature_list):
        df.loc[index, ingreds_arr] = 1
    return df
df = pd.DataFrame(columns=["id", "features"])
df['id'] = [0, 1]
df['features'] = [["A", "B"], ["C", "D"]]
df
feature_data_frame_by_exploding_column(df, "features")
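
For comparison, here is a minimal loop-free sketch of the same transformation. It is one possible approach, not necessarily the best one. It assumes pandas 0.25 or later (for Series.explode) and a unique index on the input frame. The helper name explode_features is hypothetical and was chosen only for this illustration.

```python
import pandas as pd


def explode_features(input_df, col_name):
    # Hypothetical helper name, used only for this sketch.
    # Turn each list element into its own row; the original index label is repeated.
    exploded = input_df[col_name].explode()
    # One-hot encode every feature, then collapse back to one row per original label.
    indicators = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()
    # Attach the 0/1 indicator columns to the remaining columns (assumes a unique index).
    return input_df.drop(columns=col_name).join(indicators)


df = pd.DataFrame({"id": [0, 1], "features": [["A", "B"], ["C", "D"]]})
result = explode_features(df, "features")
print(result)
```

With this approach the explicit Python loops are replaced by explode, get_dummies, and a groupby over the index level, so the per-row work is left to pandas. Duplicate feature names within a row collapse into a single 1 because of the max aggregation.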