Code:
import numpy as np
import pandas as pd

# Create some test data: a 3x3 frame of random floats with two entries set to zero.
random_data = np.random.random([3, 3])
random_data[0, 0] = 0.0
random_data[1, 2] = 0.0
df = pd.DataFrame(random_data,
                  columns=['A', 'B', 'C'],
                  index=['first', 'second', 'third'])
print(df)

# Binarize: 1 where the value is greater than zero, 0 otherwise.
df_ = (df > 0).astype(int)
print(df_)
Output:
               A         B         C
first   0.000000  0.610263  0.301024
second  0.728070  0.229802  0.000000
third   0.243811  0.335131  0.863908
        A  B  C
first   0  1  1
second  1  1  0
third   1  1  1
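Notes:
- get_dummies() examines every unique value in each column and adds a new column for each unique value, marking whether that value is present in a given row.
- So if column A has 20 unique values, 20 new columns are added; in each row one of them is true and the others are false.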
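
The difference described in the notes can be seen on a small example. The sketch below is only an illustration: the four-row frame and its values are made up, and the `columns=["A"]` argument is passed because get_dummies() does not encode numeric columns unless they are named explicitly.

```python
import pandas as pd

# Hypothetical toy data: one column with three distinct values.
df = pd.DataFrame({"A": [0.0, 0.5, 0.5, 0.9]})

# Name the column explicitly so that its numeric values are encoded.
dummies = pd.get_dummies(df, columns=["A"])

# One indicator column is created per unique value, so three here.
n_new_columns = dummies.shape[1]

# In every row exactly one indicator is set.
row_sums = dummies.astype(int).sum(axis=1)

# Thresholding keeps the original shape and just maps values to 0/1.
binary = (df > 0).astype(int)

print(dummies)
print(binary)
```

With many distinct float values, get_dummies() would therefore produce a very wide frame, whereas the threshold approach keeps one column per original column.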