Binarizing a float64 pandas DataFrame in Python

I have a pandas DataFrame with various columns, each representing the frequency of a word in a corpus. Each row corresponds to a document, and every column is of type float64.

For example:

word1 word2 word3 
0.0 0.3 1.0 
0.1 0.0 0.5 
etc

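I want to binarize this: instead of the frequencies, I want to end up with a boolean DataFrame (0s and 1s) that indicates whether each word is present.

So the example above would be transformed into: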

word1 word2 word3 
0  1  1 
1  0  1 
etc

I looked at get_dummies(), but the output was not what I expected.

Source

2016-09-27 Snake_A

Code:

import numpy as np
import pandas as pd

# Create some test data
random_data = np.random.random([3, 3])
random_data[0, 0] = 0.0
random_data[1, 2] = 0.0

df = pd.DataFrame(random_data, 
    columns=['A', 'B', 'C'], index=['first', 'second', 'third']) 

print(df)

# Binarize: 1 where the value is greater than 0, otherwise 0
threshold = lambda x: x > 0
df_ = df.apply(threshold).astype(int)

print(df_)

Output:

               A         B         C
first   0.000000  0.610263  0.301024
second  0.728070  0.229802  0.000000
third   0.243811  0.335131  0.863908
        A  B  C
first   0  1  1
second  1  1  0
third   1  1  1

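Notes:

get_dummies() looks at every unique value in each column and introduces a new column for each unique value, marking whether that value is present in a given row.
So if column A has 20 unique values, 20 new columns are added; in each row exactly one of them is true and the others are false.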
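
To illustrate why get_dummies() does not fit this task, here is a minimal sketch. The toy column and its values are made up for illustration and are not from the question. Because get_dummies() only encodes object and category columns by default, the float column is named explicitly through the `columns` argument.

```python
import pandas as pd

# Hypothetical toy data: one word-frequency column with three distinct values
df = pd.DataFrame({"word1": [0.0, 0.1, 0.3]})

# Float columns are skipped by default, so name the column explicitly
dummies = pd.get_dummies(df, columns=["word1"])

# One indicator column per distinct frequency value
print(dummies.columns.tolist())
print(dummies.shape)
```

Each distinct frequency gets its own indicator column, which is one-hot encoding rather than a present/absent flag per word.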

Source

2016-09-27 23:19:33 sascha

Casting to boolean gives True for anything that is not zero and False for any zero entry. If you then cast to integer, you get ones and zeros.

import io 
import pandas as pd 

data = io.StringIO('''\
word1 word2 word3
0.0 0.3 1.0
0.1 0.0 0.5
''')
# sep=r'\s+' splits on whitespace; delim_whitespace=True is deprecated in newer pandas
df = pd.read_csv(data, sep=r'\s+')

res = df.astype(bool).astype(int) 
print(res)

Output:

   word1  word2  word3
0      0      1      1
1      1      0      1
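
One thing to keep in mind: casting to bool marks every non-zero entry as present, while a `> 0` comparison marks only strictly positive entries. For word frequencies the two agree, but they differ for negative or missing values. The sketch below uses made-up data (not from the question) to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data containing a negative value and a missing value
df = pd.DataFrame({"x": [0.0, 0.5, -0.2, np.nan]})

# Any non-zero value, including negatives and NaN, becomes 1
via_bool = df.astype(bool).astype(int)

# Only strictly positive values become 1; NaN compares as False
via_gt = (df > 0).astype(int)

print(via_bool["x"].tolist())
print(via_gt["x"].tolist())
```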

Source

2016-09-27 23:36:02

I would go with @Alberto Garcia-Raboso's answer, but here is a very fast alternative that uses the same idea.

Use np.where

pd.DataFrame(np.where(df, 1, 0), df.index, df.columns)
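
For completeness, a self-contained version of the one-liner above, with the imports it needs. The sample frame is rebuilt here from the question's example data; the variable name `binary` is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "word1": [0.0, 0.1],
    "word2": [0.3, 0.0],
    "word3": [1.0, 0.5],
})

# np.where treats non-zero entries as true: 1 for non-zero, 0 for zero.
# The result is a plain NumPy array, so index and columns are passed back in.
binary = pd.DataFrame(np.where(df, 1, 0), index=df.index, columns=df.columns)
print(binary)
```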

Source

2016-09-28 00:09:23 piRSquared

I found an alternative way using pandas indexing.

This can simply be done with

df[df>0] = 1

Simple as that!
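
Note that this assignment modifies the DataFrame in place and leaves the columns as floats. A minimal sketch of a non-destructive variant follows; working on a copy and the final integer cast are additions of mine, not part of the answer above.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "word1": [0.0, 0.1],
    "word2": [0.3, 0.0],
    "word3": [1.0, 0.5],
})

# Work on a copy so the original frequencies are preserved
masked = df.copy()
masked[masked > 0] = 1   # positive entries are overwritten with 1; zeros are untouched

# The columns are still float64 at this point; cast if integers are wanted
binary = masked.astype(int)
print(binary)
```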

Source

2016-10-04 19:55:36
