Pandas DataFrame: how to parse integers into strings of 0s and 1s?

I have the following pandas DataFrame.

import pandas as pd 
df = pd.read_csv('filename.csv') 

print(df) 

    sample  column_A
0  sample1       6/6
1  sample2       0/4
2  sample3       2/6
3  sample4     12/14
4  sample5     15/21
5  sample6     12/12
.. ....

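The values in column_A are not fractions. The data must be manipulated so that each value is converted into a string of 0s and 1s (this is not a conversion of integers into their binary representations).

The "numerator" above gives the total number of 1s, while the "denominator" gives the total number of 0s and 1s combined.

So the table should actually be in the following format: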

    sample               column_A
0  sample1                 111111
1  sample2                   0000
2  sample3                 110000
3  sample4         11111111111100
4  sample5  111111111111111000000
5  sample6           111111111111
.. ....

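I have never parsed integers into output strings of 0s and 1s like this. How can this be done? Is there a "pandas way" that works with lambda expressions? Pythonic string parsing, or a regular expression?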

Source

2016-07-25 ShanZhengYang

I'd say string parsing, something like `a, b = map(int, field.split('/')); result = '1'*a + '0'*(b-a)`. – TigerhawkT3

First, suppose you write a function:

def to_binary(s):
    # Split 'n/d' into the numerator n (count of 1s) and denominator d (total length)
    n_d = s.split('/')
    n, d = int(n_d[0]), int(n_d[1])
    return '1' * n + '0' * (d - n)

Then:

>>> to_binary('4/5') 
'11110'

Now you just need to use pandas.Series.apply:

df.column_A.apply(to_binary)
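
Note that apply returns a new Series and leaves df itself unchanged, so the result has to be assigned back if the column should be overwritten. A minimal self-contained sketch, assuming a small inline DataFrame as a stand-in for filename.csv (the values are copied from the question):

```python
import pandas as pd

def to_binary(s):
    # 'n/d' -> n ones followed by (d - n) zeros
    n, d = map(int, s.split('/'))
    return '1' * n + '0' * (d - n)

# Inline stand-in for the CSV in the question (assumption: same column names)
df = pd.DataFrame({
    'sample': ['sample1', 'sample2', 'sample3'],
    'column_A': ['6/6', '0/4', '2/6'],
})

# apply() returns a new Series; assign it back to overwrite the column
df['column_A'] = df['column_A'].apply(to_binary)
print(df)
```

Using df['column_A'] rather than df.column_A on the left-hand side matters here, since attribute-style access is only reliable for reading an existing column.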

Source

2016-07-25 15:16:12

An alternative:

df2 = (df['column_A'].str.split('/', expand=True).astype(int)
       .assign(ones='1').assign(zeros='0'))

df2
Out:
    0   1  ones  zeros
0   6   6     1      0
1   0   4     1      0
2   2   6     1      0
3  12  14     1      0
4  15  21     1      0
5  12  12     1      0

(df2[0] * df2['ones']).str.cat((df2[1]-df2[0])*df2['zeros']) 
Out:
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object

Note: I was actually trying to find a faster alternative, thinking that apply would be slow, but this one turned out to be slower.
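
For anyone who wants to check the speed difference on their own data, here is a rough timing sketch. It is not the exact code from this answer: the split-based variant below builds the strings with .str.repeat, and the data size and repeat count are arbitrary assumptions. No timings are quoted because they depend on the machine and the pandas version.

```python
import timeit
import pandas as pd

def to_binary(s):
    n, d = map(int, s.split('/'))
    return '1' * n + '0' * (d - n)

# Synthetic column: the six values from the question, repeated (arbitrary size)
col = pd.Series(['6/6', '0/4', '2/6', '12/14', '15/21', '12/12'] * 1000)

def with_apply():
    # Plain Python function applied element by element
    return col.apply(to_binary)

def with_split():
    # Split into two integer columns, then repeat '1' and '0' per row
    parts = col.str.split('/', expand=True).astype(int)
    ones = pd.Series(['1'] * len(parts), index=parts.index, dtype=object)
    zeros = pd.Series(['0'] * len(parts), index=parts.index, dtype=object)
    return ones.str.repeat(parts[0]) + zeros.str.repeat(parts[1] - parts[0])

t_apply = timeit.timeit(with_apply, number=5)
t_split = timeit.timeit(with_split, number=5)
print(f'apply: {t_apply:.4f}s  split+repeat: {t_split:.4f}s')
```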

Source

2016-07-25 15:35:17 ayhan

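I like this solution, but @AmiTavory had a decent answer before this one. I think this might be faster, but I haven't checked. I wish I could accept both answers! – ShanZhengYang

@ShanZhengYang Thank you, but you marked this one as correct. I think you meant to mark Ami Tavory's answer (which would be my choice too). – ayhan

This is a very interesting question, and I like both answers. Here is my attempt at doing it as a one-liner: `df.column_A.str.extract(r'(?P<one>\d+)/(?P<len>\d+)', expand=True).astype(int).apply(lambda x: ['1'] * x.one + ['0'] * (x.len-x.one), axis=1).apply(''.join)` - it will be slower, I just wanted to have a one-liner... ;) – MaxU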

Here are some alternative solutions using the extract() and .str.repeat() methods:

In [187]: x = df.column_A.str.extract(r'(?P<ones>\d+)/(?P<len>\d+)', expand=True).astype(int).assign(o='1', z='0') 

In [188]: x
Out[188]:
   ones  len  o  z
0     6    6  1  0
1     0    4  1  0
2     2    6  1  0
3    12   14  1  0
4    15   21  1  0
5    12   12  1  0

In [189]: x.o.str.repeat(x.ones) + x.z.str.repeat(x.len-x.ones) 
Out[189]:
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object
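
The named groups in the pattern are what make x.ones and x.len available: str.extract uses the group names as column names. Rows that do not match the pattern come back as NaN, so it may be worth filtering them out before .astype(int). A small sketch of that idea, where the 'bad' entry and the group name total (instead of len) are made up for illustration:

```python
import pandas as pd

# 'bad' is a hypothetical malformed entry, not part of the question's data
col = pd.Series(['6/6', '0/4', 'bad', '2/6'])

parts = col.str.extract(r'(?P<ones>\d+)/(?P<total>\d+)', expand=True)
print(list(parts.columns))  # the group names become the column names

# Non-matching rows are NaN in every group; flag them before converting
malformed = parts.isna().any(axis=1)
clean = parts[~malformed].astype(int)

# Build the strings for the valid rows only
result = ['1' * o + '0' * (t - o) for o, t in zip(clean['ones'], clean['total'])]
print(result)
```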

Or a slow one-liner (two apply() calls):

In [190]: %paste 
(df.column_A.str.extract(r'(?P<one>\d+)/(?P<len>\d+)', expand=True) 
    .astype(int) 
    .apply(lambda x: ['1'] * x.one + ['0'] * (x.len-x.one), axis=1) 
    .apply(''.join) 
) 
## -- End pasted text -- 
Out[190]:
0                   111111
1                     0000
2                   110000
3           11111111111100
4    111111111111111000000
5             111111111111
dtype: object

Source

2016-07-25 18:20:21 MaxU
