Forcing columns to the same data type

I have two dataframes constructed like this:

print(product_combos1.head(n=5)) 
      product_id count Length 
0   (P06, P09) 36340  2 
1 (P01, P05, P06, P09) 10085  4 
2   (P01, P06) 36337  2 
3   (P01, P09) 49897  2 
4   (P02, P09) 11573  2 

print(testing_df.head(n=5)) 
        product_id Length 
transaction_id       
001      [P01]  1 
002     [P01, P02]  2 
003    [P01, P02, P09]  3 
004     [P01, P03]  2 
005    [P01, P03, P05]  3

How can I force the 'product_id' column of testing_df into the same format as the column in the product_combos1 df? (That is, parentheses instead of square brackets.)

Source

2017-08-04 zsad512

Python tuples are displayed in parentheses. Lists are displayed in square brackets.
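A quick way to see the difference in plain Python (the variable names here are illustrative, not from the question):

```python
# A list and a tuple holding the same items differ only in type and display.
as_list = ["P01", "P02"]
as_tuple = tuple(as_list)

print(as_list)   # ['P01', 'P02']
print(as_tuple)  # ('P01', 'P02')
print(type(as_list).__name__, type(as_tuple).__name__)  # list tuple
```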

Change the dataframe in place

testing_df['product_id'] = testing_df['product_id'].apply(tuple) 
testing_df 

        product_id Length 
transaction_id       
1      (P01,)  1 
2     (P01, P02)  2 
3    (P01, P02, P09)  3 
4     (P01, P03)  2 
5    (P01, P03, P05)  3
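
For anyone wanting to reproduce this, here is a minimal sketch. The small frame below is a hypothetical stand-in for testing_df, and it assumes the column really holds Python lists of strings:

```python
import pandas as pd

# Hypothetical stand-in for testing_df: each cell is a Python list.
testing_df = pd.DataFrame(
    {
        "product_id": [["P01"], ["P01", "P02"], ["P01", "P02", "P09"]],
        "Length": [1, 2, 3],
    },
    index=pd.Index(["001", "002", "003"], name="transaction_id"),
)

# Convert every list to a tuple, overwriting the column in place.
testing_df["product_id"] = testing_df["product_id"].apply(tuple)

# Check that every cell is now a tuple.
print(testing_df["product_id"].map(type).unique())
```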

Make a copy

testing_df.assign(product_id=testing_df.product_id.apply(tuple)) 

        product_id Length 
transaction_id       
1      (P01,)  1 
2     (P01, P02)  2 
3    (P01, P02, P09)  3 
4     (P01, P03)  2 
5    (P01, P03, P05)  3
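
The difference from the in-place version is that assign returns a new DataFrame and leaves the original alone. A small sketch, again using a hypothetical stand-in frame with list-valued cells:

```python
import pandas as pd

# Hypothetical stand-in for testing_df, assuming list-valued cells.
original = pd.DataFrame({"product_id": [["P01"], ["P01", "P03"]], "Length": [1, 2]})

# assign returns a new DataFrame; `original` keeps its lists.
converted = original.assign(product_id=original["product_id"].apply(tuple))

print(type(original.loc[0, "product_id"]).__name__)   # list
print(type(converted.loc[0, "product_id"]).__name__)  # tuple
```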

Unless, of course, those are actually strings. In that case, replace the square brackets with parentheses.

testing_df.assign(product_id=testing_df.product_id.str.replace(r'\[(.*)\]', r'(\1)', regex=True)) 

        product_id Length 
transaction_id       
1       (P01)  1 
2     (P01, P02)  2 
3    (P01, P02, P09)  3 
4     (P01, P03)  2 
5    (P01, P03, P05)  3
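
A self-contained sketch of the string case, assuming each cell is text such as '[P01, P02]' (the Series below is hypothetical sample data):

```python
import pandas as pd

# Hypothetical column where each cell is a string, not a list.
as_text = pd.Series(["[P01]", "[P01, P02]"], name="product_id")

# regex=True is needed on recent pandas versions, where str.replace
# treats the pattern as a literal string by default.
swapped = as_text.str.replace(r"\[(.*)\]", r"(\1)", regex=True)

print(swapped.tolist())  # ['(P01)', '(P01, P02)']
```

The result is still a string, which is why the first row reads (P01) with no trailing comma.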

Source

2017-08-04 23:44:26 piRSquared

The only problem is that the first row of my df has gone from '["P01"]' to '("P01",)'. I don't know why the ',' was added to the first row. – zsad512

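Ah, so the column elements are lists and you applied 'tuple'. Yes, the other dataframe has no tuples of length one, while this one has a list of length one. Python displays a length-one tuple as '(x,)', using the comma to distinguish it from the expression '(x)', which would just evaluate to 'x'. – piRSquared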
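
The trailing comma can be checked directly in plain Python (illustrative names, not from the thread):

```python
# Parentheses alone only group an expression; the comma makes the tuple.
grouped = ("P01")
single = ("P01",)

print(type(grouped).__name__)  # str
print(type(single).__name__)   # tuple
print(single)                  # ('P01',)
print(len(single))             # 1
```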

Will this cause any complications when I try to compare the two dataframes? If you can help, please see [link](https://stackoverflow.com/questions/45515412/pandas-return-partial-matches-between-rows-of-two-dataframes). – zsad512
