Replace strings in a pandas DataFrame with values that are currently in the index

I have some analysis output (parsed into a pandas DataFrame) that needs some post-processing. Here is what the DataFrame looks like:

                                      1         2              3         4
index         GeneSymbol
11746909_a_at A1CF        11736238_a_at  0.038230    11724734_at  0.024966
11736238_a_at ABCA5       11746909_a_at  0.038230    11724734_at  0.024771
11724734_at   ABCB8       11746909_a_at  0.024966  11736238_a_at  0.024771
11723976_at   ABCC8       11746909_a_at  0.017006  11736238_a_at  0.046125
11718612_a_at ABCD4       11746909_a_at  0.014982  11736238_a_at  0.050172

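Here we have a two-level MultiIndex: the outer level is a unique ID and the inner level is the symbol associated with that ID. Columns 1, ..., n then alternate between an ID and a numeric value (giving the strength of the correlation). Every ID in these columns is also present in the index. My question: what is the best strategy for replacing these uninformative IDs with the appropriate symbols?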
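To experiment with this, the example frame can be rebuilt from the values shown above. This is a minimal sketch that assumes, as the printout suggests, that the outer index level is named "index", the inner level "GeneSymbol", and that the columns are the integers 1 to 4; the variable names are illustrative.

```python
import pandas as pd

# Rebuild the question's example: a two-level MultiIndex (outer level "index"
# = probe ID, inner level "GeneSymbol") plus alternating ID / correlation columns.
ids = ["11746909_a_at", "11736238_a_at", "11724734_at", "11723976_at", "11718612_a_at"]
symbols = ["A1CF", "ABCA5", "ABCB8", "ABCC8", "ABCD4"]
index = pd.MultiIndex.from_arrays([ids, symbols], names=["index", "GeneSymbol"])

df = pd.DataFrame(
    {
        1: ["11736238_a_at", "11746909_a_at", "11746909_a_at", "11746909_a_at", "11746909_a_at"],
        2: [0.038230, 0.038230, 0.024966, 0.017006, 0.014982],
        3: ["11724734_at", "11724734_at", "11736238_a_at", "11736238_a_at", "11736238_a_at"],
        4: [0.024966, 0.024771, 0.024771, 0.046125, 0.050172],
    },
    index=index,
)

# Check the stated property: every ID in the ID columns (1 and 3)
# also appears in the outer index level.
id_cols = df[[1, 3]]
print(id_cols.isin(df.index.get_level_values("index")).all().all())
print(df)
```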

For example, the first row of the output table should look like this:

                                      1         2              3         4
index         GeneSymbol
11746909_a_at A1CF                ABCA5  0.038230          ABCB8  0.024966
11736238_a_at ABCA5       11746909_a_at  0.038230    11724734_at  0.024771
11724734_at   ABCB8       11746909_a_at  0.024966  11736238_a_at  0.024771
11723976_at   ABCC8       11746909_a_at  0.017006  11736238_a_at  0.046125
11718612_a_at ABCD4       11746909_a_at  0.014982  11736238_a_at  0.050172

Thanks in advance.


2017-07-15 CiaranWelsh

You can use replace with a Series created by reset_index:

# reset_index(level=1) leaves the ID as the index, so ['GeneSymbol'] is an ID -> symbol Series
df = df.replace(df.reset_index(level=1)['GeneSymbol'])
print(df)
                              1         2      3         4
index         GeneSymbol
11746909_a_at A1CF        ABCA5  0.038230  ABCB8  0.024966
11736238_a_at ABCA5        A1CF  0.038230  ABCB8  0.024771
11724734_at   ABCB8        A1CF  0.024966  ABCA5  0.024771
11723976_at   ABCC8        A1CF  0.017006  ABCA5  0.046125
11718612_a_at ABCD4        A1CF  0.014982  ABCA5  0.050172
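A self-contained sketch of the reset_index approach, using only the first three rows of the question's data (those rows reference only each other's IDs, so the mapping is complete). The names id_to_symbol and out are illustrative. It relies on DataFrame.replace accepting a dict-like to_replace, which a Series is, and treating it as an {old: new} mapping.

```python
import pandas as pd

# First three rows of the question's data.
index = pd.MultiIndex.from_arrays(
    [["11746909_a_at", "11736238_a_at", "11724734_at"], ["A1CF", "ABCA5", "ABCB8"]],
    names=["index", "GeneSymbol"],
)
df = pd.DataFrame(
    {
        1: ["11736238_a_at", "11746909_a_at", "11746909_a_at"],
        2: [0.038230, 0.038230, 0.024966],
        3: ["11724734_at", "11724734_at", "11736238_a_at"],
        4: [0.024966, 0.024771, 0.024771],
    },
    index=index,
)

# Moving GeneSymbol out of the index leaves the ID level as the index,
# so selecting the column yields a Series mapping ID -> symbol.
id_to_symbol = df.reset_index(level=1)["GeneSymbol"]

# replace() looks up every cell in the mapping; cells with no matching key
# (all the floats here) are left unchanged. The index itself is not touched.
out = df.replace(id_to_symbol)
print(out)
```

Because replace matches whole cell values, a correlation such as 0.038230 can never collide with an ID string, which is why passing the whole frame is safe in this layout.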

Another solution is to build a dict from the list of tuples returned by Index.values:

# On a MultiIndex, Index.values is an array of (ID, GeneSymbol) tuples
df = df.replace(dict(df.index.values))
print(df)
                              1         2      3         4
index         GeneSymbol
11746909_a_at A1CF        ABCA5  0.038230  ABCB8  0.024966
11736238_a_at ABCA5        A1CF  0.038230  ABCB8  0.024771
11724734_at   ABCB8        A1CF  0.024966  ABCA5  0.024771
11723976_at   ABCC8        A1CF  0.017006  ABCA5  0.046125
11718612_a_at ABCD4        A1CF  0.014982  ABCA5  0.050172
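The dict built from Index.values is the same ID to symbol mapping as the Series produced by reset_index; this sketch checks that on the first three rows of the question's data. The last step is a variant that is not part of the original answers: it applies the mapping only to the ID columns (assumed here to be 1 and 3), which may be preferable if other columns could ever hold strings that look like IDs. The names mapping, id_cols and out are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["11746909_a_at", "11736238_a_at", "11724734_at"], ["A1CF", "ABCA5", "ABCB8"]],
    names=["index", "GeneSymbol"],
)
df = pd.DataFrame(
    {
        1: ["11736238_a_at", "11746909_a_at", "11746909_a_at"],
        2: [0.038230, 0.038230, 0.024966],
        3: ["11724734_at", "11724734_at", "11736238_a_at"],
        4: [0.024966, 0.024771, 0.024771],
    },
    index=index,
)

# Each element of a two-level MultiIndex's .values is an (ID, symbol) tuple,
# so dict() turns the array straight into an {ID: symbol} mapping.
mapping = dict(df.index.values)
print(mapping)

# Same mapping as the Series built with reset_index(level=1)['GeneSymbol'].
assert mapping == df.reset_index(level=1)["GeneSymbol"].to_dict()

# Variant (assumption, not from the answers): restrict replacement to the
# ID columns and work on a copy so the original frame is left intact.
id_cols = [1, 3]
out = df.copy()
out[id_cols] = out[id_cols].replace(mapping)
print(out)
```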


2017-07-15 20:20:57 jezrael

Very elegant, thanks. – CiaranWelsh
