2016-05-24

I have a data frame that looks like this, and I want to change column values relative to other columns in the data frame:

Head CHR Start End Trans Num 
A 1 29554 30039 ENST473358 1 
A 1 30564 30667 ENST473358 2 
A 1 30976 31097 ENST473358 3 
B 1 36091 35267 ENST417324 1 
B 1 35491 34544 ENST417324 2 
B 1 35184 35711 ENST417324 3 
B 1 36083 35235 ENST461467 1 
B 1 35491 120765 ENST461467 2 

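I need to change the columns Start and End with respect to the columns Trans and Num. The Trans column has repeated values, which are numbered in the Num column. What I want is to set Start to End + 10, and End to the Start of the next row (with the same Trans) - 10, and so on for all rows. So my goal is to get an output like the following: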

Head CHR Start End Trans Num
A 1 30564 30667 ENST473358 1
A 1 30976 31097 ENST473358 2
A 1 30267 NA ENST473358 3
B 1 35277 35481 ENST417324 1
B 1 34554 35174 ENST417324 2
B 1 35721 NA ENST417324 3
B 1 35245 35481 ENST461467 1
B 1 120775 NA ENST461467 2

Any help is much appreciated. I can do this without taking Trans into account with the following script, but I don't get the output I want:

start = df['Start'].copy() 
df['Start'] = df.End + 10 
df['End'] = ((start.shift(-1) - 10)) 
df.iloc[-1, df.columns.get_loc('Start')] = '' 
df.iloc[-1, df.columns.get_loc('End')] = '' 
print (df) 
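One reason the script above cannot respect Trans is that `shift(-1)` runs over the whole column, so the last row of one transcript picks up the first Start of the next one. A minimal sketch, using only the Start and Trans columns of the sample data, that compares a plain shift with a shift inside `groupby("Trans")`:

```python
import pandas as pd

# Sample data from the question (only the columns needed here)
df = pd.DataFrame({
    "Start": [29554, 30564, 30976, 36091, 35491, 35184, 36083, 35491],
    "Trans": ["ENST473358"] * 3 + ["ENST417324"] * 3 + ["ENST461467"] * 2,
})

# A plain shift looks at the next row, whatever transcript it belongs to
plain = df["Start"].shift(-1)

# A grouped shift stops at each transcript boundary and yields NaN there
grouped = df.groupby("Trans")["Start"].shift(-1)

print(pd.DataFrame({"Trans": df["Trans"], "plain": plain, "grouped": grouped}))
```

In the `plain` column, row 2 (the last ENST473358 row) gets 36091, the first Start of ENST417324; in the `grouped` column it is NaN.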

Sorry, are you just asking how to set 'End' to 'NaN' if it is the last row for a 'Trans'? – EdChum

No, as you can see in my desired output, the values in columns Start and End should change as I described. – user1017373

I don't really see any clear pattern between the first and second DF... – Jacquot

Answers

You may want to consider re-indexing your data, depending on how you want to use it.

You can index the data on the columns "Trans" and "Num" like this:

#Change how we index the frame 
df.set_index(["Trans", "Num"], inplace=True) 
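To show what this indexing buys you, here is a small sketch on a trimmed copy of the sample data (the trimming and the extra `sort_index()` call are my own choices, not part of the answer): selecting one Trans value returns just that transcript's rows indexed by Num, and a `(Trans, Num)` tuple addresses a single row, which is what the loop below relies on.

```python
import pandas as pd

# Trimmed copy of the sample data: the ENST417324 rows plus one other transcript
df = pd.DataFrame({
    "Start": [29554, 36091, 35491, 35184],
    "End": [30039, 35267, 34544, 35711],
    "Trans": ["ENST473358", "ENST417324", "ENST417324", "ENST417324"],
    "Num": [1, 1, 2, 3],
})

# sort_index() avoids pandas' "indexing past lexsort depth" PerformanceWarning
df = df.set_index(["Trans", "Num"]).sort_index()

# Selecting one Trans value drops that level, leaving a frame indexed by Num
sub = df.loc["ENST417324"]
print(sub)

# A (Trans, Num) tuple addresses one row, e.g. the last Num of a transcript
print(df.loc[("ENST417324", 3), "End"])  # → 35711
```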

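Next, we grab each unique index value so we can replace them all. (I'm fairly sure this part and the iteration below could be done in bulk, but I just did this quickly. If efficiency is a concern, look into ways to avoid looping over every index value.)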

# Get each distinct Trans value, in order of appearance
unique_trans = df.index.get_level_values('Trans').unique()

Then we can iterate over them and apply the transformation you want.

# Store Start/End as floats so they can hold NaN
df = df.astype({"Start": float, "End": float})

# Access each index
for trans in unique_trans:

    # Get the highest number in "Num" for each so we know which row to set to NaN
    max_num = df.loc[trans].index.max()

    # Copy your start column as a temp variable
    start = df.loc[trans, "Start"].copy()

    # Apply the transform to the start column (equal to end + 10)
    df.loc[trans, "Start"] = np.array(df.loc[trans, "End"]) + 10

    # Apply the transform to the end column (next start within this Trans - 10)
    df.loc[trans, "End"] = np.array(start.shift(-1) - 10)

    # By passing a tuple as a row index, we get the element that is both in trans and the max number,
    # which is the one you want to set to NaN
    df.loc[(trans, max_num), "End"] = np.nan

print(df)

When I run this on your data, the result I get is:

               Head  Chr     Start      End
Trans      Num
ENST473358 1      A    1   30049.0  30554.0
           2      A    1   30677.0  30966.0
           3      A    1   31107.0      NaN
ENST417324 1      B    1   35277.0  35481.0
           2      B    1   34554.0  35174.0
           3      B    1   35721.0      NaN
ENST461467 1      B    1   35245.0  35481.0
           2      B    1  120775.0      NaN
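As a hand check of the first transcript, the same rule (Start becomes End + 10, End becomes the next row's Start - 10, nothing for the last row) can be worked through in plain Python without pandas; the numbers agree with the ENST473358 rows above.

```python
# ENST473358 rows from the question's input
starts = [29554, 30564, 30976]
ends = [30039, 30667, 31097]

# Rule: new Start = End + 10; new End = next row's Start - 10 (None for the last row)
new_starts = [end + 10 for end in ends]
new_ends = [start - 10 for start in starts[1:]] + [None]

print(new_starts)  # → [30049, 30677, 31107]
print(new_ends)    # → [30554, 30966, None]
```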

The full code I used to generate the test case is:

import pandas as pd 
import numpy as np 
# Setup your dataframe 
df = pd.DataFrame(columns=["Head", "Chr", "Start", "End", "Trans", "Num"]) 
df["Head"] = ["A", "A", "A", "B", "B", "B", "B", "B"] 
df["Chr"] = [1]*8 
df["Start"] = [29554, 30564, 30976, 36091, 35491, 35184, 36083, 35491] 
df["End"] = [30039, 30667, 31097, 35267, 34544, 35711, 35235, 120765] 
df["Trans"] = ["ENST473358", "ENST473358", "ENST473358", 
       "ENST417324", "ENST417324", "ENST417324", 
       "ENST461467","ENST461467"] 
df["Num"] = [1, 2, 3, 1, 2, 3, 1, 2] 

# Change how we index the frame
df.set_index(["Trans", "Num"], inplace=True)

# Store Start/End as floats so they can hold NaN
df = df.astype({"Start": float, "End": float})

# Get each distinct Trans value, in order of appearance
unique_trans = df.index.get_level_values('Trans').unique()

# Access each index
for trans in unique_trans:
    max_num = df.loc[trans].index.max()

    start = df.loc[trans, "Start"].copy()
    df.loc[trans, "Start"] = np.array(df.loc[trans, "End"]) + 10
    df.loc[trans, "End"] = np.array(start.shift(-1) - 10)
    df.loc[(trans, max_num), "End"] = np.nan

print(df) 

Thanks for the solution – user1017373

You can put your existing code into a function, then group by Trans and apply the function:

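import numpy as np

def func(df):
    # Work on a float copy of the group so NaN can be stored without dtype issues
    df = df.astype({'Start': float, 'End': float})
    start = df['Start'].copy()
    df['Start'] = df['End'] + 10
    df['End'] = start.shift(-1) - 10
    # Blank out the last row of each Trans group, as in the original script
    df.iloc[-1, df.columns.get_loc('Start')] = np.nan
    df.iloc[-1, df.columns.get_loc('End')] = np.nan
    return df

df.groupby('Trans').apply(func)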

Result:

Head CHR Start  End  Trans Num 
0 A 1 30677 30966 ENST473358 1 
1 A 1 31107 30257 ENST473358 2 
2 A 1     ENST473358 3 
3 B 1 35491 34544 ENST417324 1 
4 B 1 35184 35711 ENST417324 2 
5 B 1     ENST417324 3 
6 B 1 35491 120765 ENST461467 1 
7 B 1     ENST461467 2 
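A loop-free alternative is also possible with a grouped shift rather than `apply`. This is a sketch of that swapped-in technique, not what either answer wrote; `shift_within_trans` is a hypothetical helper name, and it assumes rows are already ordered by Num within each Trans (if not, sort with `df.sort_values(["Trans", "Num"])` first). Unlike `func` above, it keeps the computed Start on each transcript's last row and only leaves End as NaN, which is what the question asks for.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Head": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "Chr": [1] * 8,
    "Start": [29554, 30564, 30976, 36091, 35491, 35184, 36083, 35491],
    "End": [30039, 30667, 31097, 35267, 34544, 35711, 35235, 120765],
    "Trans": ["ENST473358"] * 3 + ["ENST417324"] * 3 + ["ENST461467"] * 2,
    "Num": [1, 2, 3, 1, 2, 3, 1, 2],
})

def shift_within_trans(frame):
    """Hypothetical helper: Start -> End + 10, End -> next Start in the same Trans - 10."""
    # Assumes rows are already ordered by Num within each Trans
    out = frame.copy()
    # Next row's Start within the same transcript; NaN on each transcript's last row
    next_start = out.groupby("Trans")["Start"].shift(-1)
    out["Start"] = out["End"] + 10
    out["End"] = next_start - 10
    return out

print(shift_within_trans(df))
```

On the sample data this should give the same Start and End values as the loop-based answer above, with the original row order and a plain integer index.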