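You may want to consider re-indexing your data depending on how you want to use it.
You can index the data on the columns "Trans" and "Num" like this: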
#Change how we index the frame
df.set_index(["Trans", "Num"], inplace=True)
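To make it clearer what this two-level index gives you, here is a minimal sketch on a made-up toy frame (the names `toy`, `sub` and `value` and the data are assumptions for illustration, not part of the original data). Selecting by the first level returns the rows of one transcript indexed by "Num", and a tuple selects a single row.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame(
    {"Trans": ["T1", "T1", "T2"], "Num": [1, 2, 1], "Start": [10, 20, 30]}
)
toy = toy.set_index(["Trans", "Num"])

# Selecting on the first level drops it and leaves "Num" as the index
sub = toy.loc["T1"]
print(sub.index.tolist())

# A (Trans, Num) tuple addresses exactly one row
value = toy.loc[("T1", 2), "Start"]
print(value)
```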
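Next, we grab each unique value of the "Trans" index level so we can update them one at a time. (I'm sure this part and the iteration below could be done in bulk, but I just did this quickly. If you have efficiency problems, look into ways to avoid looping over every index value.)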
#Get only unique indexes
unique_trans = list(set(df.index.get_level_values('Trans')))
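One thing to be aware of: a `set` does not keep the order in which the values appear. That does not matter for the loop here, but if you want the transcripts in their original order, `Index.unique()` should do it. A small sketch on a made-up toy frame (the name `toy` and the data are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame(
    {"Trans": ["T2", "T2", "T1"], "Num": [1, 2, 1], "Start": [10, 20, 30]}
).set_index(["Trans", "Num"])

# Index.unique() keeps values in order of first appearance
unique_trans = toy.index.get_level_values("Trans").unique().tolist()
print(unique_trans)
```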
Then we can iterate through them and apply the transformation you want.
# Work in floats so NaN can be stored without an implicit dtype upcast
df = df.astype({"Start": float, "End": float})
# Access each index
for trans in unique_trans:
    # Get the highest number in "Num" for each so we know which row to set to NaN
    # (.ix was removed from pandas, so .loc is used throughout)
    max_num = max(df.loc[trans].index.values)
    # Copy your start column as a temp variable
    start = df.loc[trans]["Start"].copy()
    # Apply the transform to the start column (equal to end + 10)
    df.loc[trans, "Start"] = np.array(df.loc[trans]["End"]) + 10
    # Apply the transform to the end column (equal to the next start - 10)
    df.loc[trans, "End"] = np.array(start.shift(-1) - 10)
    # By passing a tuple as a row index, we get the element that is both in trans
    # and has the max number, which is the one you want to set to NaN
    df.loc[(trans, max_num), "End"] = np.nan
print(df)
When I ran this on the data, the result I got was:
               Head  Chr     Start      End
Trans      Num
ENST473358 1      A    1   30049.0  30554.0
           2      A    1   30677.0  30966.0
           3      A    1   31107.0      NaN
ENST417324 1      B    1   35277.0  35481.0
           2      B    1   34554.0  35174.0
           3      B    1   35721.0      NaN
ENST461467 1      B    1   35245.0  35481.0
           2      B    1  120775.0      NaN
The full code I used to generate the test case is this:
import pandas as pd
import numpy as np
# Setup your dataframe
df = pd.DataFrame(columns=["Head", "Chr", "Start", "End", "Trans", "Num"])
df["Head"] = ["A", "A", "A", "B", "B", "B", "B", "B"]
df["Chr"] = [1]*8
df["Start"] = [29554, 30564, 30976, 36091, 35491, 35184, 36083, 35491]
df["End"] = [30039, 30667, 31097, 35267, 34544, 35711, 35235, 120765]
df["Trans"] = ["ENST473358", "ENST473358", "ENST473358",
               "ENST417324", "ENST417324", "ENST417324",
               "ENST461467", "ENST461467"]
df["Num"] = [1, 2, 3, 1, 2, 3, 1, 2]
# Work in floats so NaN can be stored without an implicit dtype upcast
df = df.astype({"Start": float, "End": float})
# Change how we index the frame
df.set_index(["Trans", "Num"], inplace=True)
# Get only unique indexes
unique_trans = list(set(df.index.get_level_values('Trans')))
# Access each index (.ix was removed from pandas, so .loc is used instead)
for trans in unique_trans:
    max_num = max(df.loc[trans].index.values)
    start = df.loc[trans]["Start"].copy()
    df.loc[trans, "Start"] = np.array(df.loc[trans]["End"]) + 10
    df.loc[trans, "End"] = np.array(start.shift(-1) - 10)
    df.loc[(trans, max_num), "End"] = np.nan
print(df)
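For what it's worth, the loop can probably be avoided altogether. The following is only a sketch of one way to do it in bulk with `groupby(...).shift(-1)`, not something I benchmarked. It assumes the rows of each transcript are already ordered by "Num", and the names `next_start` and `out` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Head": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "Chr": [1] * 8,
    "Start": [29554, 30564, 30976, 36091, 35491, 35184, 36083, 35491],
    "End": [30039, 30667, 31097, 35267, 34544, 35711, 35235, 120765],
    "Trans": ["ENST473358", "ENST473358", "ENST473358",
              "ENST417324", "ENST417324", "ENST417324",
              "ENST461467", "ENST461467"],
    "Num": [1, 2, 3, 1, 2, 3, 1, 2],
})

# Start of the following row within the same transcript;
# the last row of each transcript has no successor and becomes NaN
next_start = df.groupby("Trans", sort=False)["Start"].shift(-1)

# New Start = old End + 10, new End = next old Start - 10
out = df.assign(Start=df["End"] + 10, End=next_start - 10)
out = out.set_index(["Trans", "Num"])
print(out)
```

The values should match the loop version, except that "Start" stays an integer column here because it never has to hold NaN.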
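Sorry, are you just asking how to set 'End' to 'NaN' when it is the last row for a 'Trans'? – EdChum
No, as you can see from my desired output, the values in both the Start and End columns should change, as I asked. – user1017373
I really don't see any clear pattern between the first and the second df ... – Jacquot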