How to link two dependent inputs of unknown size to variables

This is my first Python script. My data looks like this:

Position ind1 ind2 ind3 ind4 ind5 ind6 ind7 ind8
0        C    A    C    A    A    A    A    A
1        C    A    C    C    C    A    A    A

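The number of columns can vary, and the file has thousands of rows.

My script reads this file line by line and, for each position (POS), calculates the frequency of A and C in groups of individuals (called populations below). For example: the frequency of A at position 0 in population 1 (ind1, ind2, ind3, ind4) and the frequency of A at position 0 in population 2 (ind5, ind6, ind7, ind8), then the same for POS 1, 2, 3, ...

To do this, I defined the following in my script: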

alleles1 = alleles[1:5] 
alleles2 = alleles[5:]

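But if I have more than 900 columns and different combinations of columns, I would have to modify the alleles* variables and the rest of the script every time.

I want to make my program more interactive: the user defines the number of populations and specifies which columns belong to which population.

The code I have so far: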

import re
import sys

# ask for the number of populations
try:
    num_pop = int(raw_input("How many populations do you have? > "))
except ValueError:
    print "It is not an integer! \nThe program exits...\n "
    sys.exit(1)  # actually stop, as the message says
# ask for individuals in population
ind_pop = {}
for i in range(num_pop):
    i += 1
    ind_input = str(raw_input("Type column numbers of population %i > " % i))
    ind_pop[i] = re.findall(r'[^,;\s]+', ind_input)

If I have 2 populations, where columns 3, 5 and 6 are population 1 and columns 2 and 4 are population 2, it works like this:

> How many populations do you have? > 2 
> Type column numbers of population 1 > 3, 5, 6 
> Type column numbers of population 2 > 2, 4

The input is stored in a dictionary:

{1: ['3', '5', '6'], 2: ['2', '4']}

The question is how to get from this input to the allele definitions. The output should look like this:

allele1 = [allele[3], allele[5], allele[6]] 
allele2 = [allele[2], allele[4]]

In case it is needed, here is the main part of the rest of the code:

import collections

with open('test_file.txt') as datafile:
    next(datafile)  # skip the header line
    for line in datafile:
        words = line.split()  # splits string into the list of words
        chr_pos = words[0:2]  # select column chromosome and position
        alleles = words[2:]  # this and next separates alleles for populations

        alleles1 = alleles[0:4]
        alleles2 = alleles[4:8]
        alleles3 = alleles[8:12]
        alleles4 = alleles[12:16]

        # one Counter per population
        counter1 = collections.Counter(alleles1)
        counter2 = collections.Counter(alleles2)
        counter3 = collections.Counter(alleles3)
        counter4 = collections.Counter(alleles4)
#### the rest of the code and some filters within the part above were skipped

2013-04-25 dmkr

First, you need to convert the column numbers to integers:

ind_pop[i] = [int(j) for j in re.findall(r'[^,;\s]+', ind_input)]

(I would also change your regular expression to r'\d+'.)
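A minimal sketch of that parsing step, pulled out into a function so it can be tried without typing input. The name parse_columns and the sample answers are made up for illustration; the sketch avoids raw_input and print so it does not depend on the Python version.

```python
import re

def parse_columns(text):
    """Return the column numbers found in text as a list of ints."""
    # r'\d+' matches runs of digits, so commas, semicolons or spaces
    # all work as separators.
    return [int(tok) for tok in re.findall(r'\d+', text)]

# Hypothetical answers a user might type for two populations.
answers = ["3, 5, 6", "2;4"]
ind_pop = {i: parse_columns(ans) for i, ans in enumerate(answers, start=1)}
```

With these answers, ind_pop holds integer lists that can be used directly as list indices.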

Then, instead of having alleles1, alleles2 and so on, keep one master list or dictionary:

master = {i: [alleles[j] for j in vals] for i, vals in ind_pop.items()} 
counters = {i: collections.Counter(al) for i, al in master.items()}

Then you can access counters[i] instead of counter1 and so on.
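To show how the counters could then give the frequencies the question asks about, here is a rough sketch on the first sample row. The helper names count_alleles and frequency are invented for this example, and it assumes the column numbers are 0-based indices into alleles (the row with the position column already removed).

```python
import collections

def count_alleles(alleles, ind_pop):
    """Map each population number to a Counter of its alleles."""
    return {
        pop: collections.Counter(alleles[j] for j in cols)
        for pop, cols in ind_pop.items()
    }

def frequency(counter, allele):
    """Fraction of entries in counter equal to allele (0.0 if empty)."""
    total = sum(counter.values())
    return float(counter[allele]) / total if total else 0.0

# First data row from the question, position column already removed.
alleles = "C A C A A A A A".split()
# Assumed 0-based indices into alleles, four individuals per population.
ind_pop = {1: [0, 1, 2, 3], 2: [4, 5, 6, 7]}

counters = count_alleles(alleles, ind_pop)
freq_a_pop1 = frequency(counters[1], "A")
freq_a_pop2 = frequency(counters[2], "A")
```

A Counter returns 0 for a missing key, so asking for the frequency of an allele that never occurs in a population does not raise an error.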

As a side note, you could probably simplify this by making ind_pop a list and using append instead of maintaining a counter.
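One way the list version might look. The function name collect_populations is hypothetical, and the typed answers are passed in as a list instead of being read with raw_input so the sketch can run on its own.

```python
import re

def collect_populations(answers):
    """Build a list with one list of column numbers per population."""
    ind_pop = []
    for answer in answers:
        # append keeps the populations in the order they were entered,
        # so no separate counter is needed
        ind_pop.append([int(tok) for tok in re.findall(r'\d+', answer)])
    return ind_pop

ind_pop = collect_populations(["3, 5, 6", "2, 4"])
# Population k (counting from 1) is ind_pop[k - 1].
```

The trade-off is that populations are then numbered from 0 internally, which only matters when printing them back to the user.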

2013-04-25 01:28:23 Felipe

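Dear Felipe, thank you for r'\d+'. You are right about converting to integers, and I did as you suggested. The master list or dictionary did help a bit, but it is not quite what I wanted. I have figured out a solution and will post it as an answer. – dmkr 2013-04-25 21:40:55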

If this is the output you are looking for,

allele1 = [allele[3], allele[5], allele[6]] 
allele2 = [allele[2], allele[4]]

and you have this:

{1: ['3', '5', '6'], 2: ['2', '4']}

then it is simple:

allele1 = []  # the lists must exist before append is called
allele2 = []
for index in population_dict[1]:
    allele1.append(allele[index])
for index in population_dict[2]:
    allele2.append(allele[index])

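Oh, and if the indices are stored as strings, as they appear to be above, you will need to convert them to integers first. You could change the code above to allele[int(index)], but it is better to convert them to integers when you read them in.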
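A small sketch of why the conversion matters, using the first sample row as allele. The flag string_index_failed exists only for this demonstration, and the column numbers are assumed to be 0-based indices into allele.

```python
# First data row from the question, position column already removed.
allele = ["C", "A", "C", "A", "A", "A", "A", "A"]
population_dict = {1: ['3', '5', '6'], 2: ['2', '4']}

# Indexing a list with a string such as '3' raises TypeError.
try:
    allele[population_dict[1][0]]
    string_index_failed = False
except TypeError:
    string_index_failed = True

# Convert every index to int once, then index normally.
population_dict = {
    pop: [int(i) for i in cols] for pop, cols in population_dict.items()
}
allele1 = [allele[i] for i in population_dict[1]]
allele2 = [allele[i] for i in population_dict[2]]
```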

2013-04-25 01:13:52 mattg

The problem is that 'population_dict' can vary depending on the user input. I found a solution; see my answer below. – dmkr 2013-04-25 21:45:01

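Thank you for all the suggestions and simplifications above. Some of them were useful. I realized I needed to change direction, so I will go with a list of lists: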

pop_alleles = [] 
for key in ind_pop.keys(): 
    pop_alleles.append([alleles[el] for el in ind_pop[key]])
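For illustration, the same idea on the second sample row, followed by one Counter per population. Iterating over sorted(ind_pop) is an addition here, not part of the snippet above: in older Python versions the order of dictionary keys is not guaranteed, so sorting keeps population 1 first. The column numbers are assumed to be 0-based integer indices into alleles.

```python
import collections

# Second data row from the question, position column already removed.
alleles = "C A C C C A A A".split()
# Assumed to hold integer indices already.
ind_pop = {1: [3, 5, 6], 2: [2, 4]}

# One inner list per population, in population-number order.
pop_alleles = [[alleles[el] for el in ind_pop[key]] for key in sorted(ind_pop)]

# One Counter per population, in the same order.
counters = [collections.Counter(group) for group in pop_alleles]
```

This works for any number of populations, since nothing in the loop depends on how many keys ind_pop has.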

2013-04-25 22:21:24 dmkr
