2013-09-16

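Scaling list comprehensions with different slice indices

The question concerns computing row means. The code block is below. It was originally written for six samples, and I am trying to scale it to n samples.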

Each CSV file is a separate patient profile, in the following format:

| gene | expression | 
| --- | ---  | 
| A1BG | 1.444  | 
| A1CF | 4.303  | 
| A2BP1 | 11.117  | 

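The original hard-coded file list has been changed to accept command-line arguments so that it can scale, but I don't know how to proceed from there. I need to pull out each sample's values and use them in this code block, while also incrementing the slice indices correctly in each individual list comprehension. Any ideas?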

import csv 
import matplotlib.pyplot as plt 
import sys 

""" 
This is an implementation of quantile normalization for microarray data analysis. 
""" 

# Parse csv files for samples, creating lists of gene names and expression values. 
#file_list = ['genes1.csv', 'genes2.csv', 'genes3.csv', 'genes4.csv', 'genes5.csv', 
#    'genes6.csv'] 
if len(sys.argv) > 1:
    file_list = sys.argv[1:]
    print(file_list)
else:
    # Exit here: without file_list the rest of the script would raise NameError.
    sys.exit("Not enough arguments given.")

set_dict = {} 
for path in file_list:
    with open(path) as stream:
        data = list(csv.reader(stream, delimiter='\t'))
    data = sorted([(i, float(j)) for i, j in data], key=lambda v: v[1])
    sample_genes = [i for i, j in data]
    sample_values = [j for i, j in data]
    set_dict[path] = (sample_genes, sample_values)

# Create sorted list of genes and values for all datasets. 
set_list = list(set_dict.items())
# Restore command-line order; item[0] is the file path.
set_list.sort(key=lambda item: file_list.index(item[0]))

This is the code block that needs to be scaled to handle any number of samples given as CLI arguments:

# Compute row means. 
mean_values = [((a + b + c + d + e + f)/len(file_list)) 
       for i, (a, b, c, d, e, f) in 
       enumerate(zip([v for i, (j, k) in set_list[:1] for v in k], 
       [v for i, (j, k) in set_list[1:2] for v in k], 
       [v for i, (j, k) in set_list[2:3] for v in k], 
       [v for i, (j, k) in set_list[3:4] for v in k], 
       [v for i, (j, k) in set_list[4:5] for v in k], 
       [v for i, (j, k) in set_list[5:6] for v in k]))] 
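
To see what each of the six slices is doing, here is a minimal sketch using made-up data (the file names and values are hypothetical, and only three samples are shown). It suggests that every `set_list[n:n + 1]` comprehension just pulls out the list of values for sample `n`, so the slice bounds are the only thing that changes from one line to the next.

```python
# Hypothetical toy data in the same shape as set_list:
# [(path, (sorted_gene_names, sorted_values)), ...]
set_list = [
    ("s1.csv", (["A1BG", "A1CF"], [1.0, 4.0])),
    ("s2.csv", (["A1CF", "A1BG"], [2.0, 6.0])),
    ("s3.csv", (["A1BG", "A1CF"], [3.0, 8.0])),
]

# Slice-based form used in the question, here for the second sample (n = 1).
n = 1
via_slice = [v for i, (j, k) in set_list[n:n + 1] for v in k]

# Direct lookup: entry n, then its (genes, values) pair, then the values.
via_index = set_list[n][1][1]

print(via_slice == via_index)
```

Because a one-element slice only ever yields a single entry, the nested comprehension flattens exactly one list of values.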

Corrected solution, as given by @Bo102010 in the answer below:

L = len(file_list) 
all_sets = [set_list[i - 1: i] for i in range(1, L + 1)] 
all_values = [[v for i, (j, k) in A for v in k] for A in all_sets] 
mean_values = [sum(p)/L for p in zip(*all_values)] 

1 Answer

If I have understood your code block correctly, you should be able to use the "star" (`*`) to unpack an iterable in the call to `zip`.

L = len(file_list) 
all_sets = [set_list[i - 1: i] for i in range(1, L + 1)] 
all_values = [[v for i, (j, k) in A for v in k] for A in all_sets] 
mean_values = [sum(p)/L for p in zip(*all_values)] 
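
As a self-contained illustration of the same idea, here is a small sketch with made-up numbers. The helper name `row_means` and the sample values are assumptions for demonstration only and are not part of the original script.

```python
def row_means(columns):
    """Return the mean of each row, given one list of values per sample."""
    count = len(columns)
    # zip(*columns) is the same as zip(columns[0], columns[1], ...),
    # so each tuple it yields holds one value from every sample.
    return [sum(row) / count for row in zip(*columns)]


# Hypothetical values for three samples, two genes each.
all_values = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
means = row_means(all_values)
print(means)  # [2.0, 6.0]
```

Note that `zip` stops at the shortest input, so this sketch assumes every sample has the same number of values.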

This gives the error "NameError: name 'set_lists' is not defined." – user2277435

Sorry, I made a mistake when writing the code. I think I have fixed it now. – bbayles

This is perfect. Is there a link in the Python docs that explains more about this star syntax? Thanks! – user2277435
