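Scaling list comprehensions with different slice indices
My question concerns the "Compute row means" block shown further down. The code was originally written for six samples, and I am trying to scale it to n samples.
Each CSV file is a separate patient profile, in this format: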
| gene | expression |
| --- | --- |
| A1BG | 1.444 |
| A1CF | 4.303 |
| A2BP1 | 11.117 |
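The original hard-coded file list has already been changed to accept command-line arguments so that it scales, but I don't know where to go from there. I need to pull out each sample and use it in that code block, while also incrementing the slice notation correctly in each individual list comprehension. Any ideas?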
import csv
import matplotlib.pyplot as plt
import sys

"""
This is an implementation of quantile normalization for microarray data analysis.
"""

# Parse csv files for samples, creating lists of gene names and expression values.
#file_list = ['genes1.csv', 'genes2.csv', 'genes3.csv', 'genes4.csv', 'genes5.csv',
#             'genes6.csv']

while True:
    if (len(sys.argv) > 1):
        file_list = [args for args in sys.argv[1:]]
        print file_list
        break
    else:
        print "Not enough arguments given."
        break

set_dict = {}
for path in file_list:
    with open(path) as stream:
        data = list(csv.reader(stream, delimiter = '\t'))
    data = sorted([(i, float(j)) for i, j in data], key = lambda v: v[1])
    sample_genes = [i for i, j in data]
    sample_values = [j for i, j in data]
    set_dict[path] = (sample_genes, sample_values)

# Create sorted list of genes and values for all datasets.
set_list = [x for x in set_dict.items()]
set_list.sort(key = lambda (x,y): file_list.index(x))
This is the block that needs to be scaled to handle any number of samples given as CLI arguments:
# Compute row means.
mean_values = [((a + b + c + d + e + f)/len(file_list))
               for i, (a, b, c, d, e, f) in
               enumerate(zip([v for i, (j, k) in set_list[:1] for v in k],
                             [v for i, (j, k) in set_list[1:2] for v in k],
                             [v for i, (j, k) in set_list[2:3] for v in k],
                             [v for i, (j, k) in set_list[3:4] for v in k],
                             [v for i, (j, k) in set_list[4:5] for v in k],
                             [v for i, (j, k) in set_list[5:6] for v in k]))]
The corrected solution, as given by @Bo102010 below:
L = len(file_list)
all_sets = [set_list[i - 1: i] for i in range(1, L + 1)]
all_values = [[v for i, (j, k) in A for v in k] for A in all_sets]
mean_values = [sum(p)/L for p in zip(*all_values)]
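Since each slice `set_list[i - 1: i]` holds exactly one sample, the same result can probably be reached by iterating over `set_list` directly. The following is only a sketch under assumptions: it is written for Python 3 (the code in this thread is Python 2), the `set_list` contents are made-up stand-ins with three genes per sample, and `row_means` is a hypothetical helper name that does not appear in the original code.

```python
# Hypothetical stand-in for set_list: one (path, (genes, values)) entry per
# sample, with each sample's values already sorted in ascending order.
set_list = [
    ("genes1.csv", (["A1BG", "A1CF", "A2BP1"], [1.0, 4.0, 10.0])),
    ("genes2.csv", (["A1CF", "A1BG", "A2BP1"], [2.0, 5.0, 12.0])),
    ("genes3.csv", (["A1BG", "A2BP1", "A1CF"], [3.0, 6.0, 14.0])),
]


def row_means(samples):
    """Return the mean of each rank position across all samples."""
    # Keep only the sorted value list of every sample.
    all_values = [values for _path, (_genes, values) in samples]
    n = len(all_values)
    # zip(*all_values) hands each sample's list to zip as a separate argument,
    # so it yields one tuple per rank holding that rank's value from every sample.
    return [sum(row) / n for row in zip(*all_values)]


mean_values = row_means(set_list)
print(mean_values)  # [2.0, 5.0, 12.0]
```

Because nothing in `row_means` depends on the number of samples, it should behave the same whether two or twenty files are passed on the command line, provided every file lists the same number of genes (`zip` silently stops at the shortest list).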
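This gives the error "NameError: name 'set_lists' is not defined." – user2277435
Sorry, I made a mistake when writing the code. I think I've fixed it now. – bbayles
This is perfect. Is there a link in the Python docs that explains more about this star notation? Thanks! – user2277435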
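On the star: it is Python's argument-unpacking syntax, which the Python tutorial covers under "Unpacking Argument Lists". A small illustration, assuming Python 3 and two made-up samples (the name `samples` and its values are not from the original code):

```python
samples = [[1.0, 4.0, 10.0], [2.0, 5.0, 12.0]]

# Writing one argument per sample only works for a fixed number of samples.
explicit = list(zip(samples[0], samples[1]))

# The star unpacks the outer list, so each inner list becomes its own argument.
unpacked = list(zip(*samples))

print(explicit == unpacked)  # True
print(unpacked)              # [(1.0, 2.0), (4.0, 5.0), (10.0, 12.0)]
```

That is why `zip(*all_values)` can replace the six hand-written arguments in the original block.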