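I have several tab-delimited files that I want to read into dicts using csv.DictReader. Before the actual data starts, each file contains several comment lines beginning with '#' or '\t'. The number of comment lines varies from file to file. I have been trying the approach outlined in this post, but I can't seem to get it to work. Skipping different types of comment lines in csv.DictReader
Here is my current code: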
def load_database_snps(inputFile):
    '''This function takes a txt tab delimited input file (in house database) and returns a list of dictionaries for each variant'''
    idStore = [] #empty list for storing variant records
    with open(inputFile, 'r+') as varin:
        idStoreDictgroup = csv.DictReader((row for row in varin if row.startswith('hr', 1, 2)),delimiter='\t') #create a generator; dictionary per snp (row) in the file
        idStoreDictgroup.fieldnames = [field.strip() for field in idStoreDictgroup.fieldnames] #strip whitespace from field names
        print(type(idStoreDictgroup))
        for d in idStoreDictgroup: #iterate over dictionaries in varin_dictgroup
            print(d)
            idStore.append(d) #attach to var_list
    return idStore
Here is an example of an input file:
## SM=Sample,AD=Total Allele Depth, DP=Total Depth
## het;;; and homo;;; are breakdowns of variant read counts per sample - chr1:10002921 T>G AD=34 het:4;11;7;12 (sum=34)
Hetereozygous Homozygous
Chr Start End ref |A| |C| |G| |T| HetCount |A| |C| |G| |T| HomCount TotalCount SampleCount
chr1 10001102 10001102 T 0 0 SM=1;AD=22;DP=38 0 1 0 0 0 0 0 1 138 het:22; homo:-
chr1 10002921 10002921 T 0 0 SM=4;AD=34;DP=63 0 4 0 0 0 0 0 4 138 het:4;11;7;12; homo:-
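I want to read every line that starts with "Chr" or "chr". I think it isn't working because I need to iterate over the reader to reformat the field names, and doing so exhausts the generator before the rows are read into dictionaries.
The error message I get is: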
Traceback (most recent call last):
  File "snp_freq_V1-1_export.py", line 99, in <module>
    snp_check_wrapper(inputargs.snpstocheck, inputargs.snp_database_location)
  File "snp_freq_V1-1_export.py", line 92, in snp_check_wrapper
    snpDatabase = load_database_snps(databaseInputFile) #store database variants in snp_database (a dictionary)
  File "snp_freq_V1-1_export.py", line 53, in load_database_snps
    idStoreDictgroup.fieldnames = [field.strip() for field in idStoreDictgroup.fieldnames] #strip whitespace from field names
TypeError: 'NoneType' object is not iterable
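One possible reading of this traceback, offered as a sketch rather than a confirmed diagnosis: str.startswith(prefix, start, end) tests only the slice row[start:end], and row[1:2] is a single character, so the 'hr' test can never succeed. If every row is filtered out, DictReader has no header row and its fieldnames is None. The sample line below is made up for illustration.

```python
import csv

# Illustrative data line, modelled on the sample file (tab-delimited).
line = "chr1\t10001102\t10001102\tT"

# startswith(prefix, start, end) checks line[start:end] only.
# line[1:2] is the single character "h", so it cannot start with "hr".
matches_original = line.startswith("hr", 1, 2)

# Widening the slice to line[1:3] ("hr") makes the test succeed.
matches_widened = line.startswith("hr", 1, 3)

# With every row filtered out, DictReader never sees a header row,
# so reading its fieldnames attribute gives None.
empty_reader = csv.DictReader(
    (row for row in [line] if row.startswith("hr", 1, 2)),
    delimiter="\t",
)
fieldnames = empty_reader.fieldnames
```

Iterating over None in the list comprehension would then raise the TypeError shown above.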
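I also tried the inverse of my current code, explicitly excluding lines that start with '#' and '\t'. That didn't work either; it just gave me an empty dictionary.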
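For reference, a minimal sketch of that exclusion approach, assuming the comment lines really do begin with '#' or a tab character and that the first remaining line is the header. The helper name load_rows_excluding_comments is hypothetical, and it takes any iterable of lines so it can be tried without a file.

```python
import csv

def load_rows_excluding_comments(lines):
    """Hypothetical helper: parse tab-delimited lines, skipping comments."""
    # Keep only non-blank lines that do not start with '#' or a tab.
    data_lines = (
        line for line in lines
        if line.strip() and not line.startswith(("#", "\t"))
    )
    reader = csv.DictReader(data_lines, delimiter="\t")
    if reader.fieldnames is None:
        # No header line survived the filter, so there is nothing to parse.
        return []
    # Strip stray whitespace from the header names before reading rows.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    return list(reader)
```

With a real file this might be called as load_rows_excluding_comments(varin) inside the existing with open(...) block.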
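Is there only one per file? That is, does the comment/header block above appear no more than once in each file?
Yes. From the example file, I want it to use the "Chr Start ..." line as the header and all subsequent lines as the values for my dictionaries.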
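Given that the header block appears only once, one way this could be sketched is to discard everything before the line starting with "Chr" and hand the rest to csv.DictReader. The helper name load_rows_from_header and its header_prefix parameter are hypothetical, and the sketch assumes the header line begins exactly with "Chr".

```python
import csv
from itertools import dropwhile

def load_rows_from_header(lines, header_prefix="Chr"):
    """Hypothetical helper: use the first line starting with header_prefix
    as the header and every later line as data."""
    # dropwhile discards lines until one starts with header_prefix, then
    # yields that line and all following lines unchanged.
    from_header = dropwhile(
        lambda line: not line.startswith(header_prefix), lines
    )
    reader = csv.DictReader(from_header, delimiter="\t")
    # If no header line was found, the reader is empty and this returns [].
    return list(reader)
```

Note that the sample header repeats |A|, |C|, |G| and |T|. DictReader builds one dict per row, so the later column with a given name would overwrite the earlier one unless the field names are made unique first.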