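I'm not sure I fully understand your question. Posting your code and a sample file would help a lot.
This code counts all entries across all files, then identifies the unique entries of each file. After that, it counts how many files each entry occurs in. Finally, it selects only the entries that appear in at least 90% of the files.
Also, this code could be shorter, but for readability I created many variables with long, meaningful names.
Please read the comments ;)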
import os
from collections import Counter
from sys import argv
# adjust your cut point
PERCENT_CUT = 0.9
# here we are going to save each file's entries, so we can sum them later
files_dict = {}
# total files seems to be the number you'll need to check against count
total_files = 0
# raw total entries, even duplicates
total_entries = 0
unique_entries = 0
# first argument is script name, so have the second one be the folder to search
search_dir = argv[1]
# list everything under search dir - ideally only your input files
# CHECK HOW TO READ ONLY SPECIFIC FILE types if you have something inside the same folder
files_list = os.listdir(search_dir)
total_files = len(files_list)
print('Files READ:')
# iterate over each file found in the given folder
for file_name in files_list:
    print(" "+file_name)
    # os.path.join works whether or not search_dir ends with a path separator
    file_object = open(os.path.join(search_dir, file_name), 'r')
    # build a list of entries with the newline stripped
    # (a list, because map() returns an iterator in Python 3 and len() would fail on it)
    file_entries = [it.strip("\r\n") for it in file_object.readlines()]
    # gotta count'em all
    total_entries += len(file_entries)
    # set doesn't allow duplicate entries
    entries_set = set(file_entries)
    # creates a dict from the set, setting each key's value to 1
    file_entries_dict = dict.fromkeys(entries_set, 1)
    # entries dict is now used differently, each key will hold a COUNTER
    files_dict[file_name] = Counter(file_entries_dict)
    file_object.close()
print("\n\nALL ENTRIES COUNT: "+str(total_entries))
# now we create a Counter that will hold each unique key's count so we can sum all dicts read from files
entries_dict = Counter()
for file_dict_key, file_dict_value in files_dict.items():
    print(str(file_dict_key)+" - "+str(file_dict_value))
    entries_dict += file_dict_value
print("\nUNIQUE ENTRIES COUNT: "+str(len(entries_dict.keys())))
# print(entries_dict)
# 90% from your question
cut_line = total_files * PERCENT_CUT
print("\nNeeds to appear in at least "+str(cut_line)+" files to be listed below")
# output dict is the final dict, where we put entries that were present in at least 90% of the files
output_dict = {}
# this is PYTHON 3 - CHECK YOUR VERSION as older versions might use iteritems() instead of items() in the line below
for entry, count in entries_dict.items():
    if count >= cut_line:
        output_dict[entry] = count
print(output_dict)
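As mentioned, this could be shorter. Here is a minimal sketch of a more compact variant of the same idea, wrapped in a function so it can be tried on any folder. The function name `entries_in_most_files` and the parameter `percent_cut` are made up for this example, and it assumes one entry per line and that every regular file directly inside the folder is an input file.

```python
import os
from collections import Counter


def entries_in_most_files(search_dir, percent_cut=0.9):
    """Return {entry: number_of_files} for entries found in at least
    percent_cut of the regular files directly inside search_dir."""
    # hypothetical helper; skips sub-folders, keeps every regular file
    file_names = [
        name for name in sorted(os.listdir(search_dir))
        if os.path.isfile(os.path.join(search_dir, name))
    ]
    files_per_entry = Counter()
    for name in file_names:
        with open(os.path.join(search_dir, name), 'r') as file_object:
            # the set makes each entry count at most once per file
            files_per_entry.update({line.strip("\r\n") for line in file_object})
    cut_line = len(file_names) * percent_cut
    return {entry: count for entry, count in files_per_entry.items()
            if count >= cut_line}
```

Calling `entries_in_most_files(argv[1])` would then stand in for the counting and filtering steps of the longer script, without the progress printing.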
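So, what have you tried? Show us your code and tell us where you are stuck, and we can help. –
What have you done so far? What specific problem did you run into while coding this? – NSNoob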
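I wrote the code below. I have an idea of what is needed, but I am not sure how to make it work. Please help out. count = 0 with open("expressed.txt", "w") as result: with open("C:/Users/ifeanyi/Desktop/modify/Bmori_id.txt", "r") as query_file: for match in query_file: for name in glob.glob("*.txt"): with open(name, "r") as compare: for line in compare: if line in match: count = + 1 result.append(count) – Mikko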