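Python 3: adding colour to specific output words from lists in a sentence
My code below currently checks a text file to see whether a sentence contains a word from my lexicon file. If it finds one, it then checks whether the same line also contains a word from a secondary list. If both conditions are met on a line, that line is printed.
What I want to do is colour the lexicon words, for example red, and colour the words from the secondary list (called CategoryGA) blue, so that in the printed output I can easily see which list each found word came from.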
import re
import collections
from collections import defaultdict
from collections import Counter
import sys
from Categories.GainingAccess import GA
Chatpath = "########/Chat1.txt"
Chatfile = Chatpath
lpath = 'Lexicons/######.txt'
lfile = lpath
CategoryGA = GA
Hits = []
"""
text_file = open(path, "r")
lines = text_file.read().split()
c = Counter(lines)
for i, j in c.most_common(50):
    print(i, j)
"""
# class LanguageModelling:
def readfile():
    Word_Hit = None
    with open(Chatfile) as file_read:
        content = file_read.readlines()
        for line_num, line in enumerate(content):
            if any(word in line for word in CategoryGA):
                Word_Hit = False
                for word in CategoryGA:
                    if line.find(word) != -1:
                        Word_Hit = True
                        Hits.append(word)
                        Cleanse = re.sub('<.*?>', '', line)
                        print('%s appeared on Line %d : %s' % (word, line_num, Cleanse))
        file_read.close()
    count = Counter(Hits)
    count.keys()
    for key, value in count.items():
        print(key, ':', value)
def readlex():
    with open(lfile) as l_read:
        l_content = l_read.readlines()
        for line in l_content:
            r = re.compile(r'^\d+\s+\d+\.\d+%\s*')
            l_Cleanse = r.sub('', line)
            print(l_Cleanse)
        l_read.close()
def LanguageDetect():
    with open(Chatfile) as c_read, open(lfile) as l_read:
        c_content = c_read.readlines()
        lex_content = l_read.readlines()
        for line in c_content:
            Cleanse = re.sub('<.*?>', '', line)
            if any(lex_word in line for lex_word in lex_content) \
                    and \
                    any(cat_word in line for cat_word in CategoryGA):
                lex_word = '\033[1;31m{}\033[1;m'.format(lex_word)
                cat_word = '\033[1;44m{}\033[1;m'.format(cat_word)
                print(Cleanse)
                # print(cat_word)
        c_read.close()
        l_read.close()
#readfile()
LanguageDetect()
# readlex()
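This is my full code. The problem is in the LanguageDetect method: my current attempt, which assigns to the lex_word and cat_word variables, has not worked, and frankly I'm stumped as to what to try next.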
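For what it's worth, two things in LanguageDetect stand out. In Python 3, names bound inside a generator expression such as any(... for lex_word in ...) are not visible outside it, so lex_word and cat_word are not available on the lines that try to format them. Even if they were, rebinding those names would not change the text held in Cleanse. A minimal sketch of one alternative follows: substitute the colour codes into the line itself with a single regex pass. The function name highlight, the constants, and the sample data are hypothetical. It assumes the words are plain alphanumeric tokens, matching is case-sensitive, and a word present in both lists is coloured as a lexicon word. Note that 44 selects a blue background; 34 is used here for blue text.

```python
import re

RED = '\033[1;31m'    # bold red text, used for lexicon words
BLUE = '\033[1;34m'   # bold blue text, used for CategoryGA words
RESET = '\033[0m'     # reset all attributes


def highlight(line, lex_words, cat_words):
    """Return `line` with matches coloured, or None unless both lists hit."""
    lex_words, cat_words = set(lex_words), set(cat_words)
    # Longest first so 'your' is tried before 'you'.
    vocab = sorted(lex_words | cat_words, key=len, reverse=True)
    if not vocab:
        return None
    # \b stops a short word such as 'i' from matching inside 'like'.
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, vocab)) + r')\b')
    found = set()

    def paint(match):
        word = match.group(0)
        found.add(word)
        colour = RED if word in lex_words else BLUE
        return colour + word + RESET

    coloured = pattern.sub(paint, line)
    # Keep the original rule: print only when both lists matched on this line.
    if found & lex_words and found & cat_words:
        return coloured
    return None


if __name__ == '__main__':
    # Hypothetical sample data; the real lists would come from the lexicon file and GA.
    result = highlight('do you want to meet', ['you', 'want'], ['meet'])
    if result is not None:
        print(result)
```

Inside the loop over chat lines this would be called on the cleansed line, printing the result whenever it is not None. Doing everything in one substitution pass avoids a second pass accidentally matching characters inside the escape codes inserted by the first.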
Lexicon:
31547 4.7072% i
25109 3.7466% u
20275 3.0253% you
10992 1.6401% me
9490 1.4160% do
7681 1.1461% like
6293 0.9390% want
6225 0.9288% my
5459 0.8145% have
5141 0.7671% your
5103 0.7614% lol
4857 0.7247% can
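So in the readlex method I use:
r = re.compile(r'^\d+\s+\d+\.\d+%\s*')
l_Cleanse = r.sub('', line)
to remove everything before the word/characters. I think this may be the main reason I can't colour the lexicon words, but I'm not sure how to fix it.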
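This does look like part of the problem: LanguageDetect compares the raw lines returned by readlines(), each of which still carries the count, the percentage and a trailing newline, so the test lex_word in line asks whether the whole raw lexicon line occurs in the chat line. A minimal sketch of cleaning the lexicon once, before any matching, is shown here. The function name parse_lexicon is hypothetical, and it assumes every line in the file looks like the excerpt above.

```python
import re

# Same prefix pattern as in readlex(): a count, a percentage, then whitespace.
PREFIX = re.compile(r'^\d+\s+\d+\.\d+%\s*')


def parse_lexicon(lines):
    """Return the bare words from lines such as '31547 4.7072% i\\n'."""
    words = []
    for line in lines:
        # strip() removes the trailing newline (and any stray leading spaces)
        # before the numeric prefix is cut off.
        word = PREFIX.sub('', line.strip())
        if word:  # skip blank lines
            words.append(word)
    return words
```

With the file opened as in LanguageDetect, parse_lexicon(l_read.readlines()) would give a list of plain words that can then be compared against each chat line.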
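How colour codes are interpreted depends on the terminal you are running in. Are you sure your terminal can handle colours?
I'm only outputting through PyCharm at the moment, so it should be able to handle colours. I believe the problem is with the code rather than the terminal.
lex_word = '\033[1;31m' + lex_word + '\033[1;m' works for me, which is why I asked.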