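Searching a file for words from a list

I am trying to search a file for words. The words to look for are stored in a separate list. The words that are found are stored in another list, which is returned at the end.

The code is as follows: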

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                 "html", "css", "jquery", "linux", "windows"]
    with open("C:\Users\Vadim\Desktop\Python\New_cvs\\" + file, 'r') as file1:
        for line in file1:
            for word in line.split():
                matching = [s for s in qualities if word.lower() in s]
                if matching is not None:
                    education.append(matching)
    return education

First, it returns a list with a bunch of empty "seats", which means my comparison isn't working?

Result (scanning 4 files):

"C:\Program Files (x86)\Python2\python.exe" C:/Users/Vadim/PycharmProjects/TestFiles/ReadTXT.py 
[[], [], [], [], [], [], [], [], [], ['java', 'javascript']] 
[[], [], [], [], [], [], [], [], [], ['pascal']] 
[[], [], [], [], [], [], [], [], [], ['linux']] 
[[], [], [], [], [], [], [], [], [], [], ['c#']] 

Process finished with exit code 0

The input file contains:

Name: Some Name 
Phone: 1234567890 
email: [email protected] 
python,excel,linux

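Second, each file contains 3 different skills, but the function finds only 1 or 2. Is this also a bad comparison, or do I have a different error here?

The result I expect is a list of the skills found, with no empty entries, that contains all the skills in the file and not just some of them.

Edit: the function does find all the skills when I do word.split(', '). But if I want it to be more general, what would be a good way to find these skills when I don't know exactly what will separate them?

Source

2016-09-25 Kiper

It would help if you could provide the input file and the expected output. – SilentMonk

Edited. Thanks! – Kiper

Try splitting on commas instead of whitespace. For example, line.split() -> line.split(",") – Checkmate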

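You get the empty lists because None is not the same as an empty list, so the check "matching is not None" passes even when nothing matched. What you probably want is to change the condition to the following: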

if matching: 
    # do your stuff

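It looks like you are checking whether the word occurs as a substring of the strings in the qualities list. That may not be what you want. If you want to check whether the words on a line appear in the qualities list, you may want to change your list comprehension to: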

words = line.split() 
match = [word for word in words if word.lower() in qualities]
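To make the difference concrete, here is a minimal sketch comparing the substring test from the question with an exact comparison. The shortened qualities list and the sample words are chosen only for illustration.

```python
qualities = ["java", "javascript", "c", "css", "linux"]

def substring_matches(word):
    # Condition from the question: is the word a substring of a quality?
    return [s for s in qualities if word.lower() in s]

def exact_matches(word):
    # Exact comparison: is the word itself one of the qualities?
    return [s for s in qualities if word.lower() == s]

print(substring_matches("java"))    # ['java', 'javascript']
print(exact_matches("java"))        # ['java']
print(substring_matches("c"))       # ['javascript', 'c', 'css']
print(substring_matches("linux,"))  # [] because of the trailing comma
```

The last call hints at why a token that still carries a comma is never found: "linux," is not a substring of "linux".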

If you want to split on both commas and spaces, you may want to look at regular expressions. See Split Strings with Multiple Delimiters?.
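As a rough sketch of that idea, re.split can break a line on any run of commas, semicolons, or whitespace before the tokens are compared. The function name find_skills, the shortened QUALITIES set, and the choice of delimiters are assumptions made for this example, not part of the original code.

```python
import re

# Shortened, illustrative skill set (hypothetical; use the full list in practice).
QUALITIES = {"python", "java", "sql", "c#", "c++", "c", "linux"}

def find_skills(line):
    # Split on any run of commas, semicolons or whitespace.
    tokens = re.split(r"[,;\s]+", line.strip().lower())
    # Keep only tokens that are exactly one of the known skills.
    return [t for t in tokens if t in QUALITIES]

print(find_skills("python,excel,linux"))  # ['python', 'linux']
print(find_skills("C++, Java ;  SQL"))    # ['c++', 'java', 'sql']
```

Because the delimiters are a character class, adding another separator only means adding it inside the brackets.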

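Source

2016-09-25 07:56:22 krato

Thanks! Your code returned the first skill in the file, but not the rest. – Kiper

@Kiper I used 'line.split()', which splits the line on whitespace by default. If your input file uses commas, use 'split(',')'. You may have to look at regular expressions if you have a variety of delimiters. – krato

Thanks a lot. If I want to bring the regular expression in there, should it go inside line.split(here?), or should it be separate? – Kiper

The code should be written as follows (if I understand the desired output format correctly):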

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                 "html", "css", "jquery", "linux", "windows"]
    with open("C:\Users\Vadim\Desktop\Python\New_cvs\\" + file, 'r') as file1:
        for line in file1:
            matching = []
            # split the line on commas, then normalise each token before comparing
            for word in line.strip().split(","):
                word = word.strip().lower()
                if word in qualities:
                    matching.append(word)
            # only keep lines on which at least one skill was found
            if matching:
                education.append(matching)
    return education

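Source

2016-09-25 07:56:43 Checkmate

Is the "matching = .." line correct? I get an error when I use it. – Kiper

That's what I get for not testing my code XD. It should work now, sorry about that! – Checkmate

Thanks. All the answers here return only the first skill in the file, but not the other 2. What am I doing wrong? – Kiper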

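First of all, you get a bunch of "empty seats" because your condition is not defined correctly. If matching is an empty list, it is still not None. That is, [] is not None evaluates to True. That is why you get all those "empty seats".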
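The effect is easy to reproduce in isolation. In this small sketch the two helper functions and the sample data are hypothetical and stand in for the per-word results built inside the loop:

```python
def collect_original(per_word_matches):
    education = []
    for matching in per_word_matches:
        if matching is not None:  # always true for a list, even an empty one
            education.append(matching)
    return education

def collect_fixed(per_word_matches):
    education = []
    for matching in per_word_matches:
        if matching:              # empty lists are falsy, so they are skipped
            education.append(matching)
    return education

sample = [[], [], ["pascal"]]
print(collect_original(sample))  # [[], [], ['pascal']]
print(collect_fixed(sample))     # [['pascal']]
```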

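Second, the condition in the list comprehension is not what you want either. Unless I have misunderstood your goal, the condition you are looking for is: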

[s for s in qualities if word.lower() == s]

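This goes through the list of qualities and returns a non-empty list only if the word is one of the qualities. However, since the length of that list will always be either 1 (if there is a match) or 0 (if there isn't), we can turn it into a boolean with Python's built-in any() function: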

if any(s == word.lower() for s in qualities): 
    education.append(word)
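Since the test is plain equality, the same check can also be written with the in operator. The sketch below compares the two forms; storing the skills in a set is an optional tweak assumed here for illustration, and the list is shortened.

```python
# Shortened, illustrative skill set (hypothetical).
qualities = {"python", "java", "c#", "linux"}

def is_skill_any(word):
    # Form used in the answer: any() over an equality test.
    return any(s == word.lower() for s in qualities)

def is_skill_in(word):
    # Equivalent membership test using the `in` operator.
    return word.lower() in qualities

for w in ["Python", "excel", "C#"]:
    # Both forms give the same answer for every word.
    print("%s: %s %s" % (w, is_skill_any(w), is_skill_in(w)))
```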

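I hope this helps. If you have any follow-up questions, don't hesitate to ask, and tell me if I have misunderstood your goal.

For your convenience, here is the modified source I used to check this myself: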

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                 "html", "css", "jquery", "linux", "windows"]
    with open(file, 'r') as file1:
        for line in file1:
            for word in line.split():
                if any(s == word.lower() for s in qualities):
                    education.append(word)
    return education

Source

2016-09-25 07:59:43 OzTamir

Thanks. With your code, it gave me the first skill of each file, but not the other 2. – Kiper

You can also use a regular expression, like this:

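import re

def scan_education(file_name):
    qualities_list = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                      "html", "css", "jquery", "linux", "windows"]
    # re.escape takes care of the special characters in "c#" and "c++"
    pattern = '|'.join(re.escape(q) for q in qualities_list)
    # \b does not match after "+" or "#", so use lookarounds to delimit the skills
    qualities = re.compile(r'(?<![\w+#])(?:%s)(?![\w+#])' % pattern)
    education = set()
    with open(file_name, 'r') as f:
        for line in f:
            education.update(qualities.findall(line.lower()))
    return list(education)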

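Source

2016-09-25 08:29:39 Symonen

Here is a short example that uses sets and some comprehension-based filtering to find the words common to a text file (or, as I do here, just a text string) and the list you provide. This is faster and clearer than trying to use loops.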

import string

try:
    with open('myfile.txt') as f:
        text = f.read()
except IOError:
    # fall back to a sample string when the file cannot be opened
    text = "harry met sally; the boys went to the park. my friend is purple?"

my_words = set(("harry", "george", "phil", "green", "purple", "blue"))

# keep only ASCII letters and whitespace (punctuation is dropped, not replaced by a space)
text = ''.join(x for x in text if x in string.ascii_letters or x in string.whitespace)

text = set(text.split()) # split on any whitespace

common_words = my_words & text # my_words.intersection(text) also does the same

print(common_words)

Source

2016-09-25 08:32:27 cacahootie
