Python's logic when searching strings

filtered=[] 
text="any.pdf" 
if "doc" and "pdf" and "xls" and "jpg" not in text: 
    filtered.append(text) 
print(filtered)

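This is my first post on Stack Overflow, so excuse me if I do anything annoying. This code is supposed to append text if text does not contain any of these words: doc, pdf, xls, jpg. It works fine if it is written like this: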

if "doc" in text:
    pass  # contains "doc": skip it
elif "jpg" in text:
    pass
elif "pdf" in text:
    pass
elif "xls" in text:
    pass
else:
    filtered.append(text)

Source

2011-02-27 Mahmoud A. Raouf

If you open a Python interpreter, you'll find that "doc" and "pdf" and "xls" and "jpg" evaluates to the same thing as 'jpg':

>>> "doc" and "pdf" and "xls" and "jpg" 
'jpg'

So rather than testing against all of the strings, your first attempt tests only against 'jpg'.
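
To make the evaluation order concrete, here is a minimal sketch (the function name buggy_keep is illustrative, not from the question). Strictly speaking, `not in` binds more tightly than `and`, so the condition is parsed as "doc" and "pdf" and "xls" and ("jpg" not in text). Non-empty strings are always truthy, so the first three operands never affect the result:

```python
def buggy_keep(text):
    # Parsed as: "doc" and "pdf" and "xls" and ("jpg" not in text).
    # The three leading strings are truthy, so `and` falls through to the
    # last operand, which is the only real test.
    return "doc" and "pdf" and "xls" and "jpg" not in text


print(buggy_keep("any.pdf"))    # True: "pdf" was never checked
print(buggy_keep("photo.jpg"))  # False: only "jpg" is tested
```

This is why "any.pdf" ends up in filtered even though it contains "pdf".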

There are a number of ways to do what you want. The following isn't the most obvious, but it's useful:

if not any(test_string in text for test_string in ["doc", "pdf", "xls", "jpg"]): 
    filtered.append(text)

Another approach is to use a for loop in conjunction with an else clause:

for test_string in ["doc", "pdf", "xls", "jpg"]:
    if test_string in text:
        break
else:
    filtered.append(text)

Finally, you could use a pure list comprehension:

tofilter = ["one.pdf", "two.txt", "three.jpg", "four.png"] 
test_strings = ["doc", "pdf", "xls", "jpg"] 
filtered = [s for s in tofilter if not any(t in s for t in test_strings)]

Edit:

If you want to filter both words and extensions, I recommend the following:

text_list = generate_text_list() # or whatever you do to get a text sequence 
extensions = ['.doc', '.pdf', '.xls', '.jpg'] 
words = ['some', 'words', 'to', 'filter'] 
text_list = [text for text in text_list if not text.endswith(tuple(extensions))] 
text_list = [text for text in text_list if not any(word in text for word in words)]

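This could still lead to some mismatches; the above also filters out "Do something", "He's a wordsmith", etc., because 'some' and 'words' match as substrings. If that's a problem, you may need a more complex solution.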
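
One possible shape for such a "more complex solution" is whole-word matching with a regular expression. This is only a sketch under assumptions: the helper name contains_filtered_word and the sample strings are made up for illustration, and `\b` treats digits and underscores as word characters, which may or may not suit your data.

```python
import re

words = ["some", "words", "to", "filter"]  # same illustrative list as above

# \b matches a word boundary, so "some" matches "some notes"
# but not the "some" inside "something".
word_pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
    re.IGNORECASE,
)


def contains_filtered_word(text):
    """Return True if text contains one of the words as a whole word."""
    return word_pattern.search(text) is not None


samples = ["Do something", "He's a wordsmith", "some notes", "words to live by"]
kept = [s for s in samples if not contains_filtered_word(s)]
print(kept)  # ['Do something', "He's a wordsmith"]
```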

Source

2011-02-27 07:31:28 senderle

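Rather than editing, I'll simply add that if you want to ignore case, you should use the 'str.lower()' method, i.e. 'pdf' in text.lower(). Also, using '.endswith()' (S.Mark's answer) is nice because it won't reject strings like "mypdfprocessor.py". – senderle 2011-02-27 17:47:30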
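
A minimal sketch combining the two suggestions in this comment, lower-casing before an endswith test. The helper name has_blocked_extension and the sample file names are assumptions made for illustration:

```python
extensions = (".doc", ".pdf", ".xls", ".jpg")


def has_blocked_extension(text):
    # Lower-case first so "report.PDF" is treated like "report.pdf".
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return text.lower().endswith(extensions)


names = ["report.PDF", "mypdfprocessor.py", "photo.jpg", "notes.txt"]
filtered = [n for n in names if not has_blocked_extension(n)]
print(filtered)  # ['mypdfprocessor.py', 'notes.txt']
```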

If those extensions are always at the end, you can use .endswith, which also accepts a tuple of suffixes.

if not text.endswith(("doc", "pdf", "xls", "jpg")): 
    filtered.append(text)

Source

2011-02-27 07:28:26 YOU

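Just change it to 'if not', since the code is meant to exclude links ending with these strings. Sorry, I can't edit it myself because it tells me the edit is fewer than 6 characters. Thanks – 2011-02-27 08:15:57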

@Mahmoud, thanks, I totally missed your **not**; updated. – YOU 2011-02-27 08:17:35

+1, endswith is definitely the way to go for filtering specifically by extension. – senderle 2011-02-27 17:51:22

Try the following:

if all(substring not in text for substring in ['doc', 'pdf', 'xls', 'jpg']): 
    filtered.append(text)

Source

2011-02-27 07:28:32

import os.path

basename, ext = os.path.splitext(some_filename)
if ext not in ('.pdf', '.png'):
    filtered.append(some_filename)
# ....

Source

2011-02-27 07:33:32

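The currently selected answer is very good as far as explaining the syntactically correct way to do what you want to do. However, it's obvious that you are dealing with file extensions, which appear at the end [fail: doctor_no.py, whatsupdoc], and it's likely that you are using Windows, where file paths are not case-sensitive [fail: FUBAR.DOC].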

To cover those bases:

# setup 
import os.path 
interesting_extensions = set("." + x for x in "doc pdf xls jpg".split()) 

# each time around 
basename, ext = os.path.splitext(text) 
if ext.lower() not in interesting_extensions: 
    filtered.append(text)

Source

2011-02-27 10:17:52

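Sorry, I didn't get what you're saying, but I'm using Ubuntu. The main goal is spidering a website: after extracting the links from the page source, I exclude links that contain javascript or these words, so anyway it's – 2011-02-27 21:36:58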

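You are excluding links that contain these **strings**, not links that contain these **words**. You will (for example) exclude links containing the word "doctor" or "dock" or "docket" or "doctored", and you will fail to exclude links containing file names in upper case (example: FUBAR.DOC). – 2011-02-27 23:05:43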

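I'm using '.lower()', so FUBAR.DOC will not be included, but you're right, all of those words will be excluded, which I don't want. The problem is that not all of the words are extensions; javascript, for example, appears at the beginning, so what should I do? – 2011-02-28 00:59:50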
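
One way to handle both cases is to test each part of the link separately: the scheme at the start and the extension at the end of the path. This is a rough sketch, not something proposed in the thread. It assumes Python 3's urllib.parse, and the function name keep_link and the example.com links are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

blocked_schemes = {"javascript"}
blocked_extensions = {".doc", ".pdf", ".xls", ".jpg"}


def keep_link(link):
    """Return True if the link passes both the scheme and extension checks."""
    parts = urlsplit(link.strip())
    # "javascript:..." links are recognised by their scheme, at the start.
    if parts.scheme.lower() in blocked_schemes:
        return False
    # The extension is taken from the path only, ignoring any query string.
    ext = posixpath.splitext(parts.path)[1].lower()
    return ext not in blocked_extensions


links = [
    "javascript:void(0)",
    "http://example.com/files/FUBAR.DOC?x=1",
    "http://example.com/doctor_no.py",
    "http://example.com/whatsupdoc",
]
print([link for link in links if keep_link(link)])
# ['http://example.com/doctor_no.py', 'http://example.com/whatsupdoc']
```

Checking the scheme and the extension separately avoids substring tests altogether, so words such as "doctor" in the middle of a link are no longer rejected.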
