How to include multiple search strings in a Python generator expression?

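I have a list of files. The first two have the same file name but different directory paths. A state code (e.g. CA or OK) is also part of each directory path. How can I include multiple search strings in a Python generator expression?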

files = [r'C:\temp\OK\somefile_1234_nw.tif', 
     r'C:\temp\test\CA\somefile_1234_nw.tif', 
     r'C:\temp\OK\somefile_9999_nw.tif']

I can extract the first file with a particular file name using the following generator expression:

search_string = 'somefile_1234_nw.tif' 
print(next((s for s in files if search_string in s), None))

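How can I use my generator expression to extract the file that contains both search strings, 'CA' and 'somefile_1234_nw.tif'? Processing efficiency matters here, because my full-scale problem has thousands of items.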

Expected output:

'C:\temp\test\CA\somefile_1234_nw.tif'

2015-12-23 Borealis

Do you mean something like this?

>>> next((s for s in files if all(i in s for i in ['somefile_1234_nw.tif', 'CA'])), None)
'C:\\temp\\test\\CA\\somefile_1234_nw.tif'

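all() returns True if every element of the iterable is truthy, and False otherwise. Given a generator expression, it stops evaluating at the first falsy element.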
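One caveat: the substring test 'CA' in s also matches any path where CA appears inside a longer name. A minimal sketch, assuming the state code should match a whole directory name and the file name should match the last path component exactly, splits each path with pathlib.PureWindowsPath. The helper find_by_dir_and_name and the extra CASES path are hypothetical, added only for illustration.

```python
from pathlib import PureWindowsPath

files = [r'C:\temp\CASES\somefile_1234_nw.tif',  # hypothetical: 'CA' is only a substring here
         r'C:\temp\OK\somefile_1234_nw.tif',
         r'C:\temp\test\CA\somefile_1234_nw.tif',
         r'C:\temp\OK\somefile_9999_nw.tif']


def find_by_dir_and_name(paths, state, filename):
    """Hypothetical helper: return the first path whose final component equals
    `filename` and whose directory components include `state`, else None.
    Comparisons are exact and case-sensitive."""
    for p in paths:
        path = PureWindowsPath(p)
        # e.g. path.parts -> ('C:\\', 'temp', 'test', 'CA', 'somefile_1234_nw.tif')
        if path.name == filename and state in path.parts[:-1]:
            return p
    return None


print(find_by_dir_and_name(files, 'CA', 'somefile_1234_nw.tif'))
```

For this list, the substring-based expression would return the CASES path because it comes first, while the helper skips it and returns the path under the CA folder.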

2015-12-23 02:23:35

Something like this should work:

search_strings = ['somefile_1234_nw.tif', 'CA'] 
print(next((s for s in files if all([search_string in s for search_string in search_strings])), None))
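Here the square brackets pass a fully built list to all(), so every search string is tested before all() looks at the results; a generator expression, as in the previous answer, lets all() stop at the first miss. Both give the same answer; only the amount of work differs. A small sketch to illustrate, using a hypothetical counting helper count_tests that is not part of either answer:

```python
files = [r'C:\temp\OK\somefile_1234_nw.tif',
         r'C:\temp\test\CA\somefile_1234_nw.tif',
         r'C:\temp\OK\somefile_9999_nw.tif']


def count_tests(paths, needles, short_circuit):
    """Hypothetical helper: return (first matching path, number of substring tests run)."""
    tests = 0

    def contains(needle, s):
        # Same as `needle in s`, but counts how often it is called
        nonlocal tests
        tests += 1
        return needle in s

    if short_circuit:
        # Generator expression: all() stops at the first False
        match = next((s for s in paths if all(contains(n, s) for n in needles)), None)
    else:
        # List comprehension: every test runs before all() sees the list
        match = next((s for s in paths if all([contains(n, s) for n in needles])), None)
    return match, tests


needles = ['CA', 'somefile_1234_nw.tif']  # more selective term first
print(count_tests(files, needles, short_circuit=True))
print(count_tests(files, needles, short_circuit=False))
```

With 'CA' listed first, the generator version runs 3 substring tests on this list and the list version runs 4. On real data the gap depends on how often the first term fails, so listing the most selective term first tends to help.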

2015-12-23 02:23:50 scope

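Since you're looking for speed, a generator may not be the way to go. Generators are great for several reasons, for example when you would otherwise run out of memory, or when extra processing is needed before you can get the next answer.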

For speed with thousands or even millions of items, you'll want to use a pandas Series (thousands of items will easily fit in memory on your machine).

import pandas as pd

files = pd.Series([r'C:\temp\OK\somefile_1234_nw.tif',
                   r'C:\temp\test\CA\somefile_1234_nw.tif',
                   r'C:\temp\OK\somefile_9999_nw.tif'])

pattern2 = 'CA'
pattern1 = 'somefile_1234_nw.tif'

# regex=False: treat patterns as plain substrings ('.' would otherwise match any character)
mask1 = files.str.contains(pattern1, regex=False)
files2 = files[mask1]
mask2 = files2.str.contains(pattern2, regex=False)
print(files2[mask2].values)
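Note that this pandas version returns every matching path, not only the first one. As a baseline for timing, a plain-Python list comprehension with the same semantics might look like this (a sketch; it makes no claim about which approach is faster):

```python
files = [r'C:\temp\OK\somefile_1234_nw.tif',
         r'C:\temp\test\CA\somefile_1234_nw.tif',
         r'C:\temp\OK\somefile_9999_nw.tif']
patterns = ['somefile_1234_nw.tif', 'CA']

# Keep every path that contains all patterns (plain substring tests, no regex)
matches = [s for s in files if all(p in s for p in patterns)]
print(matches)
```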

I'd be interested to hear the timings on your data.
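One way to collect such timings is the standard-library timeit module. A minimal harness is sketched below. The make_paths helper, the synthetic path layout, the target file name, and the list size are all hypothetical stand-ins for real data. The pandas snippet above can be timed the same way by wrapping it in a function.

```python
import timeit


def make_paths(n):
    """Hypothetical synthetic data: n paths spread over four state folders."""
    states = ['OK', 'CA', 'TX', 'NY']
    return [r'C:\temp\{}\somefile_{}_nw.tif'.format(states[i % 4], i) for i in range(n)]


def first_match(paths, needles):
    # Generator-expression approach: stops at the first match
    return next((s for s in paths if all(n in s for n in needles)), None)


def all_matches(paths, needles):
    # List-comprehension approach: scans every path
    return [s for s in paths if all(n in s for n in needles)]


paths = make_paths(10000)
needles = ['somefile_9997_nw.tif', 'CA']  # hypothetical target near the end of the list

for func in (first_match, all_matches):
    seconds = timeit.timeit(lambda: func(paths, needles), number=100)
    print('{}: {:.6f} s per call'.format(func.__name__, seconds / 100))
```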

2015-12-23 03:14:48 Back2Basics
