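Fastest way to remove tuples containing NaN

I have the same problem as in "Remove a tuple containing nan in list of tuples -- Python", with one difference: I need the fastest way to remove the tuples containing NaN.

I have two lists of length 3000, say: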

import numpy as np

a = list(range(3000))
b = list(range(3000))

Some of the elements are different kinds of NaN, some are strings, and most are ints and floats, say:

a[0] = np.nan 
b[1] = 'hello' 
a[2] = 2.0 
b[3] = float('nan')

Then I need to zip them together and remove the tuples that contain NaN, which I do like this:

merge = list(zip(a, b))  # a list, so it can be iterated more than once in Python 3
c = [x for x in merge if not any(isinstance(i, float) and np.isnan(i) for i in x)]

But the performance is poor: it takes too long because of all the checks it has to make.

Running it 1000 times takes about 2.2 seconds.

Then I tried this:

c = [x for x in merge if all(i == i for i in x)]

Running this 1000 times takes about 110 seconds.
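The second version works because NaN is the only float value that compares unequal to itself, so `i == i` is false exactly for NaN entries, while strings and ordinary numbers pass. A minimal sketch on a small sample follows. The sample data and the helper name `drop_nan_tuples` are illustrative and not from the original post, and `math.nan` stands in for `np.nan` (both are plain Python floats) so that the snippet needs only the standard library:

```python
import math

def drop_nan_tuples(pairs):
    """Keep only tuples whose elements all equal themselves (i.e. no NaN)."""
    return [t for t in pairs if all(i == i for i in t)]

# Small stand-in for the 3000-element lists in the question.
a = [math.nan, 1, 2.0, 3]
b = [0, 'hello', 2, float('nan')]
merge = list(zip(a, b))

print(drop_nan_tuples(merge))  # the two tuples without any NaN
```

No `isinstance` check is needed here, since comparing a string or an int with itself is always well defined.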

Is there a faster way to remove the tuples that contain NaN? Note that the tuples contain more than one kind of NaN.

Source

2017-03-27 Dirk Paul

How about `[x for x in merge if not np.nan in x]`? – Kasramvd

`if not np.nan in x` cannot detect `float('nan')` –
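The reason is that `in` on a tuple matches an element if it is the same object or compares equal. NaN never compares equal to anything, so only the very same NaN object is found. A small sketch, using `math.nan` in place of `np.nan` as an assumption for self-containment (both are ordinary Python floats):

```python
import math

nan_a = math.nan       # one NaN object, playing the role of np.nan
nan_b = float('nan')   # a second, distinct NaN object

t = (3, nan_b)

found_same = nan_b in t    # identity match: the same object is in the tuple
found_other = nan_a in t   # different object, and NaN != NaN
print(found_same, found_other)
```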

`nans = {np.nan, float('nan')}; [x for x in merge if not nans.intersection(x)]` – Kasramvd

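You can put the NaN values in a set and check each tuple for an intersection with it. You can do this with a list comprehension or with itertools.filterfalse. The session below does not show its setup; it presumably assumes nans = {np.nan, float('nan')} and from itertools import filterfalse: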

In [17]: a = range(3000) 

In [18]: merge = list(zip(a, a)) 

In [19]: %timeit [x for x in merge if not nans.intersection(x)] 
1000 loops, best of 3: 566 us per loop 

In [20]: %timeit [x for x in merge if all(i == i for i in x)] 
1000 loops, best of 3: 1.13 ms per loop 

In [21]: %timeit list(filterfalse(nans.intersection, merge)) 
1000 loops, best of 3: 402 us per loop

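The last approach, using filterfalse, is about 3 times faster than the all(i == i ...) version (402 us versus 1.13 ms), and also faster than the list comprehension with the set intersection (566 us).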
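One caveat worth checking before relying on these timings: a set lookup matches NaN only by object identity, so `nans.intersection(x)` catches the NaN objects stored in `nans` but not a NaN created elsewhere. The sketch below keeps `filterfalse` but swaps in a self-comparison predicate. The helper name `has_nan` and the sample data are hypothetical, and `math.nan` stands in for `np.nan` so that only the standard library is needed:

```python
import math
from itertools import filterfalse

def has_nan(t):
    """Return True if any element of the tuple is NaN (x != x holds only for NaN)."""
    return any(x != x for x in t)

nans = {math.nan, float('nan')}
# The last tuple holds a freshly created NaN that is not an object stored in `nans`.
merge = [(math.nan, 0), (1, 'hello'), (2.0, 2), (3, float('nan'))]

by_set = list(filterfalse(nans.intersection, merge))   # misses the fresh NaN
by_predicate = list(filterfalse(has_nan, merge))       # removes every NaN tuple

print(len(by_set), len(by_predicate))
```

The predicate version does more work per tuple than the set lookup, so it is worth re-running `%timeit` on real data before choosing between the two.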

Source

2017-03-27 05:41:13 Kasramvd
