Preventing multiple matches in list iteration

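I am fairly new to Python, so I will explain what I am trying to do as best I can. I am iterating over two lists of stars (both contain record arrays) and trying to match the stars by their coordinates within a tolerance (in this case RA and Dec, which are both fields in the record arrays). However, it seems that several stars from one list match the same star in the other. *This happens because both stars fall within the matching tolerance. Is there a way to prevent this? Here is what I have so far: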

from __future__ import print_function
import numpy as np

### importing data ###
Astars = list()
for s in ApStars:  # ApStars is imported but not shown
    Astars.append(s)

wStars = list()
data1 = np.genfromtxt('6819.txt', dtype=None, delimiter=',', names=True)
for star in data1:
    wStars.append(star)

### begin matching stars between Astars and wStars ###
list1 = list()
list2 = list()
for star, s in [(star, s) for star in wStars for s in Astars]:
    if np.logical_and(np.isclose(s["RA"], star["RA"], atol=0.000277778) == True,
                      np.isclose(s["DEC"], star["DEC"], atol=0.000277778) == True):
        if star not in list1:
            list1.append(star)  # matched wStars
        if s not in list2:
            list2.append(s)  # matched Astars

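I cannot reduce atol, because it would then be smaller than the instrument's error. What happens is that several wStars match a single Astar. If possible, I want only one star per match.

Any suggestions?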

Source

2016-07-29 Thomas Grier

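Off topic, but your `continue` statements do not do anything –

Can you clarify/fix your code? You initialize 'list1' and 'list2', but both are empty lists when your 'for star, s ...' loop runs. –

That may be my edit... @Thomas_Grier, could you indent the `continue` statements to their correct positions? (The last `continue` is redundant, but you may need the first one at the same level as the if statement.) – Alexander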

I appreciate everything you have all provided! I also asked around and found a clever way to do what I was looking for. Here is what we came up with:

sharedStarsW = list()
sharedStarsA = list()
for s in Astars:
    # distance from this Astar to every wStar
    distance = [((s["RA"] - star["RA"])**2 + (s["DEC"] - star["DEC"])**2)**0.5 for star in wStars]
    if np.amin(distance) < 0.00009259266:
        sharedStarsW.append(wStars[np.argmin(distance)])
        sharedStarsA.append(s)

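Using a list comprehension, this computes the distance from each Astar to all the wStars and keeps only the matches that fall within 1/3 of an arcsecond. If an Astar has several matching wStars, it appends the wStar at the index with the shortest distance, together with its Astar.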
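The same nearest-neighbour idea can be written without the inner Python list. This is a minimal sketch, assuming both catalogues are NumPy structured arrays with "RA" and "DEC" fields in degrees; the function name `match_nearest` and the sample coordinates are made up for illustration. Like the code above, it uses a flat Euclidean distance in RA/DEC with no cos(DEC) correction.

```python
import numpy as np

def match_nearest(a_stars, w_stars, tol_deg):
    """For each row of a_stars, find the closest row of w_stars.

    Returns two parallel index arrays (a_idx, w_idx) holding only the
    pairs whose distance is below tol_deg.
    """
    d_ra = a_stars["RA"][:, None] - w_stars["RA"][None, :]
    d_dec = a_stars["DEC"][:, None] - w_stars["DEC"][None, :]
    dist = np.hypot(d_ra, d_dec)       # shape (len(a_stars), len(w_stars))
    nearest = dist.argmin(axis=1)      # closest w_star for every a_star
    a_idx = np.arange(len(a_stars))
    ok = dist[a_idx, nearest] < tol_deg
    return a_idx[ok], nearest[ok]

# Hypothetical demo data: the first a_star has two w_stars within tolerance.
dtype = [("RA", "f8"), ("DEC", "f8")]
a_demo = np.array([(295.3000, 40.2000), (295.5000, 40.1000)], dtype=dtype)
w_demo = np.array([(295.30002, 40.20001),
                   (295.30006, 40.20003),
                   (296.0, 41.0)], dtype=dtype)

a_idx, w_idx = match_nearest(a_demo, w_demo, tol_deg=1.0 / 3.0 / 3600.0)
print(a_idx, w_idx)
```

Each entry of `a_idx` appears at most once, so an Astar never gets more than one wStar; the second demo star has no neighbour within tolerance and is dropped.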

Source

2016-07-30 02:11:29

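This is my first time answering a question here (if I have made a mistake, please point it out). What David commented seems right: "star is always in list1 (and s is always in list2)", so I suggest comparing and appending to a newlist1/newlist2 that keep track of star and s.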

newlist1 = list()
newlist2 = list()

# the new lists will keep the unique star and s
for star in list1:
    for s in list2:
        # assuming the comparison works; I haven't tested it yet
        if np.logical_and(np.isclose(s["RA"], star["RA"], atol=0.000277778) == True,
                          np.isclose(s["DEC"], star["DEC"], atol=0.000277778) == True):
            if star not in newlist1:
                newlist1.append(s)  # as posted; appending `star` was probably intended
            if s not in newlist2:
                newlist2.append(s)
            # once a match is found, leave the inner loop
            break

Source

2016-07-29 16:53:51

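I don't believe this will work for what I am doing, because each dataset has unique data that I will eventually combine (which is why I need the number of stars). Appending 's' to both lists would only tell me which stars match and which do not. –

Ah, my apologies, I misunderstood the result you wanted. I will try to be more careful in future answers on this site. –

No problem, thanks for the help! –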

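I would change your approach completely to fit the fact that these are celestial objects you are talking about. I will ignore the loading step and assume you already have your input lists Astars and wStars.

For each star in Astars, we will find the closest star in wStars using the Cartesian dot product. That should help resolve any ambiguity about the best match.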

# Pre-process the data a little
def getCV(ra, de):
    # RA/DEC are taken to be in degrees (the 0.000277778 tolerance is 1 arcsecond)
    ra, de = np.radians(ra), np.radians(de)
    return np.array([np.cos(de) * np.cos(ra),
                     np.cos(de) * np.sin(ra),
                     np.sin(de)])

# Assigning a 'CV' key assumes each star is a dict-like object; records of a
# structured array would need a 'CV' field added to the dtype first
for aStar in Astars:
    aStar['CV'] = getCV(aStar['RA'], aStar['DEC'])
for wStar in wStars:
    wStar['CV'] = getCV(wStar['RA'], wStar['DEC'])

# Construct lists of matching stars
aList = []
wList = []

# This is an extra list of lists of stars that are within tolerance but are
# not best matches. This list will contain empty sublists, but never None
wCandidates = []

for aStar in Astars:
    for wStar in wStars:
        # Use native short-circuiting, and don't explicitly test for `True`
        if np.isclose(aStar["RA"], wStar["RA"], atol=0.000277778) and \
           np.isclose(aStar["DEC"], wStar["DEC"], atol=0.000277778):
            newDot = np.dot(aStar['CV'], wStar['CV'])
            # Guard against the empty list before looking at the last entry
            if aList and aStar is aList[-1]:
                # This star already has a match, possibly update it
                if newDot > bestDot:
                    bestDot = newDot
                    # Move the previous best match to list of candidates
                    wCandidates[-1].append(wList[-1])
                    wList[-1] = wStar
                else:
                    wCandidates[-1].append(wStar)
            else:
                # This star does not yet have a match
                bestDot = newDot
                aList.append(aStar)
                wList.append(wStar)
                wCandidates.append([])

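The result is that the star at each index of wList is the best match for the star at the same index of aList. Not every star has a match, so not every star will appear in either list. Note that there may be some (very unlikely) cases where a star in aList is not the best match for its star in wList.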

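We find the closest absolute distance between two stars by computing Cartesian unit vectors based on these formulas and taking their dot product. The closer the dot product is to one, the closer the stars are. This should help resolve ambiguities.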
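As a small illustration of the unit-vector idea, the sketch below converts RA/DEC to a Cartesian unit vector and recovers the angular separation from the dot product. It assumes the coordinates are in degrees; `unit_vector` and `separation_deg` are hypothetical helper names, not part of the answer's code.

```python
import numpy as np

def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector for a position given as RA/DEC in degrees."""
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    return np.array([np.cos(dec) * np.cos(ra),
                     np.cos(dec) * np.sin(ra),
                     np.sin(dec)])

def separation_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees, from the dot product of unit vectors."""
    dot = np.dot(unit_vector(ra1, dec1), unit_vector(ra2, dec2))
    # Clip guards against rounding pushing the dot product just past +/-1
    return np.degrees(np.arccos(np.clip(dot, -1.0, 1.0)))

sep_same_ra = separation_deg(10.0, 0.0, 10.0, 1.0)   # 1 degree apart in DEC
sep_high_dec = separation_deg(0.0, 60.0, 1.0, 60.0)  # 1 degree apart in RA at DEC = 60
print(sep_same_ra, sep_high_dec)
```

The second case comes out close to half a degree rather than one: a given RA difference covers less sky at high declination. A plain comparison of RA and DEC differences does not account for that, while the dot product does.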

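I precompute the Cartesian vectors of the stars outside the main loop, to avoid recomputing them for the wStars on every pass. The key name 'CV' stands for Cartesian vector. Change it as you see fit.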

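Finally, note that this method does not check whether a star in wStars is matched to more than one AStar. It only makes sure that the best wStar is chosen for each AStar.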
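One way to cover that remaining case is to accept candidate pairs in order of increasing separation and skip any pair whose star on either side is already taken. The sketch below is a simple greedy scheme, not part of the answer above; `one_to_one_matches` and the demo distance matrix are hypothetical, and the matrix could hold any separation measure.

```python
import numpy as np

def one_to_one_matches(dist, tol):
    """Greedy pairing on a separation matrix.

    dist[i, j] is the separation between AStar i and wStar j. Returns a
    list of (i, j) pairs in which every i and every j appears at most once.
    """
    i_idx, j_idx = np.nonzero(dist < tol)              # all pairs within tolerance
    order = np.argsort(dist[i_idx, j_idx], kind="stable")
    used_a, used_w, pairs = set(), set(), []
    for i, j in zip(i_idx[order], j_idx[order]):
        i, j = int(i), int(j)
        if i in used_a or j in used_w:
            continue                                   # one side is already paired
        used_a.add(i)
        used_w.add(j)
        pairs.append((i, j))
    return pairs

# Hypothetical separations: AStars 0 and 1 both fall within tolerance of wStar 0.
demo = np.array([[0.2, 0.5, 9.0],
                 [0.3, 9.0, 9.0],
                 [9.0, 9.0, 0.9]])
pairs = one_to_one_matches(demo, tol=1.0)
print(pairs)
```

In the demo, wStar 0 goes to AStar 0 because that pair is closer, and AStar 1 is left unmatched. A greedy pass like this is not guaranteed to give the globally best assignment, but it never reuses a star.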

UPDATE

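I added a third output list, which holds, for each element of AStars, all the wStars candidates that were within tolerance but were not selected as the best match.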

Source

2016-07-29 17:24:30

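I started implementing your solution to my problem, but because of the nature of the data tables I am working with, it became rather complicated (it means I would have to compute these Cartesian dot products, merge that column into the dataset's record array, and run it through the rest of the code you provided). After another student looked it over, he suggested doing what I posted in my own answer. I really appreciate your help through this process! –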

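I haven't fully understood your problem yet, but I will start by trying to simplify your calculations.

It looks like Apstars and data1 are structured arrays, both 1D with the same dtype.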

This list-building loop:

Astars = list()
for s in ApStars:  # ApStars is imported but not shown
    Astars.append(s)

can be replaced with

Astars = list(ApStars)

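or simply omitted. If you can iterate over ApStars here, you can iterate over it in the list comprehension. The same goes for wStars.

I have rewritten the comparison as: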

set1, set2 = set(), set()
# for star,s in [(star,s) for star in data1 for s in ApStars]:
# iteration on this list comprehension works,
# but I think this nested iteration is clearer
for star in data1:
    for s in ApStars:
        x1 = np.isclose(s["RA"], star["RA"], atol=0.000277778)
        x2 = np.isclose(s["DEC"], star["DEC"], atol=0.000277778)
        # isclose returns boolean, don't need the ==True
        if x1 & x2:
            set1.add(star)
            set2.add(s)

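Adding without duplicates is easy with a set, although the order is not defined (the same as with a dictionary).

I would like to explore whether 'extracting' the relevant fields before iterating helps.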

Apstars['RA'], data1['RA'], Apstars['DEC'], data1['DEC'] 

x1 = np.isclose(Apstars['RA'][:,None], data1['RA'], atol=...) 
x2 = np.isclose(Apstars['DEC']....) 

x12 = x1 & x2

x12 is a 2D boolean array; x12[i,j] is True when Apstars[i] is "close" to data1[j].
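A runnable version of this broadcasting idea might look like the sketch below. The two small structured arrays are invented stand-ins for Apstars and data1, and the tolerance is the one from the question. Note that `np.isclose` also applies a relative tolerance (`rtol` defaults to 1e-05, scaled by the second argument), so the sketch passes `rtol=0` to make `atol` the only tolerance in play.

```python
import numpy as np

# Hypothetical stand-ins for Apstars and data1
dtype = [("RA", "f8"), ("DEC", "f8")]
ap_demo = np.array([(10.0, 20.0), (11.0, 21.0)], dtype=dtype)
data_demo = np.array([(10.0001, 20.0001),
                      (10.0002, 19.9999),
                      (50.0, 50.0)], dtype=dtype)
tol = 0.000277778  # tolerance used in the question

# [:, None] turns the first operand into a column, so the comparison
# broadcasts to shape (len(ap_demo), len(data_demo))
x1 = np.isclose(ap_demo["RA"][:, None], data_demo["RA"], rtol=0, atol=tol)
x2 = np.isclose(ap_demo["DEC"][:, None], data_demo["DEC"], rtol=0, atol=tol)
x12 = x1 & x2

matches_per_ap = x12.sum(axis=1)   # how many data rows each Ap star matched
pair_indices = np.argwhere(x12)    # rows of (i, j) for every close pair
print(matches_per_ap)
print(pair_indices)
```

Rows of `x12` with a count above one are exactly the stars with multiple matches, so the array makes the duplicates easy to find before deciding which candidate to keep.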

Source

2016-07-29 18:36:27 hpaulj
