Compare two lists of coordinates in Python and assign values using the coordinate values

I have two sets of data taken from two separate import files. Both files are imported into Python, and the data is currently stored in lists as shown below.

List 1 has the form:

(reference number, x coordinate, y coordinate)

Example list 1: [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]]

List 2 has the form:

(x coordinate, y coordinate, temperature)

Example list 2: [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]]

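I need to compare the two lists using the x and y coordinates and, wherever they match, produce a new list containing the corresponding reference number and temperature.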

For example, from the two lists above, the output list would have the form:

(reference number, temperature)

Example output list: [[1, 100], [2, 110], [3, 120], [4, 130], [5, 140]]

This has to be done on a large amount of data, and I am really struggling to find a solution. Any help would be greatly appreciated. Cheers

2015-02-08 Monkey Bath

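Define "large amount of data": do you need multiple machines, or are you just looking for a relatively efficient solution that runs on a single machine? In other words, is this a big-data problem? – amit 2015-02-08 10:51:40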

This can be done with nested for loops. Could you also share what you have tried so far? – 2015-02-08 10:55:07

This runs in O(n^2), but it is easy to read and understand.

result = []
for reference, x, y in list1:
    for a, b, temperature in list2:
        # Same (x, y) in both lists: record the pair as (reference, temperature)
        if x == a and y == b:
            result.append([reference, temperature])

You can reduce the complexity to O(n) on average by iterating over each list once and storing the coordinates in dicts, as follows:

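# (x, y) -> reference number
dict1 = {}
for reference, x, y in list1:
    dict1[(x, y)] = reference

# (x, y) -> temperature
dict2 = {}
for x, y, temperature in list2:
    dict2[(x, y)] = temperature

result = []
for coordinate, reference in dict1.items():  # .iteritems() in Python 2
    temperature = dict2.get(coordinate)
    # Compare with None so that a temperature of 0 is not skipped
    if temperature is not None:
        result.append([reference, temperature])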
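A leaner variant of the same idea, offered as a sketch: only list 2 needs to become a dict, and list 1 can then be streamed against it in a single pass, which also keeps the output in list 1's order. The function name `match_temperatures` is hypothetical. If list 2 contains the same coordinate twice, the last temperature wins.

```python
def match_temperatures(list1, list2):
    """Return [reference, temperature] pairs for coordinates present in both lists."""
    # Build a single lookup table: (x, y) -> temperature.
    temp_by_xy = {(x, y): temperature for x, y, temperature in list2}
    result = []
    # Stream list1 once; each dict lookup is O(1) on average.
    for reference, x, y in list1:
        temperature = temp_by_xy.get((x, y))
        # Compare against None so that a temperature of 0 is still kept.
        if temperature is not None:
            result.append([reference, temperature])
    return result


list1 = [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]]
list2 = [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]]
print(match_temperatures(list1, list2))
# [[1, 100], [2, 110], [3, 120], [4, 130], [5, 140]]
```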

2015-02-08 10:57:50 ozgur

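It wasn't me, but your old answer was very inefficient, and the OP explicitly mentioned a large data size, so efficiency matters. Your edit seems to be an implementation of an existing answer (mine). – amit 2015-02-08 11:18:10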

You can use map-reduce for this task.

Pseudocode:

map1(list): # runs on the first file
    for each (i, x, y) in list:
        emit((x, y), (1, i))
map2(list): # runs on the second file
    for each (x, y, temp) in list:
        emit((x, y), (2, temp))
reduce((x, y), list): # runs on the output of both mappers
    for each (aux, val) in list:
        if aux == 1:
            i = val
        else:
            temp = val
    if both i and temp initialized:
        emit(i, temp)

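Map-reduce is a framework that makes big-data problems easy to implement once you model them as a series of map-reduce tasks. The pseudocode above shows what the map-reduce steps could look like.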

This approach easily handles massive amounts of data (up to petabyte scale) and lets the framework do the dirty work for you.

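The idea is to first map each file to a kind of hash table (this is done internally by the framework), so that you end up with two hash tables: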

key = (x, y), value = id
key = (x, y), value = temperature

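Once you have the two hash tables, it is easy to find which id is connected to which temperature in a single pass, and to output each pair as soon as the join is done.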

This code runs in O(n) in the average case.
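To make the pseudocode concrete without any framework, here is an in-process sketch in plain Python (no map-reduce library is used) that mimics the three phases: the two mappers emit tagged key/value pairs, a dictionary stands in for the framework's shuffle that groups values by key, and the reducer joins each group. The function names mirror the pseudocode (`reduce_` has a trailing underscore only to avoid confusion with `functools.reduce`); `shuffle` is a hypothetical helper. In a real framework the shuffle is done for you and the data would not need to fit in one dict.

```python
from collections import defaultdict


def map1(records):
    # First file: (reference, x, y) -> key (x, y), value tagged with 1.
    for i, x, y in records:
        yield (x, y), (1, i)


def map2(records):
    # Second file: (x, y, temperature) -> key (x, y), value tagged with 2.
    for x, y, temp in records:
        yield (x, y), (2, temp)


def shuffle(*mapped_streams):
    # Stand-in for the framework: group every emitted value by its key.
    groups = defaultdict(list)
    for stream in mapped_streams:
        for key, value in stream:
            groups[key].append(value)
    return groups


def reduce_(key, values):
    # Join step: emit (reference, temperature) only if both sides were seen.
    i = temp = None
    for aux, val in values:
        if aux == 1:
            i = val
        else:
            temp = val
    if i is not None and temp is not None:
        yield i, temp


list1 = [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]]
list2 = [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]]

groups = shuffle(map1(list1), map2(list2))
output = [pair for key, values in groups.items() for pair in reduce_(key, values)]
print(sorted(output))
# [(1, 100), (2, 110), (3, 120), (4, 130), (5, 140)]
```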

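Note that if your coordinates are not integers but floating-point values, you will need to use some tree-based map instead of a hash table, and you must be very careful when comparing keys because of the nature of floating-point arithmetic.
This should not be a problem when dealing with integers.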
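A simpler workaround than a tree-based map, and a swapped-in technique rather than what the answer describes: snap each floating-point coordinate to a fixed number of decimal places before using it as a dict key. The precision `ndigits=6` and the helper names `coord_key` and `match_float_coords` are assumptions for illustration. Two values that sit on either side of a rounding boundary can still map to different keys, so this only suits data whose coordinates are meant to be exactly equal but may carry tiny parsing or arithmetic noise.

```python
def coord_key(x, y, ndigits=6):
    # Round both coordinates so that tiny float noise maps to the same key.
    return (round(x, ndigits), round(y, ndigits))


def match_float_coords(list1, list2, ndigits=6):
    temp_by_key = {coord_key(x, y, ndigits): t for x, y, t in list2}
    result = []
    for reference, x, y in list1:
        t = temp_by_key.get(coord_key(x, y, ndigits))
        if t is not None:
            result.append([reference, t])
    return result


list1 = [[1, 0.1 + 0.2, 0.0], [2, 1.0, 2.5]]
list2 = [[0.3, 0.0, 100], [1.0, 2.5, 110]]
print(0.1 + 0.2 == 0.3)                  # False: exact comparison fails
print(match_float_coords(list1, list2))  # [[1, 100], [2, 110]]
```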

2015-02-08 10:55:17 amit

Could you please explain what map1() and map2() represent? I'm interested. Also, why is there a ":" after reduce()? – user3699166 2015-02-08 11:07:48

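@user3699166 map1, map2 and reduce are all functions supplied to the map-reduce framework. 'map1' parses the first file and creates a hash table [dictionary] '((x, y) -> id)'; similarly, 'map2' creates a dictionary '((x, y) -> temperature)'. 'reduce' combines the two hash tables. Note that there is no explicit hash table here, because it is implemented by the map-reduce framework. – amit 2015-02-08 11:21:08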

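At the risk of sounding dumb, I have to say I still don't get the point, sorry :( Which framework are you talking about? Do you mean a module? Or did you define these functions yourself beforehand? – user3699166 2015-02-08 11:25:10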

lst1 = [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]] 
lst2 = [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]] 
dict1 = {(x, y): ref for ref, x, y in lst1} 
dict2 = {(x, y): temp for x, y, temp in lst2} 
matchxy = set(dict1) & set(dict2) 
lstout = sorted([dict1[xy], dict2[xy]] for xy in matchxy) 
print(lstout)

This gives the required output:

[[1, 100], [2, 110], [3, 120], [4, 130], [5, 140]]

I use sets to find the common points.
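In Python 3 the intermediate `set(...)` calls are optional: dictionary key views already support set operations, so the intersection can be taken on the views directly. A small sketch of that variant, reusing the same sample data:

```python
lst1 = [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]]
lst2 = [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]]
dict1 = {(x, y): ref for ref, x, y in lst1}
dict2 = {(x, y): temp for x, y, temp in lst2}

# dict.keys() returns a set-like view; & on two views yields a plain set.
matchxy = dict1.keys() & dict2.keys()
lstout = sorted([dict1[xy], dict2[xy]] for xy in matchxy)
print(lstout)  # [[1, 100], [2, 110], [3, 120], [4, 130], [5, 140]]
```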

2015-02-08 14:20:12 Paddy3118

You can build SQLite database tables and query them to get the desired result.

import sqlite3, operator 

reference = [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]] 
temperature = [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]]

A couple of helpers: I like to use them because they make the code that follows more readable.

reference_coord = operator.itemgetter(1,2) 
ref = operator.itemgetter(0) 
temperature_coord = operator.itemgetter(0,1) 
temp = operator.itemgetter(2)

Create a database (in memory)

con = sqlite3.connect(":memory:")

There are two ways to approach this: keep all the information in separate tables, or build a single table containing only the data you want.

One table per list

con.execute("create table reference(coordinate TEXT PRIMARY KEY, reference INTEGER)") 
con.execute("create table temperature(coordinate TEXT PRIMARY KEY, temperature INTEGER)") 

# fill the tables 
parameters = [(str(reference_coord(item)), ref(item)) for item in reference] 
con.executemany("INSERT INTO reference(coordinate, reference) VALUES (?, ?)", parameters) 
parameters = [(str(temperature_coord(item)), temp(item)) for item in temperature] 
con.executemany("INSERT INTO temperature(coordinate, temperature) VALUES (?, ?)", parameters)

Query the two tables for the data you need

cursor = con.execute('SELECT reference.reference, temperature.temperature FROM reference, temperature WHERE reference.coordinate = temperature.coordinate') 
print(cursor.fetchall())
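One caveat with storing the coordinate as `str(...)` text is that both lists must format it identically: `str((0, 10))` is `'(0, 10)'`, but a `0.0` on one side would produce `'(0.0, 10)'` and never match. A variant sketch, assuming integer coordinates, keeps `x` and `y` as separate integer columns and joins on both; the table names `ref_xy` and `temp_xy` are illustrative.

```python
import sqlite3

reference = [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]]
temperature = [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]]

con = sqlite3.connect(":memory:")
# Composite primary keys on (x, y) also give SQLite an index to join on.
con.execute("CREATE TABLE ref_xy(x INTEGER, y INTEGER, reference INTEGER, PRIMARY KEY (x, y))")
con.execute("CREATE TABLE temp_xy(x INTEGER, y INTEGER, temperature INTEGER, PRIMARY KEY (x, y))")

# Reorder list 1 from (reference, x, y) to the column order (x, y, reference).
con.executemany("INSERT INTO ref_xy VALUES (?, ?, ?)", [(x, y, r) for r, x, y in reference])
con.executemany("INSERT INTO temp_xy VALUES (?, ?, ?)", temperature)

rows = con.execute(
    "SELECT r.reference, t.temperature "
    "FROM ref_xy AS r JOIN temp_xy AS t ON r.x = t.x AND r.y = t.y "
    "ORDER BY r.reference"
).fetchall()
print(rows)  # [(1, 100), (2, 110), (3, 120), (4, 130), (5, 140)]
con.close()
```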

A single table combining the data from both lists

con.execute("create table data(coordinate TEXT PRIMARY KEY, reference INTEGER, temperature INTEGER)")

Fill the table with only the data you care about

parameters = [(str(reference_coord(item)), ref(item)) for item in reference] 
con.executemany("INSERT INTO data(coordinate, reference) VALUES (?, ?)", parameters) 
parameters = [(temp(item), str(temperature_coord(item))) for item in temperature] 
con.executemany("UPDATE data SET temperature=? WHERE coordinate=?", parameters)

A simple query, since the table holds only what you want

cursor2 = con.execute('SELECT reference, temperature FROM data') 
print(cursor2.fetchall()) 

con.close()

Result:

>>> 
[(1, 100), (2, 110), (3, 120), (4, 130), (5, 140)] 
[(1, 100), (2, 110), (3, 120), (4, 130), (5, 140)] 
>>>

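Once your data is in the database, it is fairly easy to extract information from it. If you use a file-based database instead of an in-memory one, the database persists.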
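A sketch of the file-backed variant: passing a filename instead of `":memory:"` to `sqlite3.connect` stores the tables on disk, so a later connection (or a later run of the script) can query them without re-importing the lists. The temporary directory, the file name `coords.db`, and the integer `x`/`y` columns are assumptions for illustration. Note the explicit `commit()`: closing a connection does not commit, so uncommitted inserts and updates would be lost.

```python
import os
import sqlite3
import tempfile

reference = [[1, 0, 0], [2, 0, 10], [3, 0, 20]]
temperature = [[0, 0, 100], [0, 10, 110], [0, 20, 120]]

db_path = os.path.join(tempfile.mkdtemp(), "coords.db")  # hypothetical location

# First connection: create and fill the combined table, then commit to disk.
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE data(x INTEGER, y INTEGER, reference INTEGER, "
            "temperature INTEGER, PRIMARY KEY (x, y))")
con.executemany("INSERT INTO data(x, y, reference) VALUES (?, ?, ?)",
                [(x, y, r) for r, x, y in reference])
con.executemany("UPDATE data SET temperature = ? WHERE x = ? AND y = ?",
                [(t, x, y) for x, y, t in temperature])
con.commit()  # close() alone would discard these uncommitted changes
con.close()

# Second, independent connection: the data is still there.
con = sqlite3.connect(db_path)
rows = con.execute("SELECT reference, temperature FROM data ORDER BY reference").fetchall()
con.close()
print(rows)  # [(1, 100), (2, 110), (3, 120)]
```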

If external libraries are acceptable, pandas offers similar functionality and is a great package.
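For reference, a minimal pandas sketch of the same join, assuming pandas is installed; the column names `ref`, `x`, `y` and `temp` are chosen for illustration. `DataFrame.merge` performs an inner join on the listed key columns by default, so rows whose coordinates appear in only one list are dropped.

```python
import pandas as pd

list1 = [[1, 0, 0], [2, 0, 10], [3, 0, 20], [4, 0, 30], [5, 0, 40]]
list2 = [[0, 0, 100], [0, 10, 110], [0, 20, 120], [0, 30, 130], [0, 40, 140]]

refs = pd.DataFrame(list1, columns=["ref", "x", "y"])
temps = pd.DataFrame(list2, columns=["x", "y", "temp"])

# Inner join on both coordinate columns (how="inner" is the default).
merged = refs.merge(temps, on=["x", "y"])
result = merged[["ref", "temp"]].values.tolist()
print(result)  # [[1, 100], [2, 110], [3, 120], [4, 130], [5, 140]]
```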

2015-02-08 21:35:58 wwii
