2010-04-09

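I want to merge data. Below is my MySQL table. I want to use Python to iterate over two lists (one with dupe = 'x' and the other with null dupes). How do I compare the 2 lists and merge them in Python/MySQL?

This is sample data. The actual data is huge.

For example: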

a  b  c     d     e     f  key  dupe
-------------------------------------
1  d  c     f     k     l  1    x
2  g  NULL  h     NULL  j  1    NULL
3  i  NULL  h     u     u  2    NULL
4  u  r     NULL  NULL  t  2    x

From the sample table above, the desired output is:

a  b  c  d  e  f  key  dupe
----------------------------
2  g  c  h  k  j  1    NULL
3  i  r  h  u  u  2    NULL

What I have so far:

import string, os, sys 
import MySQLdb 
from EncryptedFile import EncryptedFile 

enc = EncryptedFile(os.getenv("HOME") + '/.py-encrypted-file') 
user = enc.getValue("user") 
pw = enc.getValue("pw") 

db = MySQLdb.connect(host="127.0.0.1", user=user, passwd=pw,db=user) 

cursor = db.cursor() 
cursor2 = db.cursor() 

cursor.execute("select * from delThisTable where dupe is null") 
cursor2.execute("select * from delThisTable where dupe is not null") 
result = cursor.fetchall() 
result2 = cursor2.fetchall() 

# Pseudocode for the part I am missing:
# for each record:
#     for each field:
#         perform the comparison and perform the necessary updates
#
# How do I compare the records that share a key value and update the
# original row's null fields with the non-null values from the duplicate?
# Please fill this void...


cursor.close() 
cursor2.close() 
db.close() 

Thank you!

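I can't quite follow the question. Do you want the algorithm, or an implementation for a specific framework? In fact, you don't need to iterate over cursors; you can just 'coalesce' the fields. Are you allowed to run plain SQL in this case? If so, the query is simple. – 2010-04-09 20:26:21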

This is simple, simplified test data. In reality there are a few thousand rows and a few hundred columns, hence this approach. Thanks. – ThinkCode 2010-04-09 20:33:14

update delthistable t set t.a = coalesce(dup.a, t.a), t.b = coalesce(dup.b, t.b) ... from (select * from delthistable where dupe = 'x') dup where t.dupe <> 'x' and t.key = dup.key; followed by: delete from delthistable where dupe <> 'x' – 2010-04-09 20:51:51
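
With a few hundred columns, the SET list in the comment above is tedious to type by hand, so here is a minimal sketch that generates the statements from a list of column names. It is an adaptation, not the commenter's exact query: it assumes MySQL's multi-table UPDATE ... JOIN form (as far as I know MySQL does not accept UPDATE ... FROM), it tests the original rows with IS NULL (a comparison like dupe <> 'x' never matches rows where dupe is NULL), and it puts the original value first in COALESCE so that the original row wins, as in the desired output. The function name build_merge_sql and its parameters are made up for illustration.

```python
def build_merge_sql(table, columns, key="key", flag="dupe"):
    """Return (update_sql, delete_sql) that fill NULLs in the original rows.

    Hypothetical helper: `columns` are the data columns to merge, `key` is the
    column that pairs a row with its duplicate, `flag` marks duplicates ('x').
    """
    # Keep the original value when present, otherwise take the duplicate's.
    assignments = ", ".join(
        "t.`{0}` = COALESCE(t.`{0}`, dup.`{0}`)".format(col) for col in columns
    )
    update_sql = (
        "UPDATE `{table}` AS t "
        "JOIN `{table}` AS dup ON t.`{key}` = dup.`{key}` "
        "SET {assignments} "
        "WHERE t.`{flag}` IS NULL AND dup.`{flag}` = 'x'"
    ).format(table=table, key=key, flag=flag, assignments=assignments)
    # Once the originals are filled in, the flagged duplicates can go.
    delete_sql = "DELETE FROM `{table}` WHERE `{flag}` = 'x'".format(
        table=table, flag=flag
    )
    return update_sql, delete_sql


update_sql, delete_sql = build_merge_sql("delThisTable", ["b", "c", "d", "e", "f"])
```

The two strings could then be passed to cursor.execute() in turn, followed by db.commit(). Try it on a copy of the table first, since the DELETE is destructive.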

Answers

OK, let's have some fun...

mysql> create table so (a int, b char, c char, d char, e char, f char, `key` int, dupe char); 
Query OK, 0 rows affected (0.05 sec) 

mysql> insert into so values (1, 'd', 'c', 'f', 'k', 'l', 1, 'x'), (2, 'g', null, 'h', null, 'j', 1, null), (3, 'i', null, 'h', 'u', 'u', 2, null), (4, 'u', 'r', null, null, 't', 2, 'x'); 
Query OK, 4 rows affected (0.00 sec) 
Records: 4 Duplicates: 0 Warnings: 0 

mysql> select * from so order by a; 
+------+------+------+------+------+------+------+------+
| a    | b    | c    | d    | e    | f    | key  | dupe |
+------+------+------+------+------+------+------+------+
|    1 | d    | c    | f    | k    | l    |    1 | x    |
|    2 | g    | NULL | h    | NULL | j    |    1 | NULL |
|    3 | i    | NULL | h    | u    | u    |    2 | NULL |
|    4 | u    | r    | NULL | NULL | t    |    2 | x    |
+------+------+------+------+------+------+------+------+
4 rows in set (0.00 sec) 

Python 2.6.5 (r265:79063, Mar 26 2010, 22:43:05) 
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import MySQLdb 
>>> db = MySQLdb.connect(host="127.0.0.1", db="test") 
>>> c = db.cursor() 
>>> c.execute("SELECT a, b, c, d, e, f, `key`, dupe FROM so") 
4L 
>>> rows = c.fetchall() 
>>> rows 
((1L, 'd', 'c', 'f', 'k', 'l', 1L, 'x'), (4L, 'u', 'r', None, None, 't', 2L, 'x'), (2L, 'g', None, 'h', None, 'j', 1L, None), (3L, 'i', None, 'h', 'u', 'u', 2L, None)) 
>>> data = dict() 
>>> for row in rows:
...     key, isDupe = row[-2], row[-1]
...     if key not in data:
...         data[key] = list(row[:-1])
...     else:
...         for i in range(len(row)-1):
...             if data[key][i] is None or (not isDupe and row[i] is not None):
...                 data[key][i] = row[i]
...
>>> data 
{1L: [2L, 'g', 'c', 'h', 'k', 'j', 1L], 2L: [3L, 'i', 'r', 'h', 'u', 'u', 2L]} 
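
For reuse outside the interpreter, the same loop can be wrapped in a function. This is only a sketch of the session above: the name merge_rows is made up, and it assumes, as the session does, that the last two columns of every row are the key and the dupe flag. It runs on Python 2 and 3.

```python
def merge_rows(rows):
    """Merge rows that share a key; each row must end with (key, dupe)."""
    merged = {}
    for row in rows:
        key, is_dupe = row[-2], row[-1]
        values = list(row[:-1])  # drop the dupe flag from the stored row
        if key not in merged:
            merged[key] = values
            continue
        current = merged[key]
        for i, value in enumerate(values):
            # Fill holes from any row; let the original row (no dupe flag)
            # overwrite whatever a duplicate put there earlier.
            if current[i] is None or (not is_dupe and value is not None):
                current[i] = value
    return merged


# Sample rows from the session above, in the order MySQL returned them.
rows = (
    (1, 'd', 'c', 'f', 'k', 'l', 1, 'x'),
    (4, 'u', 'r', None, None, 't', 2, 'x'),
    (2, 'g', None, 'h', None, 'j', 1, None),
    (3, 'i', None, 'h', 'u', 'u', 2, None),
)
result = merge_rows(rows)
```

The result does not depend on whether the duplicate or the original row is seen first, so no ORDER BY is needed in the SELECT.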

Thanks for the solution. I have a few hundred rows in the actual table. How do I adapt your code to my actual table? Thanks again! – ThinkCode 2010-04-09 20:45:25

Does the data in the table fit in your RAM? If so, I don't think any adaptation is needed. – Messa 2010-04-09 20:53:14
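
If the table does not fit in RAM, one possible variation (a sketch, not something tested against the real table) is to have MySQL sort the rows by key and merge one key group at a time with itertools.groupby, so that only the rows of a single key are held in memory. The name merge_sorted is made up, and the sketch assumes the rows arrive ordered by key, for example from a query ending in ORDER BY `key`, and that each row ends with (key, dupe). To avoid fetchall(), you would presumably iterate over a server-side cursor such as MySQLdb.cursors.SSCursor.

```python
from itertools import groupby
from operator import itemgetter


def merge_sorted(rows):
    """Yield one merged row per key from rows that are already sorted by key.

    Each input row must end with (key, dupe); the dupe flag is dropped.
    """
    for _key, group in groupby(rows, key=itemgetter(-2)):
        merged = None
        for row in group:
            is_dupe, values = row[-1], list(row[:-1])
            if merged is None:
                merged = values
                continue
            for i, value in enumerate(values):
                # Same rule as the in-memory version: fill holes, and let the
                # original (unflagged) row override a duplicate's value.
                if merged[i] is None or (not is_dupe and value is not None):
                    merged[i] = value
        yield merged


# Sample rows, sorted by the key column as ORDER BY `key` would return them.
sorted_rows = [
    (1, 'd', 'c', 'f', 'k', 'l', 1, 'x'),
    (2, 'g', None, 'h', None, 'j', 1, None),
    (3, 'i', None, 'h', 'u', 'u', 2, None),
    (4, 'u', 'r', None, None, 't', 2, 'x'),
]
merged_rows = list(merge_sorted(sorted_rows))
```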

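It works! Thanks a lot. Now I'm trying to figure out the best way to dump the final data into a MySQL table. Some fields are None, and the dates are in datetime.date format. Is there a simple way to dump this into MySQL? – ThinkCode 2010-04-09 21:55:17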
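
One way to do this, as a sketch: let the driver handle the conversion by using a parameterized INSERT with executemany. MySQLdb uses %s placeholders and converts None to NULL and datetime.date values to date literals itself, so nothing has to be formatted by hand. The helper names build_insert and dump_rows and the target table name are made up, and the target table is assumed to exist already with matching columns.

```python
import datetime


def build_insert(table, columns):
    """Return a parameterized INSERT statement with one %s per column."""
    column_list = ", ".join("`{0}`".format(col) for col in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO `{0}` ({1}) VALUES ({2})".format(
        table, column_list, placeholders
    )


def dump_rows(cursor, table, columns, rows):
    """Insert all rows through a DB-API cursor; values are passed as parameters."""
    cursor.executemany(build_insert(table, columns), [tuple(row) for row in rows])


# Example statement for a hypothetical target table named mergedTable.
sql = build_insert("mergedTable", ["a", "b", "c", "d", "e", "f", "key"])
```

With the merged dictionary from the answer, the call would look like dump_rows(c, "mergedTable", columns, data.values()), followed by db.commit(), since MySQLdb does not autocommit by default.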
