2013-06-04

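I have a table called passive that contains a list of timestamped events for each user. I want to fill in the attribute duration, which is the time between the current row's event and the next event by the same user. How can I refer to the main query's table from inside a nested subquery?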
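The intended semantics can be sketched in memory. This is a minimal Python sketch, and `fill_durations`, the tuple layout and the sample data are hypothetical, not taken from the question. Events are grouped per user and sorted by time. Each event's duration is the gap to that user's next *strictly later* event, which mirrors the `> 0` condition in the queries below. When a user has no later event the result is `None`, just as `MIN()` over no rows gives NULL.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def fill_durations(events):
    """events: iterable of (event_id, user_id, event_time) tuples (hypothetical layout).

    Returns {event_id: seconds until the same user's next strictly later event},
    or None when there is no later event.
    """
    events = list(events)
    # Sorted event times per user.
    times_by_user = defaultdict(list)
    for _, user_id, event_time in events:
        times_by_user[user_id].append(event_time)
    for times in times_by_user.values():
        times.sort()

    durations = {}
    for event_id, user_id, event_time in events:
        times = times_by_user[user_id]
        # Index of the first time strictly greater than event_time (skips ties).
        i = bisect_right(times, event_time)
        durations[event_id] = (
            (times[i] - event_time).total_seconds() if i < len(times) else None
        )
    return durations


if __name__ == "__main__":
    sample = [
        (1, "alice", datetime(2013, 6, 4, 10, 0, 0)),
        (2, "alice", datetime(2013, 6, 4, 10, 0, 30)),
        (3, "bob", datetime(2013, 6, 4, 10, 1, 0)),
        (4, "alice", datetime(2013, 6, 4, 10, 5, 0)),
    ]
    print(fill_durations(sample))  # {1: 30.0, 2: 270.0, 3: None, 4: None}
```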

I tried the following query:

UPDATE passive as passive1 
SET passive1.duration = (
    SELECT min(UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time)) 
    FROM passive as passive2 
    WHERE passive1.user_id = passive2.user_id 
    AND UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time) > 0 
); 

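This returns the error Error 1093 - You can't specify target table for update in FROM. To work around this limitation, I tried to follow the structure given in https://stackoverflow.com/a/45498/395857. That structure uses a nested subquery in the FROM clause to create an implicit temporary table, so that MySQL no longer treats it as the same table being updated: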

UPDATE passive 
SET passive.duration = (

    SELECT * 
    FROM (SELECT min(UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive.event_time)) 
     FROM passive, passive as passive2 
     WHERE passive.user_id = passive2.user_id 
     AND UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive.event_time) > 0 
     ) 
    AS X 
); 

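However, the passive table in the nested subquery does not refer to the same passive as the main query. As a result, every row gets the same passive.duration value. How can I refer to the main query's passive from inside the nested subquery? (Or is there another way to structure such a query?)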
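One alternative structure, not from this thread, is to read the next event off a window function instead of a MIN() self-join. The poster could not have used it in 2013, because MySQL only added window functions such as LEAD() in 8.0. The sketch below runs on Python's built-in sqlite3 module, and window functions there need SQLite 3.25 or newer. Its passive table is hypothetical: event_time holds Unix seconds, because SQLite has no UNIX_TIMESTAMP(). The sketch shows the shape of the query only. It is not a tested MySQL statement.

```python
import sqlite3

# Hypothetical schema mirroring the question; event_time is stored as Unix
# seconds (assumption) so that subtracting two values yields seconds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passive (id INTEGER PRIMARY KEY, user_id INTEGER, "
             "event_time INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO passive (id, user_id, event_time) VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 130), (3, 2, 200), (4, 1, 400)],
)

# LEAD() gives the event_time of the next row for the same user (ordered by
# time); the UPDATE matches each target row to its precomputed gap by id.
conn.execute("""
    UPDATE passive
    SET duration = (
        SELECT gaps.next_time - gaps.event_time
        FROM (
            SELECT id, event_time,
                   LEAD(event_time) OVER (PARTITION BY user_id
                                          ORDER BY event_time) AS next_time
            FROM passive
        ) AS gaps
        WHERE gaps.id = passive.id
    )
""")
rows = conn.execute("SELECT id, duration FROM passive ORDER BY id").fetchall()
print(rows)  # [(1, 30), (2, 270), (3, None), (4, None)]
```

This differs from the MIN() version in one case. If a user has two events with the same event_time, LEAD() gives a gap of 0 for one of them. The `> 0` condition instead skips ahead to the next strictly later event.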

Answer


Try something like this:

UPDATE passive as passive1 
SET passive1.duration = (
    SELECT min(UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time)) 
    FROM (SELECT * FROM passive) AS passive2 
    WHERE passive1.user_id = passive2.user_id 
    AND UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time) > 0 
    ) 
; 
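You can check the answer's logic outside MySQL. This hedged sketch runs the same correlated MIN() pattern on an in-memory SQLite database. SQLite has no UNIX_TIMESTAMP() and never raises error 1093, so two adaptations are made. event_time holds hypothetical Unix seconds, and the derived table is kept only to mirror the MySQL shape. The derived table also selects only user_id and event_time, as a later comment suggests. The data includes two events at the same time to show that `> 0` skips ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passive (id INTEGER PRIMARY KEY, user_id INTEGER, "
             "event_time INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO passive (id, user_id, event_time) VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 100), (3, 1, 160), (4, 2, 50)],
)

# Same shape as the answer: a correlated scalar subquery over a derived table.
# Unqualified `passive` in the WHERE clause refers to the row being updated.
conn.execute("""
    UPDATE passive
    SET duration = (
        SELECT MIN(passive2.event_time - passive.event_time)
        FROM (SELECT user_id, event_time FROM passive) AS passive2
        WHERE passive2.user_id = passive.user_id
          AND passive2.event_time - passive.event_time > 0
    )
""")
rows = conn.execute("SELECT id, duration FROM passive ORDER BY id").fetchall()
print(rows)  # [(1, 60), (2, 60), (3, None), (4, None)]
```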

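Thanks a lot, it does get around error 1093. But doesn't '(SELECT * FROM passive)' slow the query down? (The passive table is 6 GB.) Do you have any suggestions for speeding up the query?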


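@FranckDernoncourt From the query, it looks like you have multiple entries for the same user_id, so you could try 'SELECT MAX(event_time), user_id FROM passive GROUP BY user_id'... I think that would shrink the table a bit.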


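Thanks. 'MAX(event_time)' would change the result, but '(SELECT event_time, user_id FROM passive) passive2' does reduce the running time (by about 10%). I have a feeling this kind of operation is not well suited to SQL.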


We can work around the problem with a Python script:

#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
We need an index on (user_id, observed_event_timestamp) to speed this up.
'''

# MySQLdb (MySQL-python): http://sourceforge.net/projects/mysql-python/?source=dlp
# Tutorials: http://mysql-python.sourceforge.net/MySQLdb.html
#            http://zetcode.com/db/mysqlpython/
import datetime

import MySQLdb


def main():
    start = datetime.datetime.now()

    # Two connections: one streams the events, the other runs lookups and updates.
    db = MySQLdb.connect(user="root", passwd="password", db="db_name")
    db2 = MySQLdb.connect(user="root", passwd="password", db="db_name")

    cursor = db.cursor()
    cursor2 = db2.cursor()

    cursor.execute("SELECT observed_event_id, user_id, observed_event_timestamp "
                   "FROM observed_events ORDER BY observed_event_timestamp ASC")

    count = 0
    for primary_key, user_id, timestamp in cursor:
        count += 1
        # Next event by the same user; parameters are passed separately so the
        # driver quotes them (instead of building the SQL with % formatting).
        cursor2.execute(
            "SELECT observed_event_timestamp FROM observed_events "
            "WHERE observed_event_timestamp > %s AND user_id = %s "
            "ORDER BY observed_event_timestamp ASC LIMIT 1",
            (timestamp, user_id))
        next_row = cursor2.fetchone()
        duration = 0
        if next_row is not None:
            duration = (next_row[0] - timestamp).total_seconds()
            # Gaps longer than one hour are stored as 0.
            if duration > 60 * 60:
                duration = 0

        cursor2.execute(
            "UPDATE observed_events SET observed_event_duration = %s "
            "WHERE observed_event_id = %s",
            (duration, primary_key))

        # Commit in batches of 1000 rows and report progress.
        if count % 1000 == 0:
            db2.commit()
            print("Percent done: %s%% in %s seconds."
                  % (float(count) / cursor.rowcount * 100,
                     (datetime.datetime.now() - start).total_seconds()))

    db2.commit()  # commit the last, partial batch
    db.close()
    db2.close()
    diff = (datetime.datetime.now() - start).total_seconds()
    print('finished in %s seconds' % diff)


if __name__ == "__main__":
    main()
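The script's docstring asks for an index on user_id and the timestamp. A composite index with the equality column first and the range/sort column second lets the per-row "next event" lookup be answered from the index. SQLite is used below as a stand-in, and its plan says nothing about what MySQL's optimizer will do. For MySQL, you would run EXPLAIN on the same lookup query. The index name idx_user_ts is hypothetical, and timestamps are stored as integers for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observed_events (observed_event_id INTEGER PRIMARY KEY, "
             "user_id INTEGER, observed_event_timestamp INTEGER, "
             "observed_event_duration REAL)")
# Composite index: equality column (user_id) first, range/ORDER BY column second.
conn.execute("CREATE INDEX idx_user_ts ON observed_events "
             "(user_id, observed_event_timestamp)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT observed_event_timestamp FROM observed_events "
    "WHERE observed_event_timestamp > ? AND user_id = ? "
    "ORDER BY observed_event_timestamp ASC LIMIT 1",
    (0, 1),
).fetchall()
# The last column of each plan row is the human-readable detail string.
detail = " ".join(row[-1] for row in plan)
print(detail)
```

If the plan mentions the index and shows no separate sort step, each lookup is an index range scan that stops after one row.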