Improving query performance

I need to read and join a lot of rows (~500k) from a PostgreSQL database and write them to a MySQL database.

My naive approach looks like this:

    # Stream entries from PostgreSQL in chunks of 500 rows
    entries = Entry.query.yield_per(500)

    for entry in entries:
        for location in entry.locations:
            mysql_location = MySQLLocation(entry.url)
            mysql_location.id = location.id
            mysql_location.entry_id = entry.id

            [...]

            mysql_location.city = location.city.name
            mysql_location.county = location.county.name
            mysql_location.state = location.state.name
            mysql_location.country = location.country.name

            db.session.add(mysql_location)

    # Single commit at the very end: everything stays in memory until here
    db.session.commit()

Each Entry has about 1 to 100 Locations.

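This script has now been running for about 20 hours and already consumes more than 4 GB of memory, because everything is kept in memory until the session is committed.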

When I tried committing earlier, I ran into problems like this.

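How can I improve the query performance? It needs to get much faster, because the number of rows will grow to around 2500k in the coming months.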

Source

2013-08-02 dbanck

Why can't you use an [Extract, Transform, Load](http://en.wikipedia.org/wiki/Extract,_transform,_load) approach? – AndrewS

Basically 'pg_dump dbname | mysql dbname' –

@JochenRitzel, I'm joining multiple rows from multiple tables into a single row. I don't see how 'pg_dump' would help. – dbanck

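Your naive approach is flawed for the reason you already know: what is eating your memory is the model objects hanging around in memory while they wait to be flushed to MySQL.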

The simplest way is to not use the ORM for the conversion at all. Use the SQLAlchemy table objects directly, as they are also much faster.
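A minimal sketch of what inserting through a table object in batches could look like. The `mysql_location` table definition, the `insert_in_batches` helper, and the generated rows are hypothetical, and an in-memory SQLite engine stands in for MySQL so the snippet runs on its own:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

# Assumption: in-memory SQLite replaces the real MySQL target.
target_engine = create_engine("sqlite://")
metadata = MetaData()

# Hypothetical target table; only a few of the question's columns are shown.
mysql_location = Table(
    "mysql_location",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("entry_id", Integer),
    Column("city", String(100)),
)
metadata.create_all(target_engine)


def insert_in_batches(engine, table, rows, batch_size=500):
    """Insert an iterable of dicts, using one transaction per batch."""
    batch, written = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            with engine.begin() as conn:  # commits when the block exits
                conn.execute(table.insert(), batch)  # executemany-style insert
            written += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        with engine.begin() as conn:
            conn.execute(table.insert(), batch)
        written += len(batch)
    return written


# Stand-in for the joined rows that would be read from PostgreSQL.
source_rows = (
    {"id": i, "entry_id": i // 10, "city": "city-%d" % i} for i in range(1, 1201)
)
written = insert_in_batches(target_engine, mysql_location, source_rows)

with target_engine.connect() as conn:
    stored = conn.execute(mysql_location.select()).fetchall()
print(written, len(stored))
```

Because only plain dicts are passed around, no ORM objects accumulate, and at most `batch_size` rows are held in memory at a time.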

In addition, you can create 2 sessions and bind the 2 engines to separate sessions! Then you can commit the MySQL session for each batch.
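A rough sketch of the two-session setup, assuming SQLAlchemy 1.4 or newer. `SourceLocation` and `TargetLocation` are hypothetical, simplified models, and two in-memory SQLite engines stand in for PostgreSQL and MySQL:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class SourceLocation(Base):  # hypothetical stand-in for the PostgreSQL side
    __tablename__ = "source_location"
    id = Column(Integer, primary_key=True)
    city = Column(String(100))


class TargetLocation(Base):  # hypothetical stand-in for MySQLLocation
    __tablename__ = "target_location"
    id = Column(Integer, primary_key=True)
    city = Column(String(100))


# Assumption: two separate in-memory SQLite databases replace the real servers.
source_engine = create_engine("sqlite://")
target_engine = create_engine("sqlite://")
SourceLocation.__table__.create(source_engine)
TargetLocation.__table__.create(target_engine)

# One session per engine, so committing one never touches the other.
source_session = Session(bind=source_engine)
target_session = Session(bind=target_engine)

# Sample data so the sketch has something to copy.
source_session.add_all(
    [SourceLocation(id=i, city="city-%d" % i) for i in range(1, 26)]
)
source_session.commit()

BATCH_SIZE = 10
commits = 0
pending = 0
for loc in source_session.query(SourceLocation).yield_per(BATCH_SIZE):
    target_session.add(TargetLocation(id=loc.id, city=loc.city))
    pending += 1
    if pending == BATCH_SIZE:
        target_session.commit()  # only the target session is committed
        commits += 1
        pending = 0
if pending:  # commit the last, partially filled batch
    target_session.commit()
    commits += 1

copied = target_session.query(TargetLocation).count()
print(commits, copied)
```

The source session is never committed while it is being iterated, so the read side is left alone by the per-batch commits on the target.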

Source

2013-08-02 10:49:27

I second the 2 separate sessions, cleaning each one up with [expunge_all()](http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#sqlalchemy.orm.session.Session.expunge_all). Also, the problem you (@dbanck) ran into can be solved by using range queries instead of yield_per. – van
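One possible reading of "range queries" is to page through the table by primary key, one bounded query per window, and call `expunge_all()` after each window. The sketch below assumes SQLAlchemy 1.4 or newer and positive integer ids; the simplified `Entry` model and the `iter_in_ranges` helper are hypothetical, and in-memory SQLite stands in for PostgreSQL:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Entry(Base):  # hypothetical, simplified version of the question's model
    __tablename__ = "entry"
    id = Column(Integer, primary_key=True)
    url = Column(String(200))


engine = create_engine("sqlite://")  # assumption: stand-in for PostgreSQL
Base.metadata.create_all(engine)
session = Session(bind=engine)
session.add_all(
    [Entry(id=i, url="http://example.com/%d" % i) for i in range(1, 24)]
)
session.commit()


def iter_in_ranges(session, model, window=10):
    """Yield lists of rows, fetching one primary-key range per query."""
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = (
            session.query(model)
            .filter(model.id > last_id)
            .order_by(model.id)
            .limit(window)
            .all()
        )
        if not rows:
            break
        last_id = rows[-1].id  # remember where the next range starts
        yield rows
        session.expunge_all()  # detach this window's objects from the session


window_sizes = []
seen_ids = []
for rows in iter_in_ranges(session, Entry):
    window_sizes.append(len(rows))
    seen_ids.extend(row.id for row in rows)
print(window_sizes)
```

Each window is a complete, fully fetched query, so no result set is left open between windows, which is presumably why committing in between becomes safe.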
