2016-03-14

SQLAlchemy Query with Inner Join with Duplicate Column Values Criteria/Filter

I have a SQLite3 database in which each row contains a size value and a SHA256 value.

The following SQL query finds duplicate files by returning every row whose Size and SHA256_1024 values both also appear in at least one other row.

SELECT A.* 
FROM Files A 
INNER JOIN (SELECT Size, SHA256_1024 
    FROM Files 
    GROUP BY Size, SHA256_1024 
    HAVING COUNT(*) > 1) B 
ON A.Size = B.Size AND A.SHA256_1024 = B.SHA256_1024 

And the following counts the number of duplicate files:

SELECT COUNT(*) FROM 
(
SELECT A.* 
FROM Files A 
INNER JOIN (SELECT Size, SHA256_1024 
    FROM Files 
    GROUP BY Size, SHA256_1024 
    HAVING COUNT(*) > 1) B 
ON A.Size = B.Size AND A.SHA256_1024 = B.SHA256_1024) x 
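
To make the expected result concrete, here is a minimal sketch that runs the two SQL statements above against an in-memory SQLite database using Python's standard `sqlite3` module. The table layout and the sample rows are assumptions made up for illustration; only the Size and SHA256_1024 column names come from the queries themselves.

```python
import sqlite3

# In-memory database; the Files layout and the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Files ("
    "Id INTEGER PRIMARY KEY, File TEXT, Size INTEGER, SHA256_1024 TEXT)"
)
conn.executemany(
    "INSERT INTO Files (File, Size, SHA256_1024) VALUES (?, ?, ?)",
    [
        ("a.txt", 10, "aaa"),
        ("b.txt", 10, "aaa"),  # same size and hash as a.txt
        ("c.txt", 10, "bbb"),  # same size, different hash: not a duplicate
        ("d.txt", 20, "aaa"),  # same hash, different size: not a duplicate
        ("e.txt", 30, "ccc"),
        ("f.txt", 30, "ccc"),  # same size and hash as e.txt
        ("g.txt", 30, "ccc"),  # same size and hash as e.txt
    ],
)

# The duplicate-finding query from the question.
DUPLICATES_SQL = """
SELECT A.*
FROM Files A
INNER JOIN (SELECT Size, SHA256_1024
            FROM Files
            GROUP BY Size, SHA256_1024
            HAVING COUNT(*) > 1) B
ON A.Size = B.Size AND A.SHA256_1024 = B.SHA256_1024
"""

# Column 1 of each returned row is File.
duplicate_names = sorted(row[1] for row in conn.execute(DUPLICATES_SQL))

# The counting query wraps the same SELECT in a derived table.
duplicate_count = conn.execute(
    f"SELECT COUNT(*) FROM ({DUPLICATES_SQL}) x"
).fetchone()[0]

print(duplicate_names, duplicate_count)
```

Every member of a duplicate group is returned, not just the extra copies, so the count is the total number of files that have at least one twin.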

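I am finding it hard to implement the inner join and the duplicate detection in SQLAlchemy, and I am confused by the Query documentation ( ). The documentation has plenty of examples that filter on values, but I have not found one that shows how to compare the Size and SHA256_1024 columns for duplicate values the way the SQL above does.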

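from sqlalchemy import Column, Integer, String, create_engine 
from sqlalchemy.orm import declarative_base, sessionmaker 

Base = declarative_base() 

# ORM mapping for the Files table 
class FO(Base): 
    __tablename__ = 'Files' 
    Id = Column(Integer, primary_key=True) 
    File = Column(String()) 
    Size = Column(Integer) 
    MD5_1024 = Column(String()) 

engine = create_engine('sqlite:///FileRel.db') 
Base.metadata.bind = engine 
DBSession = sessionmaker(bind=engine) 
session = DBSession() 

session.query(FO) #??? Lots more needed: .join .having, etc. 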

Answer

You can create a subquery that selects the (Size, MD5_1024) pairs occurring more than once:

from sqlalchemy import func, and_ 

B = session.query(FO.Size, FO.MD5_1024).group_by(FO.Size, FO.MD5_1024).having(func.count() > 1).subquery() 

then join the mapped class against it on both columns:

query = session.query(FO).join(B, and_(FO.Size == B.c.Size, FO.MD5_1024 == B.c.MD5_1024)) 

and get the count:

query.count() 
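
For anyone who wants to try the subquery, join, and count steps above end to end, here is a self-contained sketch. It assumes SQLAlchemy 1.4 or later (for `sqlalchemy.orm.declarative_base`), swaps the question's FileRel.db for an in-memory database, and uses made-up sample rows. The model keeps the MD5_1024 column name from the question's class, although the raw SQL there refers to SHA256_1024.

```python
from sqlalchemy import Column, Integer, String, and_, create_engine, func
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class FO(Base):
    __tablename__ = "Files"
    Id = Column(Integer, primary_key=True)
    File = Column(String())
    Size = Column(Integer)
    MD5_1024 = Column(String())


# Assumption: in-memory database instead of the question's FileRel.db.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Hypothetical sample data: two duplicate groups and two unique files.
session.add_all(
    [
        FO(File="a.txt", Size=10, MD5_1024="aaa"),
        FO(File="b.txt", Size=10, MD5_1024="aaa"),
        FO(File="c.txt", Size=10, MD5_1024="bbb"),
        FO(File="d.txt", Size=20, MD5_1024="aaa"),
        FO(File="e.txt", Size=30, MD5_1024="ccc"),
        FO(File="f.txt", Size=30, MD5_1024="ccc"),
        FO(File="g.txt", Size=30, MD5_1024="ccc"),
    ]
)
session.commit()

# Subquery: (Size, MD5_1024) pairs that occur more than once.
B = (
    session.query(FO.Size, FO.MD5_1024)
    .group_by(FO.Size, FO.MD5_1024)
    .having(func.count() > 1)
    .subquery()
)

# Inner join the mapped class against the subquery on both columns.
query = session.query(FO).join(
    B, and_(FO.Size == B.c.Size, FO.MD5_1024 == B.c.MD5_1024)
)

duplicate_files = sorted(f.File for f in query)
duplicate_total = query.count()
print(duplicate_files, duplicate_total)
```

Calling `print(query)` shows the SQL that SQLAlchemy generates, which is a handy way to compare it with the hand-written statement in the question.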

This works beautifully, thank you! – DJH