How can I speed up a Python script that compares values between two lists and ranges?

I have two datasets stored in large files:

File1: 
Gen1 1 1 10 
Gen2 1 2 20 
Gen3 2 30 40 

File2: 
A 1 4 
B 1 15 
C 2 2

Expected output:

Out: 
Gen1 1 1 10 A 1 4 
Gen2 1 2 20 B 1 15

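Right now I am basically just trying to find the instances where a row of File2 falls within a row of File1: the code in file2[1] matches file1[1], and the value in file2[2] lies within the range given in File1 (file1[2] to file1[3]).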

My code for this is the following:

for i in file1:
    temp = i.split()
    for a in file2:
        temp2 = a.split()
        # Fields are strings, so convert to int before comparing the range.
        if temp[1] == temp2[1] and int(temp[2]) <= int(temp2[2]) <= int(temp[3]):
            print(i + " " + a + "\n")

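The code works, but it takes longer than I expected. Is there a simpler or faster way to do this? I feel there is some clever use of a map or hash that I am missing.

Thanks!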

Source: 2017-03-16, perot57

40 30 does not look like a valid range, does it? –

Correct, I should fix that! – perot57

With pandas, which uses a compiled backend, this would be a one-liner. – maxymoo
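The "map or hash" idea from the question can be sketched with the standard library alone. This is a minimal sketch, not something proposed in the thread: it assumes both inputs are iterables of whitespace-separated lines, that File2 fits in memory, and the function name `match_ranges` is hypothetical. It groups File2 rows by their key in a dict, sorts each group once, and then uses binary search to find the values inside each File1 range instead of scanning all of File2 for every File1 row.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def match_ranges(file1_lines, file2_lines):
    """Pair each file1 row with the file2 rows sharing its key and inside its range."""
    # Group file2 rows by key (column 1) as (value, original line) pairs.
    by_key = defaultdict(list)
    for line in file2_lines:
        fields = line.split()
        if fields:
            by_key[fields[1]].append((int(fields[2]), line.strip()))

    # Sort each group once so a range can be located by binary search.
    values = {}
    for key, rows in by_key.items():
        rows.sort()
        values[key] = [value for value, _ in rows]

    matches = []
    for line in file1_lines:
        fields = line.split()
        if not fields:
            continue
        key, low, high = fields[1], int(fields[2]), int(fields[3])
        rows = by_key.get(key)
        if not rows:
            continue
        # Indices of the first value >= low and the first value > high.
        start = bisect_left(values[key], low)
        stop = bisect_right(values[key], high)
        for _, other in rows[start:stop]:
            matches.append(line.strip() + " " + other)
    return matches


file1 = ["Gen1 1 1 10", "Gen2 1 2 20", "Gen3 2 30 40"]
file2 = ["A 1 4", "B 1 15", "C 2 2"]
for match in match_ranges(file1, file2):
    print(match)
```

With the sample data this also pairs Gen2 with A, since 4 lies between 2 and 20. Whether it beats a database or pandas on the real files is something to measure rather than assume.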

Pandas may be a good choice. See this example.

When the files are large, I prefer sqlite over pandas. A pandas DataFrame can be loaded from a sqlite database.

import sqlite3 

file1 = """Gen1 1 1 10 
Gen2 1 2 20 
Gen3 2 30 40""" 

file2 = """A 1 4 
B 1 15 
C 2 2""" 

# your code (fixed) 
print("desired output") 
for i in file1.splitlines():
    temp = i.split()
    for a in file2.splitlines():
        temp2 = a.split()
        if temp[1] == temp2[1] and int(temp[2]) <= int(temp2[2]) <= int(temp[3]):
            print(i + " " + a)


# Make an in-memory db 
# Set a filename if your files are too big or if you want to reuse this db 
con = sqlite3.connect(":memory:") 
c = con.cursor() 

c.execute("""CREATE TABLE file1 
(
    gene_name text, 
    a integer, 
    b1 integer, 
    b2 integer 
)""") 

for row in file1.splitlines():
    if row:
        c.execute("INSERT INTO file1 (gene_name, a, b1, b2) VALUES (?,?,?,?)", tuple(row.split()))

c.execute("""CREATE TABLE file2 
(
    name text, 
    a integer, 
    b integer 
)""") 

for row in file2.splitlines():
    if row:
        c.execute("INSERT INTO file2 (name, a, b) VALUES (?,?,?)", tuple(row.split()))

# join the two tables
print("sqlite3 output") 
for row in c.execute("""SELECT 
    file1.gene_name, 
    file1.a, 
    file1.b1, 
    file1.b2, 
    file2.name, 
    file2.a, 
    file2.b 
FROM file1 
JOIN file2 ON file1.a = file2.a AND file2.b >= file1.b1 AND file2.b <= file1.b2 
"""): 
    print(row) 

con.close()

Output:

desired output 
Gen1 1 1 10 A 1 4 
Gen2 1 2 20 A 1 4 
Gen2 1 2 20 B 1 15 
sqlite3 output 
(u'Gen1', 1, 1, 10, u'A', 1, 4) 
(u'Gen2', 1, 2, 20, u'A', 1, 4) 
(u'Gen2', 1, 2, 20, u'B', 1, 15)
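For large files, two adjustments to the sqlite approach may be worth trying. The following is a sketch under assumptions, not a measured result: it loads rows with `executemany` instead of one `execute` call per row, and it adds an index on the join columns of `file2` so that SQLite has the option of looking up matches instead of scanning the table. The index name `idx_file2_a_b` is hypothetical, and the `ORDER BY` is added only to make the row order deterministic.

```python
import sqlite3

file1 = """Gen1 1 1 10
Gen2 1 2 20
Gen3 2 30 40"""

file2 = """A 1 4
B 1 15
C 2 2"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE file1 (gene_name TEXT, a INTEGER, b1 INTEGER, b2 INTEGER)")
con.execute("CREATE TABLE file2 (name TEXT, a INTEGER, b INTEGER)")

# executemany runs one prepared INSERT over all rows.
# The INTEGER column affinity converts the numeric strings on insert.
con.executemany(
    "INSERT INTO file1 VALUES (?,?,?,?)",
    (line.split() for line in file1.splitlines() if line.strip()),
)
con.executemany(
    "INSERT INTO file2 VALUES (?,?,?)",
    (line.split() for line in file2.splitlines() if line.strip()),
)

# Hypothetical index name; covers the equality column and the range column.
con.execute("CREATE INDEX idx_file2_a_b ON file2 (a, b)")

rows = con.execute(
    """SELECT file1.gene_name, file1.a, file1.b1, file1.b2,
              file2.name, file2.a, file2.b
       FROM file1
       JOIN file2 ON file1.a = file2.a
                 AND file2.b BETWEEN file1.b1 AND file1.b2
       ORDER BY file1.gene_name, file2.name"""
).fetchall()
con.close()

for row in rows:
    print(row)
```

Whether the index is actually used depends on the query planner and the data; prefixing the query with `EXPLAIN QUERY PLAN` is one way to check on the real files.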

Source: 2017-03-17 02:30:19, klim
