More efficient way to do this search algorithm?

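I'm just wondering whether there is a better way to do this algorithm. I need to do this kind of operation often, and the way I'm doing it now takes hours, because I believe it would be considered an n^2 algorithm. I'll attach it below.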

import csv 

with open("location1", 'r') as main: 
    csvMain = csv.reader(main) 
    mainList = list(csvMain) 

with open("location2", 'r') as anno: 
    csvAnno = csv.reader(anno) 
    annoList = list(csvAnno) 

output = [] 

for full in mainList: 
    geneName = full[2].lower() 
    for annot in annoList: 
        if geneName == annot[2].lower(): 
            # Build a fresh row for every match. Reusing a single tempList made it
            # grow without bound and appended the same list object to output
            # repeatedly; the old "for i in tempList: del i" loop did not clear it.
            tempList = list(full) 
            tempList.extend(annot[3:7])  # annotation columns 3, 4, 5 and 6 
            output.append(tempList) 

with open("location3", 'w', newline='') as final:  # newline='' as the csv docs recommend 
    a = csv.writer(final, delimiter=',') 
    a.writerows(output)

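I have two CSV files with 15,000 rows each. I want to compare a column from each file, and if the values match, append the end of the row from the second CSV to the end of the row in the first. Any help would be greatly appreciated!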
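For a sense of scale, assuming about 15,000 rows in each file as stated here, the nested loops compare every main row against every annotation row, while a dictionary-based approach touches each row roughly once:

```latex
\underbrace{15\,000 \times 15\,000}_{\text{nested loops}} = 2.25 \times 10^{8}\ \text{key comparisons}
\quad \text{vs.} \quad
\underbrace{15\,000 + 15\,000}_{\text{build dict + one lookup per row}} = 3 \times 10^{4}\ \text{dict operations}
```

The inner loop also recomputes `annot[2].lower()` for every pair, so the lowercasing alone runs 2.25 × 10^8 times.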

Thanks!


2017-02-15 Alex Kirst

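Pro: works with the standard library and has no external dependencies. Con: pandas can do this more easily and faster (as described below), both the comparison and the appending (I think it would be 3 or 4 lines of code). – Kelvin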

Something like this should be more efficient:

import csv 
from collections import defaultdict 

with open("location1", 'r') as main: 
    csvMain = csv.reader(main) 
    mainList = list(csvMain) 

with open("location2", 'r') as anno: 
    csvAnno = csv.reader(anno) 
    annoList = list(csvAnno) 

output = [] 
annoMap = defaultdict(list) 

for annot in annoList: 
    tempList = annot[3:] # adapt this to the needed columns 
    annoMap[annot[2].lower()].append(tempList) # store these columns under the value of the column of interest 

for full in mainList: 
    geneName = full[2].lower() 
    if geneName in annoMap: # check if a matching key exists 
        output.extend(annoMap[geneName]) 

with open("location3", 'w', newline='') as final:  # newline='' as the csv docs recommend 
    a = csv.writer(final, delimiter=',') 
    a.writerows(output)

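This is more efficient because you only need to iterate over each list once. Dictionary lookups are O(1) on average, so you essentially get a linear algorithm.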
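To make the linear-time argument concrete, here is a minimal sketch of the same dictionary technique (commonly called a hash join) wrapped in a reusable function. The name `hash_join`, its parameters, and the sample rows are hypothetical; it assumes the key sits in column 2 of both files, as in the question. Unlike the script above, it keeps the full main row and emits one output row per matching annotation row.

```python
from collections import defaultdict


def hash_join(main_rows, anno_rows, key_col=2, extra_cols=slice(3, None)):
    """Append matching annotation columns to each main row (inner join).

    Pass 1 builds a dict keyed on the lowercased key of every annotation row.
    Pass 2 does one average-O(1) lookup per main row, so the total work is
    O(len(main_rows) + len(anno_rows)) rather than their product.
    """
    anno_map = defaultdict(list)
    for annot in anno_rows:
        anno_map[annot[key_col].lower()].append(annot[extra_cols])

    output = []
    for full in main_rows:
        # .get() avoids inserting empty lists into the defaultdict on a miss
        for extra in anno_map.get(full[key_col].lower(), []):
            output.append(full + extra)  # new list; input rows stay unchanged
    return output


# Hypothetical sample rows; in practice these come from list(csv.reader(f)).
main_rows = [["1", "chr1", "BRCA1"], ["2", "chr2", "TP53"], ["3", "chr3", "XYZ"]]
anno_rows = [["a", "b", "brca1", "x1", "y1"], ["c", "d", "tp53", "x2", "y2"]]
print(hash_join(main_rows, anno_rows))
```

Main rows with no match (like `XYZ` here) are dropped; if they should be kept, append `full` unchanged when the lookup returns nothing.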


2017-02-15 17:56:14 limes

It might help if you explained *why* your changes make it more efficient. – Paul

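Beautiful! This is almost perfect, although it only writes the mapped values to the file. Joining the *full* variable and *annoMap[geneName]* into one long row is a simple fix. Thanks a lot! –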

A simple approach is to use a library like pandas. Its built-in functions are very efficient.

You can use pandas.read_csv() to load your CSV files into DataFrames and then process them with pandas functions.

For example, you can use pandas.merge() to merge the two DataFrames (i.e. your two CSV files) on a specific column, and then drop the columns you don't need.
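A minimal pandas sketch of that idea, assuming neither file has a header row and the key is column 2 (as in the question's code). The `io.StringIO` data stands in for "location1" and "location2", and the helper column name `key` is arbitrary. Lowercasing into a helper column reproduces the question's case-insensitive match.

```python
import io

import pandas as pd

# Hypothetical stand-ins for "location1" and "location2". With header=None the
# columns are labelled 0, 1, 2, ... just like indexes into csv.reader rows.
main_csv = io.StringIO("1,chr1,BRCA1\n2,chr2,TP53\n3,chr3,XYZ\n")
anno_csv = io.StringIO("a,b,brca1,x1,y1\nc,d,tp53,x2,y2\n")

main = pd.read_csv(main_csv, header=None, dtype=str)
anno = pd.read_csv(anno_csv, header=None, dtype=str)

# Lowercased copy of column 2 in each frame for a case-insensitive match.
main["key"] = main[2].str.lower()
anno["key"] = anno[2].str.lower()

# Keep only anno's columns 3 onward (plus the key), then inner-join on the key.
merged = pd.merge(main, anno.drop(columns=[0, 1, 2]), on="key", how="inner")

# Drop the helper column we no longer need.
merged = merged.drop(columns=["key"])

print(merged.to_csv(header=False, index=False), end="")
# With real files: merged.to_csv("location3", header=False, index=False)
```

`dtype=str` keeps every value as text, so IDs such as `1` or leading zeros are written back unchanged.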

If you have some database knowledge, the logic here is very similar to a table join.
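As a sketch of that analogy, the same join can be written in SQL using Python's built-in `sqlite3` module. The table and column names (`genes`, `annotations`, `c0`, ...) and the sample rows are invented for illustration; the key is the third column, as in the question.

```python
import sqlite3

gene_rows = [("1", "chr1", "BRCA1"), ("2", "chr2", "TP53"), ("3", "chr3", "XYZ")]
anno_rows = [("a", "b", "brca1", "x1", "y1"), ("c", "d", "tp53", "x2", "y2")]

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.execute("CREATE TABLE genes (c0, c1, gene)")
con.execute("CREATE TABLE annotations (c0, c1, gene, c3, c4)")
con.executemany("INSERT INTO genes VALUES (?, ?, ?)", gene_rows)
con.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?, ?)", anno_rows)

# Expression index on the lowercased key, so SQLite can look matches up
# instead of scanning annotations for every gene row (needs SQLite >= 3.9).
con.execute("CREATE INDEX anno_gene ON annotations (lower(gene))")

# Note: SQLite's built-in lower() only folds ASCII letters.
rows = con.execute(
    """
    SELECT genes.c0, genes.c1, genes.gene, annotations.c3, annotations.c4
    FROM genes
    JOIN annotations ON lower(genes.gene) = lower(annotations.gene)
    ORDER BY genes.rowid
    """
).fetchall()
con.close()
print(rows)
```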


2017-02-15 17:57:25 QVaucher

Thanks @limes for the help. This is the final script I used; I thought I'd post it to help others. Thanks again!

import csv 
from collections import defaultdict 

with open("location1", 'r') as main: 
    csvMain = csv.reader(main) 
    mainList = list(csvMain) 

with open("location2", 'r') as anno: 
    csvAnno = csv.reader(anno) 
    annoList = list(csvAnno) 

output = [] 
annoMap = defaultdict(list) 

for annot in annoList: 
    tempList = annot[3:] # adapt this to the needed columns 
    annoMap[annot[2].lower()].append(tempList) # store these columns under the value of the column of interest 

for full in mainList: 
    geneName = full[2].lower() 
    if geneName in annoMap: # check if a matching key exists 
        matches = annoMap[geneName] # renamed from "list" to avoid shadowing the built-in 
        full.extend(matches[0]) # appends only the first matching annotation row 
        output.append(full) 

with open("location3", 'w', newline='') as final:  # newline='' as the csv docs recommend 
    a = csv.writer(final, delimiter=',') 
    a.writerows(output)


2017-02-15 19:05:00
