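Speeding up graph creation from 2.92M data points

I have 2.92M data points in a 3.0 GB CSV file, and I need to loop over them twice to build a graph that I want to load into NetworkX. At the current rate, generating this graph will take days. How can I speed this up?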
require "csv"

similarity = 8
graph = {}
topic_pages = {}

# First pass: load every row into memory as page id => attribute list.
CSV.foreach("topic_page_node_and_edge.csv") do |row|
  topic_pages[row[0]] = row[1..-1]
end

CSV.open("generate_graph.csv", "wb") do |csv|
  i = 0
  # Second pass: compare each page against every page that comes after it.
  topic_pages.each do |row|
    i += 1
    row = row.flatten
    topic_pages_attributes = row[1..-1]
    graph[row[0]] = []
    topic_pages.to_a[i..-1].each do |row2|
      row2 = row2.flatten
      topic_pages_attributes2 = row2[1..-1]
      num_matching_attributes = (topic_pages_attributes2 & topic_pages_attributes).count
      # Link the pages if they share enough attributes, or if one set contains the other.
      if num_matching_attributes >= similarity || num_matching_attributes == topic_pages_attributes2.count || num_matching_attributes == topic_pages_attributes.count
        graph[row[0]].push(row2[0])
      end
    end
    csv << [row[0], graph[row[0]]].flatten
  end
end
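One possible direction, offered as a rough sketch rather than a tested fix for this data set: the nested loop compares every pair of pages, and it also rebuilds `topic_pages.to_a` on every outer iteration. An inverted index from attribute to pages lets each page be compared only with pages that share at least one attribute. The helper name `build_edges` and its arguments are hypothetical. The sketch assumes attributes are unique within a row, and it skips pages with no attributes (the original condition would link such a page to every other page, because 0 matches equals its attribute count of 0).

```ruby
# Hypothetical helper, not part of the original code.
# pages:      Hash of page id => Array of attribute values
# similarity: minimum number of shared attributes for a link
# Returns a Hash of page id => ids of later pages it links to.
def build_edges(pages, similarity)
  ids = pages.keys
  attrs = pages.values.map(&:uniq)

  # Inverted index: attribute => positions of the pages that carry it.
  index = Hash.new { |hash, key| hash[key] = [] }
  attrs.each_with_index do |list, i|
    list.each { |attribute| index[attribute] << i }
  end

  edges = {}
  attrs.each_with_index do |list, i|
    # Count shared attributes, but only for later pages (j > i),
    # mirroring the original "compare with the rest of the list" loop.
    shared = Hash.new(0)
    list.each do |attribute|
      index[attribute].each { |j| shared[j] += 1 if j > i }
    end

    linked = shared.select do |j, count|
      count >= similarity || count == attrs[j].size || count == list.size
    end
    edges[ids[i]] = linked.keys.sort.map { |j| ids[j] }
  end
  edges
end

sample = {
  "a" => %w[x y],
  "b" => %w[x y z],
  "c" => %w[q],
  "d" => %w[x w v],
  "e" => %w[q r]
}
p build_edges(sample, 2)
```

How much this helps depends on the data: if a few attributes appear on most pages, the candidate lists stay long and the work remains close to quadratic.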
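Can you show us your code? – 2014-09-12 23:23:15
@theTinMan Added the code. Thanks. – 2014-09-13 00:00:59
How much RAM do you have on that machine? You're trying to hold 2.92M data points in memory, and each point does *not* occupy a single byte. – 2014-09-13 00:09:43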
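To put a rough number on the memory concern, the figures from the question can be turned into an average row size. This is only back-of-the-envelope arithmetic: it assumes "3.0 GB" means 3 × 10^9 bytes, and the helper name `avg_bytes_per_row` is made up for illustration.

```ruby
# Hypothetical helper: average raw bytes per CSV row.
def avg_bytes_per_row(file_bytes, rows)
  file_bytes / rows.to_f
end

file_bytes = 3_000_000_000 # "3.0 GB", assumed to be decimal gigabytes
rows       = 2_920_000     # "2.92M data points"

puts avg_bytes_per_row(file_bytes, rows).round # about 1027 bytes per row
```

So each row averages roughly 1 KB of raw text. Once parsed, every field becomes its own Ruby String inside an Array, each with per-object overhead, so the in-memory footprint is likely to be noticeably larger than the 3.0 GB file itself.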