How can I check a dataframe while importing it with read_csv?

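I want to import a .csv file in Python using pandas, and I'm using pandas.read_csv to do it. However, I need to check every row of the dataframe and put the values of two specific columns into arrays. My dataframe has nearly 3 million rows (~1 GB), so iterating over it after the import takes a long time. Can I do this work myself while the file is being imported? Would it be a good idea to modify the read_csv library function for this? In short, how can I inspect the data while read_csv is importing it?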

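import pandas as pd

df = pd.read_csv("file.csv")

def get(list_A, list_B):
    for a in list_A:  # This list is of size ~2300
        for b in list_B:  # This list is of size ~12000
            # Does any row have col1 == a and col3 == b? Each check scans the whole frame.
            if ((df["col1"] == a) & (df["col3"] == b)).any():
                pass  # do something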

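Because the lists are so large, this function runs slowly, and querying such a large dataframe slows execution down further. Any suggestions or solutions for improving performance?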
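
For scale, the nested loop performs up to 2300 × 12000 ≈ 27.6 million checks, and each one compares against a full ~3-million-row column. A minimal sketch of a vectorized alternative follows, assuming the two columns are named col1 and col3 (placeholders taken from the comments below) and that "do something" can be applied once per matching pair. `Series.isin` tests every row against each list in a single pass, and `drop_duplicates` leaves only the distinct pairs that actually occur. `existing_pairs` is a hypothetical helper name, and the inline CSV is made-up sample data.

```python
import io

import pandas as pd


def existing_pairs(df, list_A, list_B, col_a="col1", col_b="col3"):
    """Return the distinct (a, b) pairs with a in list_A and b in list_B
    that appear together in at least one row.

    col_a/col_b are placeholder column names (assumption)."""
    # Two vectorized membership tests replace the nested Python loops.
    mask = df[col_a].isin(list_A) & df[col_b].isin(list_B)
    pairs = df.loc[mask, [col_a, col_b]].drop_duplicates()
    # Plain (a, b) tuples, in order of first appearance in the file.
    return list(pairs.itertuples(index=False, name=None))


# Hypothetical sample data standing in for file.csv.
df = pd.read_csv(io.StringIO("col1,col2,col3\nx,0,p\ny,0,q\nx,1,q\nx,2,p\n"))
for a, b in existing_pairs(df, ["x"], ["p", "q"]):
    print(a, b)  # "do something" with each pair that exists
```

If the columns are numeric, the values in list_A and list_B need to have a matching type for `isin` to find them.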

2017-10-04 Keerthimanu Gattu

Which specific columns are these? –

@cᴏʟᴅsᴘᴇᴇᴅ In the dataframe, I want to look at col1 and col3 –

Then why can't you just use df[['col1', 'col3']]? –
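
The column selection suggested in that comment can also happen during the import itself. read_csv's `usecols` parameter makes the parser keep only the listed columns, and `Series.to_numpy()` turns each one into an array. Below is a sketch with col1/col3 standing in for the real header names. `load_two_columns` is a hypothetical helper, and the inline CSV is made-up sample data.

```python
import io

import pandas as pd


def load_two_columns(csv_source, cols=("col1", "col3")):
    """Read only the given columns and return each one as a NumPy array.

    The default names are placeholders for the real CSV headers (assumption)."""
    # usecols makes read_csv skip every other column while parsing,
    # which saves time and memory on a large file.
    df = pd.read_csv(csv_source, usecols=list(cols))
    return tuple(df[c].to_numpy() for c in cols)


# Hypothetical sample data standing in for file.csv.
sample = io.StringIO("col1,col2,col3\n1,a,10\n2,b,20\n")
a_values, b_values = load_two_columns(sample)
print(a_values, b_values)  # → [1 2] [10 20]
```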

Python's built-in csv module reads the file row by row instead of loading all of it into memory.

The code would look something like this:

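import csv

# "in" on a list is a linear scan; converting list_A and list_B to sets first
# (e.g. set(list_A)) makes each membership check constant time on average.
with open('file.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile)  # lazy iterator: yields one row at a time
    for row in csvreader:
        # row is a list of strings, so list_A/list_B must also contain strings
        if row[1] in list_A and row[3] in list_B:
            pass  # do something with the row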
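
pandas offers the same streaming idea without modifying read_csv. Passing `chunksize` makes read_csv return an iterator of DataFrames instead of one large frame, so each chunk can be filtered as it is read. The following is a minimal sketch under some assumptions: the column names col1/col3 are placeholders, the chunk size is arbitrary, `filter_in_chunks` is a hypothetical helper, and the inline CSV is made-up sample data. Using the reader as a context manager requires pandas 1.2 or later.

```python
import io

import pandas as pd


def filter_in_chunks(csv_source, list_A, list_B, col_a="col1", col_b="col3",
                     chunksize=100_000):
    """Stream the CSV in chunks, keeping rows whose col_a is in list_A
    and col_b is in list_B. Column names are placeholders (assumption)."""
    set_A, set_B = set(list_A), set(list_B)
    kept = []
    # With chunksize, read_csv yields DataFrames of at most `chunksize` rows,
    # so only one chunk is held in memory at a time.
    with pd.read_csv(csv_source, usecols=[col_a, col_b],
                     chunksize=chunksize) as reader:
        for chunk in reader:
            mask = chunk[col_a].isin(set_A) & chunk[col_b].isin(set_B)
            kept.append(chunk[mask])
    if not kept:
        return pd.DataFrame(columns=[col_a, col_b])
    return pd.concat(kept, ignore_index=True)


# Hypothetical sample data standing in for file.csv.
sample = io.StringIO("col1,col2,col3\nx,0,p\ny,0,q\nx,1,q\n")
result = filter_in_chunks(sample, ["x"], ["q"], chunksize=2)
print(result.values.tolist())  # → [['x', 'q']]
```

Only the matching rows are accumulated, so peak memory depends on the chunk size and the number of hits rather than on the full file.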

2017-10-05 14:27:01 spin