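Adding a column to a multi-indexed pandas DataFrame

I am reading several large (~700 MB) CSV files into DataFrames, which will all be combined into a single CSV. Right now each CSV is indexed by its date column. All of the CSVs have overlapping dates but unique test locations. Each CSV is named after its test location (e.g. BER.csv and alt.csv for the BER and ALT test sites). How can I combine them under a MultiIndex of date and test site? Right now, I have: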
import os
import time

import pandas as pd


def openFile(filesToProcess):
    # Read every CSV into its own DataFrame, then concatenate them once.
    frames = []
    for path in filesToProcess:
        # File name without directory or extension, e.g. 'BER' for 'BER.csv'.
        base = os.path.splitext(os.path.basename(path))[0]
        print("Working on %s" % base)
        frames.append(createDataFrame(path))
    df1 = pd.concat(frames)
    df1.to_csv('large.csv')


def createDataFrame(input_file):
    # Load one CSV, indexed by its Date column.
    checkTime = time.perf_counter()
    df1 = pd.read_csv(input_file,
                      sep=",",
                      nrows=500,  # reads only the first 500 rows of each file
                      index_col=['Date'])
    # print("Ran for " + str(time.perf_counter() - checkTime) + " Seconds")
    return df1
So, for example, I would like:
date, testsite, data1, data2
1/1/1992 9:15:00, ber, 89, 200
1/1/1992 9:17:00, ber, 54, 103.3
1/1/1992 9:15:00, alt, 90, 109.23
1/1/1992 9:17:00, alt, 12, 110.1
where date and testsite together form the MultiIndex.
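
For illustration, here is a minimal sketch of one way that shape could be produced. It assumes each file has a Date column and that the site name can be taken from the file name. The helper name combine_sites and the two tiny stand-in files are hypothetical, and dates are left as plain strings rather than parsed.

```python
import os
import tempfile

import pandas as pd


def combine_sites(paths):
    """Stack per-site CSVs into one frame indexed by (Date, testsite)."""
    # Hypothetical helper: derive 'ber' from '/some/dir/BER.csv', and so on.
    sites = [os.path.splitext(os.path.basename(p))[0].lower() for p in paths]
    frames = [pd.read_csv(p, index_col="Date") for p in paths]
    # keys= adds the site name as an outer index level above Date.
    combined = pd.concat(frames, keys=sites, names=["testsite", "Date"])
    # Swap the two levels so the index reads (Date, testsite).
    return combined.swaplevel()


# Tiny stand-in files so the sketch runs on its own.
samples = {
    "BER.csv": "Date,data1,data2\n1/1/1992 9:15:00,89,200\n1/1/1992 9:17:00,54,103.3\n",
    "alt.csv": "Date,data1,data2\n1/1/1992 9:15:00,90,109.23\n1/1/1992 9:17:00,12,110.1\n",
}
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in samples.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)
    combined = combine_sites(paths)

print(combined)
```

Calling combined.to_csv('large.csv') on the result would write both index levels as the leading columns. With files of the size described above, memory use may still be a concern, since every frame is held at once before concatenation.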