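Merging a large number of (csv) data text files in a directory using pandas (and glob?)

I have many separate instrument files, each containing an X and a Y column of integers. All arrays have the same dimensions. The X column is identical in every file; the Y values differ. If possible, I would like to append the Y column of each successive file to the first file, then write out a single large array containing the first X and all of the Ys. Something like this: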

file1 = X1 Y1, file2 = X1 Y2, file3 = X1 Y3, ... The new file should contain: X1 Y1 Y2 Y3 ...

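I have been looking at variations of:

import pandas
data = pandas.read_csv("file1.csv")

print(data) returns the array from file 1 correctly.

I need to open the successive files in a loop and join their Y columns to file1 ...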

Source

2016-05-06 numpystack

You can do something like this:

import os 
import glob 
import pandas as pd 

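def get_merged_csv(flist, **kwargs):
    # Read each file, index it by X, then join all frames side by side on that index.
    return pd.concat([pd.read_csv(f, **kwargs).set_index('X') for f in flist], axis=1).reset_index()

path = 'C:/Users/csvfiles'
# Match every CSV file in `path` whose name contains "mask"; adjust the pattern to your files.
fmask = os.path.join(path, '*mask*.csv')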

df = get_merged_csv(glob.glob(fmask))
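
One caveat worth knowing: the `glob` module returns matches in arbitrary order, so the Y columns may not follow file-name order. The following is a minimal sketch that sorts the matches before merging. The function name `get_merged_csv_sorted` and the temporary demo files are illustrative assumptions, not part of the answer above.

```python
import glob
import os
import tempfile

import pandas as pd


def get_merged_csv_sorted(pattern, **kwargs):
    """Merge the Y columns of all CSV files matching `pattern`, in sorted file-name order."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # Index every frame by X so that concat aligns the rows on X.
    frames = [pd.read_csv(f, **kwargs).set_index('X') for f in files]
    return pd.concat(frames, axis=1).reset_index()


# Demo: write three small files to a temporary directory, then merge them.
with tempfile.TemporaryDirectory() as tmp:
    for name, offset in [('a', 10), ('b', 20), ('c', 30)]:
        demo = pd.DataFrame({'X': [1, 2, 3],
                             'Y': [offset + 1, offset + 2, offset + 3]})
        demo.to_csv(os.path.join(tmp, name + '.csv'), index=False)
    merged = get_merged_csv_sorted(os.path.join(tmp, '*.csv'))

print(merged)
```

Because the frames are aligned on the X index, a file whose X values differ from the others would produce NaN entries rather than silently shifted rows.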

To name the Y columns Y1, Y2, and so on:

cols = ['{0[0]}{0[1]}'.format(t) for t in zip(df.columns[1:], range(1, len(df.columns)))] 
df.columns = df.columns.tolist()[:1] + cols
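
As a possible variation, the columns could be named after the files they came from instead of being numbered, which makes each column traceable to its source. This is only a sketch: the helper `merge_named`, the `Y_` prefix, and the in-memory demo data are assumptions made for illustration.

```python
import io
import os

import pandas as pd


def merge_named(sources):
    """Merge CSV sources, renaming each Y column after its file name.

    `sources` maps a file name to a path or file-like object that
    pd.read_csv can read.
    """
    frames = []
    for name, src in sources.items():
        # 'some/dir/a.csv' -> 'a'
        stem = os.path.splitext(os.path.basename(name))[0]
        frame = pd.read_csv(src).set_index('X')
        frames.append(frame.rename(columns={'Y': 'Y_' + stem}))
    return pd.concat(frames, axis=1).reset_index()


# In-memory stand-ins for a.csv and b.csv from the test data.
sources = {
    'a.csv': io.StringIO('X,Y\n1,11\n2,12\n3,13\n'),
    'b.csv': io.StringIO('X,Y\n1,21\n2,22\n3,23\n'),
}
named = merge_named(sources)
print(named)
```

With real files, the mapping could be built from the glob result, for example `{f: f for f in sorted(glob.glob(fmask))}`.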

Test data:

a.csv:

X,Y 
1,11 
2,12 
3,13

b.csv:

X,Y 
1,21 
2,22 
3,23

c.csv:

X,Y 
1,31 
2,32 
3,33

Test:

In [215]: df = get_merged_csv(glob.glob(fmask)) 

In [216]: df 
Out[216]: 
   X   Y   Y   Y
0  1  11  21  31
1  2  12  22  32
2  3  13  23  33

In [217]: cols = ['{0[0]}{0[1]}'.format(t) for t in zip(df.columns[1:], range(1, len(df.columns)))] 

In [218]: cols 
Out[218]: ['Y1', 'Y2', 'Y3'] 

In [219]: df.columns = df.columns.tolist()[:1] + cols 

In [220]: df 
Out[220]: 
   X  Y1  Y2  Y3
0  1  11  21  31
1  2  12  22  32
2  3  13  23  33

Source

2016-05-06 19:38:17 MaxU

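Hey MaxU, this works great! I used it to build a huge DataFrame in a few seconds. Since I am fairly new to Python, do you have a suggestion for writing out the resulting "df"? Also, can I pass df to Matplotlib for plotting? – numpystack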

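@numpystack, regarding writing out the result: if you mean performance, you may want to read [this answer](http://stackoverflow.com/questions/37010212/what-is-the-fastest-way-to-upload-a-big-csv-file-in-notebook-to-work-with-python/37012035#37012035). As for Matplotlib, you have to specify what you want to plot and how. – MaxU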

@numpystack, thanks for accepting the answer! – MaxU
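
For the two follow-up questions in the comments, a minimal sketch might look like the following. It assumes the merged frame has an `X` column plus numbered Y columns, as in the test above. The helper names `write_merged` and `plot_merged` are hypothetical, and plotting every Y column against X is just one possible choice.

```python
import io

import pandas as pd


def write_merged(df, target):
    """Write the merged frame as CSV, leaving out the 0, 1, 2, ... row index."""
    df.to_csv(target, index=False)


def plot_merged(df, png_path):
    """Plot every Y column against X and save the figure.

    pandas loads matplotlib when .plot is called, so matplotlib must be installed.
    """
    ax = df.plot(x='X')
    ax.get_figure().savefig(png_path)


# Small stand-in for the merged frame.
df = pd.DataFrame({'X': [1, 2, 3], 'Y1': [11, 12, 13], 'Y2': [21, 22, 23]})

# Write to an in-memory buffer here; a file path such as 'merged.csv' works the same way.
buffer = io.StringIO()
write_merged(df, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Calling `plot_merged(df, 'merged.png')` would then draw one line per Y column on a shared X axis.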
