Merge CSV files (from one folder) into one, adding columns with different names, using Python

I need to merge multiple CSV files located in one folder into a single file.

My original data looks like this:

y_1980.csv:

 country y_1980 
0  afg 196 
1  ago 125 
2  alb  23 
3   .  . 
.   .  .

y_1981.csv:

 country y_1981 
0  afg 192 
1  ago 120 
2  alb  0 
3   .  . 
.   .  .

y_20xx.csv:

 country y_20xx 
0  afg 176 
1  ago 170 
2  alb  76 
3   .  . 
.   .  .

What I expect to get is something like this:

 country y_1980 y_1981 ... y_20xx  
0  afg  196  192 ...  176 
1  ago  125  120 ...  170 
2  alb  23  0 ...  76 
3   .  .  . ...  . 
.   .  .  . ...  .

So far my code is as follows, but the result I get is the dataframes appended one after another:

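import glob

interesting_files = glob.glob("/Users/Desktop/Data/*.csv")

header_saved = False  # the header row has not been written yet

# Text mode ('w') so the str lines read below can be written in Python 3
with open('/Users/Desktop/Data/table.csv', 'w') as fout:
    for filename in interesting_files:
        with open(filename) as fin:
            header = next(fin)
            # Write the header only once, from the first file
            if not header_saved:
                fout.write(header)
                header_saved = True
            # Append the data rows below whatever was written before
            for line in fin:
                fout.write(line)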

Source

2017-03-15 PAstudilloE

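It is much easier if you use 'pandas', because it gets rid of the for-loop and keeps the memory footprint low. It is also more comprehensive. Let me know if you want a pandas solution. – everestial007

Yes, I would like the pandas solution, please. – PAstudilloE

Check the answer. It works elegantly and is more comprehensive. Let me know whether it works. – everestial007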

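The order of your code seems to run like this:

Open file #1
Write the header if it has not been saved
Write the lines
Open file #2
Write the lines ... etc.

It sounds like you actually want to join on the column "country" rather than concatenate all the data into a single file.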

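import glob
import pandas as pd

# Collect every CSV file in the current directory
csvs = glob.glob("*.csv")
dfs = []

# Read each file into its own dataframe
for path in csvs:
    dfs.append(pd.read_csv(path))

# Start from the first dataframe (raises IndexError if no files were found)
merged_df = dfs[0]

# Join each remaining dataframe on the shared 'country' column
for df in dfs[1:]:
    merged_df = pd.merge(merged_df, df, on=['country'])

merged_df.to_csv('out.csv', index=False)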

Source

2017-03-15 00:42:48 ayplam

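I tried to run this code, but I get this error: ---> 13 merged_df = for df in dfs[1:]: IndexError: list index out of range – PAstudilloE

I edited my code to fix some formatting, but can you check that dfs actually contains the list of dataframes that were read in? The for loop over 'dfs[1:]' iterates over every dataframe except the first one, because the first one was already assigned when merged_df was declared. – ayplam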
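The check suggested in the comment above can be sketched as follows. This is only an illustration: the function name merge_csvs_on_country and its parameters are hypothetical, and it assumes every CSV in the folder is comma-separated and has a 'country' column.

```python
import glob
import os
from functools import reduce

import pandas as pd


def merge_csvs_on_country(folder, key="country"):
    """Merge every CSV in `folder` side by side on the `key` column."""
    # sorted() gives a stable order, so y_1980 comes before y_1981.
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        # An empty list is what makes dfs[0] raise IndexError.
        raise FileNotFoundError(f"no CSV files found in {folder!r}")
    frames = [pd.read_csv(path) for path in paths]
    # Fold the list into one frame, joining each file on the key column.
    return reduce(lambda left, right: pd.merge(left, right, on=key), frames)
```

If the merged result is written back to disk, it is probably safer to save it outside the input folder, since a later run would otherwise pick the output file up as one more input.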

It is much easier if you use pandas. The reason is that it gets rid of the for-loop problem and keeps the memory footprint low.

import pandas as pd 

# read the files first 

y_1980 = pd.read_csv('y_1980.csv', sep='\t') 
y_1981 = pd.read_csv('y_1981.csv', sep='\t')

If the values are separated by a space or a comma instead of a tab, change the sep option accordingly.
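As a small sketch of what changing sep looks like, the two strings below stand in for a comma-separated and a space-separated file. The in-memory text and the variable names are assumptions for illustration; with real files you would pass the file path instead of io.StringIO.

```python
import io

import pandas as pd

# Stand-ins for file contents, using values from the question.
comma_text = "country,y_1980\nafg,196\nago,125\n"
space_text = "country y_1980\nafg   196\nago   125\n"

# Comma-separated values: sep="," is also the read_csv default.
comma_df = pd.read_csv(io.StringIO(comma_text), sep=",")

# Values separated by one or more spaces: use a regex separator.
space_df = pd.read_csv(io.StringIO(space_text), sep=r"\s+")

# Both frames should hold the same two columns and the same rows.
print(comma_df)
print(space_df)
```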

# set 'country' as the index to use this value to merge. 
y_1980 = y_1980.set_index('country', append=True) 
y_1981 = y_1981.set_index('country', append=True) 

print(y_1980) 
print(y_1981) 

           y_1980
  country
0 afg         196
1 ago         125
2 alb          23


           y_1981
  country
0 afg         192
1 ago         120
2 alb           0

# set the frames to merge. You can add as many dataframes as you want.
frames = [y_1980, y_1981]

# now merge the dataframes
merged_df = pd.concat(frames, axis=1).reset_index(level=['country'])
print(merged_df)

  country  y_1980  y_1981
0     afg     196     192
1     ago     125     120
2     alb      23       0

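Additional note: if you want to keep only the keys that are present in all frames, use an inner join (join='inner' with pd.concat, how='inner' with pd.merge), optionally followed by dropna(). If you want to keep all possible data from all the frames, use an outer join (join='outer' or how='outer').

See this link for more details: http://pandas.pydata.org/pandas-docs/stable/merging.html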
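The difference between the two join types can be sketched like this. To make it visible, the example assumes that 'alb' is missing from the 1981 data, which is not the case in the question; the frames are built in memory rather than read from files.

```python
import pandas as pd

y_1980 = pd.DataFrame({"country": ["afg", "ago", "alb"],
                       "y_1980": [196, 125, 23]})
# Assumption for illustration only: 'alb' has no row for 1981.
y_1981 = pd.DataFrame({"country": ["afg", "ago"],
                       "y_1981": [192, 120]})

# Index both frames by country so concat aligns rows on it.
frames = [y_1980.set_index("country"), y_1981.set_index("country")]

# Inner join: keep only countries present in every frame.
inner = pd.concat(frames, axis=1, join="inner").reset_index()

# Outer join: keep every country, leaving NaN where a year is missing.
outer = pd.concat(frames, axis=1, join="outer").reset_index()

print(inner)
print(outer)
```

With the outer join, a column that contains missing values is converted to floating point, so whole numbers in it may print with a trailing ".0".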

Source

2017-03-15 01:03:53 everestial007

Pandas makes this easy. With a loop and merge, you can simply do:

Code:

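import pandas as pd

files = ['file1', 'file2']
dfs = None
for filename in files:
    # raw string so '\s+' is passed to pandas as a regex, not a str escape
    df = pd.read_csv(filename, sep=r'\s+')
    if dfs is None:
        dfs = df
    else:
        # with no 'on' given, merge joins on the shared column ('country')
        dfs = dfs.merge(df, how='outer')
    print(df)
print(dfs)
dfs.to_csv('file3', sep=' ')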

Results:

country y_1980 
0  afg  196 
1  ago  125 
2  alb  23 

    country y_1981 
0  afg  192 
1  ago  120 
2  alb  0 

    country y_1980 y_1981 
0  afg  196  192 
1  ago  125  120 
2  alb  23  0

Source

2017-03-15 01:05:59
