Feeding an empty pandas.DataFrame with several files

I want to feed an empty DataFrame by appending several files of the same type and structure. However, I can't see what is wrong here:

import glob
from pandas import DataFrame, read_csv

def files2df(colnames, ext):
    df = DataFrame(columns=colnames)
    for inf in sorted(glob.glob(ext)):
        dfin = read_csv(inf, sep='\t', skiprows=1)
        print(dfin.head(), '\n')
        df.append(dfin, ignore_index=True)
    return df

The resulting DataFrame is empty. Can someone give me a hand?

1.0 16.59 0.597 0.87 1.0.1 3282 100.08 
0 0.953 14.52 0.561 0.80 0.99 4355  - 
1 1.000 31.59 1.000 0.94 1.00 6322  - 
2 1.000 6.09 0.237 0.71 1.00 10568  - 
3 1.000 31.29 1.000 0.94 1.00 14363  - 
4 1.000 31.59 1.000 0.94 1.00 19797  - 

     1.0 6.69 0.199 0.74 1.0.1 186 13.16 
0  1 0.88 0.020 0.13 0.99 394  - 
1  1 0.75 0.017 0.11 0.99 1052  - 
2  1 3.34 0.097 0.57 1.00 1178  - 
3  1 1.50 0.035 0.26 1.00 1211  - 
4  1 20.59 0.940 0.88 1.00 1583  - 

     1.0 0.12 0.0030 0.04 0.97 2285 2.62 
0  1 1.25 0.135 0.18 0.99 2480 - 
1  1 0.03 0.001 0.04 0.97 7440 - 
2  1 0.12 0.003 0.04 0.97 8199 - 
3  1 1.10 0.092 0.16 0.99 11174 - 
4  1 0.27 0.007 0.06 0.98 11310 - 

    0.244 0.07 0.0030 0.02 0.76 41314 1.32 
0 0.181 0.64 0.028 0.03 0.36 41755 - 
1 0.161 0.18 0.008 0.01 0.45 42420 - 
2 0.161 0.18 0.008 0.01 0.45 42461 - 
3 0.237 0.25 0.011 0.02 0.56 43060 - 
4 0.267 1.03 0.047 0.07 0.46 43321 - 

0.163 0.12 0.0060 0.01 0.5 103384 1.27 
0 0.243 0.27 0.014 0.02 0.56 104693 - 
1 0.215 0.66 0.029 0.04 0.41 105192 - 
2 0.190 0.10 0.005 0.01 0.59 105758 - 
3 0.161 0.12 0.006 0.01 0.50 109783 - 
4 0.144 0.16 0.007 0.01 0.42 110067 - 

Empty DataFrame 
Columns: array([D, LOD, r2, CIlow, CIhi, Dist, T-int], dtype=object) 
Index: array([], dtype=object)

Source

2012-10-03 fred

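df.append(dfin, ignore_index=True) returns a new DataFrame; it does not change df in place. Use df = df.append(dfin, ignore_index=True). But even with this change, I don't think it will give you what you need. append does add rows (axis=0), but it aligns the frames on their column labels. Because each dfin is read without names, its labels come from its own first data line rather than from colnames, so the result gains extra columns padded with NaN instead of stacking all rows under colnames.

In this case (reading several files and creating a single DataFrame from all the data), I would use pandas.concat(). The code below gives you a frame whose columns are named by colnames and whose rows consist of the data in the csv files.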
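As a rough illustration of how stacking aligns on column labels, here is a minimal sketch. The two small frames and their values are made up for the example, and it uses pandas.concat because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical frame that already carries the desired column names.
named = pd.DataFrame({"D": [0.5], "LOD": [1.2]})

# Hypothetical frame whose labels came from a data line, as happens when
# read_csv is called without names= or header=None.
chunk = pd.DataFrame({"1.0": [0.953, 1.0], "16.59": [14.52, 31.59]})

# Stacking returns a NEW frame; neither input is modified in place.
stacked = pd.concat([named, chunk], ignore_index=True)

print(len(named))             # the original frame keeps its single row
print(list(stacked.columns))  # labels from both frames are present
print(stacked)                # non-matching cells are filled with NaN
```

Since the labels do not match, the rows from chunk do not land under D and LOD. That is why passing names=colnames when reading each file matters.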

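import glob
from pandas import concat, read_csv

def files2df(colnames, ext):
    files = sorted(glob.glob(ext))
    # names=colnames labels the columns, so no data line is used as a header
    frames = [read_csv(inf, sep='\t', skiprows=1, names=colnames) for inf in files]
    return concat(frames, ignore_index=True)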

I haven't tried this code, I just wrote it here, so you may need to tweak it to get it running, but the idea is clear (I hope).

Source
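For anyone who wants to try the concat approach end to end, here is a self-contained sketch. The temporary directory, the file names a.txt and b.txt, and the "line to skip" first row are assumptions made for the example. The column names and sample values are taken from the output shown in the question.

```python
import glob
import os
import tempfile

import pandas as pd


def files2df(colnames, pattern):
    """Read every file matching pattern and stack the rows into one frame."""
    files = sorted(glob.glob(pattern))
    frames = [pd.read_csv(f, sep="\t", skiprows=1, names=colnames) for f in files]
    return pd.concat(frames, ignore_index=True)


colnames = ["D", "LOD", "r2", "CIlow", "CIhi", "Dist", "T-int"]

# Two small tab-separated files; the first line of each is skipped on read.
samples = {
    "a.txt": ["line to skip",
              "0.953\t14.52\t0.561\t0.80\t0.99\t4355\t-",
              "1.000\t31.59\t1.000\t0.94\t1.00\t6322\t-"],
    "b.txt": ["line to skip",
              "1\t0.88\t0.020\t0.13\t0.99\t394\t-",
              "1\t0.75\t0.017\t0.11\t0.99\t1052\t-"],
}
tmpdir = tempfile.mkdtemp()
for name, lines in samples.items():
    with open(os.path.join(tmpdir, name), "w") as fh:
        fh.write("\n".join(lines) + "\n")

df = files2df(colnames, os.path.join(tmpdir, "*.txt"))
print(df.shape)
print(df)
```

Note that pandas.concat raises a ValueError when the pattern matches no files, so a real script may want to check the file list first.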

2012-10-03 18:52:50

Thanks a lot for your help. I did want to extend the frame along axis=0. – fred

Also, I found another solution, but I don't know which one is faster.

import glob
from pandas import concat, read_csv

def files2df(colnames, ext):
    dflist = []
    for inf in sorted(glob.glob(ext)):
        dflist.append(read_csv(inf, names=colnames, sep='\t', skiprows=1))
        #print(dflist)
    df = concat(dflist, axis=0, ignore_index=True)
    #print(df.to_string())
    return df

Source

2012-10-03 19:34:40 fred

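From pandas' point of view, this is the same as what I did. The difference is that I use a list comprehension (http://docs.python.org/tutorial/datastructures.html#list-comprehensions) to create the list of frames. –

@Wouter Overmeire: Thanks a lot for the link. One more question: suppose I have files without a header (like in my first post above). Is there any way to have pandas index the columns automatically? In other words, why do I always have to set the column names in read_csv? – fred

You can skip names=colnames and set header=None, which gives you the default column names: read_csv(inf, sep='\t', skiprows=1, header=None) (this is what I have on master) –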
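A minimal sketch of the header=None behaviour, using an in-memory string in place of a file. The sample text is made up for the example. With header=None and no names, pandas labels the columns with integers starting at 0.

```python
import io

import pandas as pd

# Made-up tab-separated content; the first line is skipped, as in the thread.
text = "line to skip\n0.953\t14.52\t4355\n1.000\t31.59\t6322\n"

df = pd.read_csv(io.StringIO(text), sep="\t", skiprows=1, header=None)

print(list(df.columns))  # [0, 1, 2]
print(df)
```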
