Concatenating multiple columns

I build a matrix with the code below and store some data in it:

df = []
n_rows = 5000
n_cols = 50
for i in range(n_rows):
    row = [''] * n_cols  # one row of empty string cells
    df.append(row)

The matrix then looks like this:

0  1   2     3  4 5  6 7 ... 
3 NaN Nestlé  Africa   Import 
4 NaN Nutella Europe   Report 2010 to 2011 
5 Shell   USA    Revenues  2017

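Because each row has an uneven number of columns, I am confused about how to join all the columns into one and then drop the unneeded empty columns, so that it looks like this: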

1 
3. Nestlé Africa Import 
4. Nutella Europe Report 2010 to 2011 
5. Shell USA Revenues 2017 
etc.

If this is easier to do in a pandas.DataFrame (e.g. df2 = pd.DataFrame(df)), that is fine with me too.

2017-04-11 Probs

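I'm not sure where the data comes from or why it is uneven. Joining is easy with the ''.join() method; just let me know where data such as Nestlé and Africa comes from and why it is uneven.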

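Hi Abid, the data comes from OCR'd PDF documents, and the tables that produce these results have uneven lengths. The results shown here are made up, though; they only illustrate my problem. – Probs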

Why can't you use the length of the array to determine where to drop the columns?
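The ''.join() suggestion from the comments can be sketched in plain Python, without pandas. This is only a minimal illustration: the helper name join_row and the sample rows are made up for the example, and it assumes that empty cells are either '' or None.

```python
def join_row(cells, sep=" "):
    """Join the non-empty cells of one row into a single string."""
    return sep.join(
        str(c).strip() for c in cells
        if c is not None and str(c).strip()  # skip None and blank cells
    )

# Hypothetical rows of uneven length, mimicking the OCR'd table
table = [
    ["", "Nestlé", "Africa", "Import", "", ""],
    ["", "Nutella", "Europe", "Report", "2010", "to", "2011"],
    ["Shell", "USA", "Revenues", "2017"],
]

joined = [join_row(row) for row in table]
for line in joined:
    print(line)
```

Because each row is handled on its own, rows of different lengths need no padding.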

With pandas, you can join the non-empty columns like this:

Code:

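# Join the non-null cells of each row into one space-separated string
# (Python 3; on Python 2 use unicode(y) instead of str(y)).
df['concat'] = df.apply(lambda x: ' '.join(
    [str(y) for y in x if not pd.isnull(y)]), axis=1)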

Test code:

import pandas as pd 
from io import StringIO 
df = pd.read_fwf(StringIO(u""" 
    0  1   2     3  4 5  6 
3 NaN Nestlé  Africa   Import 
4 NaN Nutella Europe   Report 2010 to 2011 
5 Shell   USA    Revenues  2017"""), 
    skiprows=0, header=1, index_col=0) 
print(df) 

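# Skip empty strings as well as NaN before joining
# (Python 3; on Python 2 use unicode(y) instead of str(y)).
df['concat'] = df.apply(lambda x: ' '.join(
    [str(y) for y in x if y and not pd.isnull(y)]), axis=1)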

print(df['concat'])

Result:

 0  1  2   3  4  5  6 
3   Nestlé Africa Import     
4   Nutella Europe Report 2010 to 2011 
5 Shell    USA Revenues  2017  

3      Nestlé Africa Import 
4 Nutella Europe Report 2010.0 to 2011.0 
5     Shell USA Revenues 2017
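The 2010.0 and 2011.0 in the result most likely appear because pandas stores a numeric column that also contains NaN as floats. A sketch of one way around this, assuming the data starts as a list of lists of strings as in the question, is to build the DataFrame directly from those strings so that nothing is parsed as a number. The sample rows and the helper name join_cells are made up for illustration.

```python
import pandas as pd

# Hypothetical ragged rows of strings; pandas pads the short rows with missing values
rows = [
    ["", "Nestlé", "Africa", "Import"],
    ["", "Nutella", "Europe", "Report", "2010", "to", "2011"],
    ["Shell", "USA", "Revenues", "2017"],
]
df = pd.DataFrame(rows, index=[3, 4, 5])

def join_cells(row):
    """Join the cells of one row, skipping missing values and blank strings."""
    return " ".join(str(v) for v in row if pd.notna(v) and str(v).strip())

# Keep only the joined column, which drops the now-unneeded original columns
result = df.apply(join_cells, axis=1).to_frame("concat")
print(result)
```

Since every cell stays a string, the years are joined as they were entered.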

2017-04-12 01:58:32
