2017-03-28

Dask: creating a dataframe from several Dask arrays

Suppose I have a set of Dask arrays such as:

c1 = da.from_array(np.arange(100000, 190000), chunks=1000) 
c2 = da.from_array(np.arange(200000, 290000), chunks=1000) 
c3 = da.from_array(np.arange(300000, 390000), chunks=1000) 

Is it possible to create a Dask dataframe from them? In pandas I could write:

data = {} 
data['c1'] = c1 
data['c2'] = c2 
data['c3'] = c3 

df = pd.DataFrame(data) 

Is there a similar way to do this with Dask?


I suspect you can do this with some combination of `dd.from_dask_array` and `dd.concat(..., axis=1)`. – MRocklin

Answer


The following should work:

import pandas as pd, numpy as np 
import dask.array as da, dask.dataframe as dd 

c1 = da.from_array(np.arange(100000, 190000), chunks=1000) 
c2 = da.from_array(np.arange(200000, 290000), chunks=1000) 
c3 = da.from_array(np.arange(300000, 390000), chunks=1000) 

# generate dask dataframe 
ddf = dd.concat([dd.from_dask_array(c) for c in [c1,c2,c3]], axis = 1) 
# name columns 
ddf.columns = ['c1', 'c2', 'c3']