Subsetting a NetCDF file and returning tuples

I have a large (5 GB) netCDF file of temperature data. The file has 4 dimensions: time, pressure level, latitude, longitude.

The dataset has 31 time points, and I am only interested in 5 pressure levels.

My parameter is temperature, t:

from netCDF4 import Dataset
# Load the dataset 
dataset = Dataset(path) 
factor = dataset.variables['t']

To extract a "cube" of temperature data from the factor variable around a centre cell, I simply subset it, like this:

radius = 5 
# +1 because the subsetting does not include last index 
lats_bounds = [nearest_latitude_index-radius,nearest_latitude_index+radius + 1] 
lons_bounds = [nearest_longitude_index-radius,nearest_longitude_index+radius +1] 

#all timepoints 
times_bounds = [0, len(times)] 

#just the last 5 pressure levels 
pressure_level_bounds = [len(levels)-5, len(levels)] 

results = factor[times_bounds[0]:times_bounds[1],pressure_level_bounds[0]:pressure_level_bounds[1], lats_bounds[0]:lats_bounds[1],lons_bounds[0]:lons_bounds[1]]

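The problem is that results is now an ndarray with shape (31,5,11,11) and size 18755 (31 * 5 * 11 * 11), where each index holds only a single value.

I need the values from results, but for each value I also need its corresponding time point, pressure level, latitude and longitude.

Ideally, I would subset as I have already done, but my final result would be an array of tuples, something like this: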

# corresponding timestamp, pressure level, latitude, longitude 
# and the temperature value extracted. 
final = [ 
(2342342, 1000, 24.532, 53.531, 277), 
(2342342, 1000, 74.453, 26.123, 351), 
(2342342, 1000, 80.311, 56.345, 131), 
... 
]

How can I do this?


2017-09-01 pookie


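I would use Pandas for this task. But since you only have 31 time points and 5 pressure levels, I would first simplify your approach and work out how to do it for a single time, a single pressure level and a single lat/lon. Then work out how to loop over those indices to get your tuples. For example: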

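# results has shape (time, level, lat, lon); the lat/lon indices below
# must be positions within that subset, not within the full grid
for i in range(results.shape[0]):
    for j in range(results.shape[1]):
        print(results[i, j, nearest_lat_idx, nearest_lon_idx])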

Of course, you could also add loops for lat and lon, but that gets a bit ugly.
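
The nested loops can be avoided by pairing the subset with its coordinate arrays. This is only a minimal sketch, assuming the coordinate values have been sliced with the same bounds as results. The small arrays and the helper name to_tuples are made up for illustration and do not come from the question's file:

```python
import numpy as np

def to_tuples(values, times, levels, lats, lons):
    """Flatten a 4-D (time, level, lat, lon) array into a list of
    (time, level, lat, lon, value) tuples."""
    # indexing="ij" keeps the coordinate grids in the same axis order as `values`
    t, p, la, lo = np.meshgrid(times, levels, lats, lons, indexing="ij")
    return list(zip(t.ravel().tolist(), p.ravel().tolist(),
                    la.ravel().tolist(), lo.ravel().tolist(),
                    values.ravel().tolist()))

# Hypothetical stand-in data: 2 times, 2 levels, 2 lats, 3 lons
times = np.array([2342342, 2345942])
levels = np.array([925, 1000])
lats = np.array([24.5, 24.75])
lons = np.array([53.0, 53.25, 53.5])
values = np.arange(24, dtype=float).reshape(2, 2, 2, 3)

final = to_tuples(values, times, levels, lats, lons)
print(final[0])    # first time, level, lat, lon and the value at that cell
print(len(final))  # one tuple per cell
```

With the real data, values would be results and the four coordinate arrays would be the matching slices of the file's time, level, latitude and longitude variables.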


2017-09-03 00:06:01

Take a look at xarray's isel. Translating your netCDF4 syntax would look something like this:

import xarray as xr

ds = xr.open_dataset(path) 
factor = ds['t'] 

# note that levels/lon/lat are the names of dimensions in your Dataset 
subset = factor.isel(levels=slice(-5, None), 
        lon=[1, 18, 48, 99], lat=[16, 28, 33, 35]) 
stacked = subset.stack(points=('time', 'levels', 'lon', 'lat')) 

# This subset can be converted to a `pandas.Series`: 
data = stacked.to_pandas() 

# or it can be converted to a list of tuples 
df = data.reset_index() 
final = list(df.itertuples(index=False, name=None))

Xarray also supports label-based indexing (i.e. lat=[29.3, 42.3]), but for that you should use the sel method instead of isel.
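
As a rough illustration of the difference, here is a sketch of positional versus label-based selection on a small synthetic DataArray. The dimension names, coordinate values and data are invented for the example; a real file would use whatever names it declares:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the temperature variable (names are assumptions)
factor = xr.DataArray(
    np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5),
    dims=("time", "levels", "lat", "lon"),
    coords={
        "time": [2342342, 2345942],
        "levels": [850, 925, 1000],
        "lat": [10.0, 20.0, 30.0, 40.0],
        "lon": [0.0, 15.0, 30.0, 45.0, 60.0],
    },
)

# isel: positional, keeps the last two levels by index
by_position = factor.isel(levels=slice(-2, None))

# sel: label based; method="nearest" snaps each request to the closest grid value
by_label = factor.sel(lat=[19.2, 41.0], lon=[14.0], method="nearest")

print(by_position["levels"].values)                     # level labels kept by isel
print(by_label["lat"].values, by_label["lon"].values)   # grid values chosen by sel
```

Without method="nearest", sel only matches coordinate values that are present in the index exactly.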


2017-09-05 03:54:36 jhamman

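Thanks! I noticed that you did not include 'time' in 'factor.isel()'. Was that intentional? I need to use all time points, but in future I may only need a subset. The time points are in unix timestamp format. – pookie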

If you want to slice along the time axis you can add 'time=slice(start, end)', but for your current use case, leaving it out gives you all time steps. – jhamman
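
Note that isel slices by integer position, so Unix timestamps have to be converted to indices first (alternatively, sel accepts the timestamps themselves as labels). A minimal sketch with NumPy, assuming the time coordinate is sorted in ascending order; the timestamps and the helper name time_index_bounds are hypothetical:

```python
import numpy as np

def time_index_bounds(times, start_ts, end_ts):
    """Return (lo, hi) such that times[lo:hi] covers start_ts <= t <= end_ts.
    Assumes `times` is sorted in ascending order."""
    lo = int(np.searchsorted(times, start_ts, side="left"))
    hi = int(np.searchsorted(times, end_ts, side="right"))
    return lo, hi

# Hypothetical hourly Unix timestamps
times = np.array([1504224000 + 3600 * k for k in range(6)])

lo, hi = time_index_bounds(times, 1504227600, 1504234800)
print(lo, hi)        # positional bounds, usable as slice(lo, hi)
print(times[lo:hi])  # the timestamps that fall inside the window
```

The resulting pair can be passed as time=slice(lo, hi) to isel, or used directly as the first index range in the netCDF4 subsetting from the question.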
