A significant fraction of the rechunked_data/ Zarr stores on OSN contain correct metadata
(coordinates, dimensions, attributes) but NaN-filled data arrays. This primarily affects
temperature variables (tasmax, tasmin) across multiple methods.
import numpy as np
import xarray as xr

DATASETS = [
    # Temperature -- return NaN
    "https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/rechunked_data/GARD-SV/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.GARD-SV.tasmax.zarr",
    "https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/rechunked_data/GARD-SV/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.GARD-SV.tasmin.zarr",
    "https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/rechunked_data/DeepSD/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.DeepSD.tasmax.zarr",
    # Precipitation -- return valid data
    "https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/rechunked_data/GARD-SV/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.GARD-SV.pr.zarr",
]

for url in DATASETS:
    ds = xr.open_zarr(url, chunks={})
    var = list(ds.data_vars)[0]
    # Sample one value: first time step, centre of the lat/lon grid
    val = float(ds[var].isel(time=0, lat=len(ds.lat) // 2, lon=len(ds.lon) // 2).values)
    status = "NaN" if np.isnan(val) else f"{val:.2f}"
    print(f"{var:>6} {status:>8} {dict(ds.sizes)} {url.split('rechunked_data/')[1]}")
    ds.close()
Output
tasmax      NaN {'time': 23741, 'lat': 721, 'lon': 1440} GARD-SV/CMIP.CCCma.CanESM5.historical...tasmax.zarr
tasmin      NaN {'time': 23741, 'lat': 721, 'lon': 1440} GARD-SV/CMIP.CCCma.CanESM5.historical...tasmin.zarr
tasmax      NaN {'time': 23741, 'lat': 720, 'lon': 1440} DeepSD/CMIP.CCCma.CanESM5.historical...tasmax.zarr
    pr    55.12 {'time': 23741, 'lat': 721, 'lon': 1440} GARD-SV/CMIP.CCCma.CanESM5.historical...pr.zarr
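The script above samples a single grid cell per store, and one cell can be NaN for reasons unrelated to a missing array (a masked cell, for instance). A somewhat stronger check is to look at the share of finite values across a whole time slice. The following is a minimal sketch of that idea on a synthetic array; the helper names `finite_fraction` and `classify_slice` and the `min_fraction` threshold are assumptions made for illustration, not part of the dataset tooling.

```python
import numpy as np


def finite_fraction(values):
    """Return the fraction of finite (non-NaN, non-inf) entries in an array."""
    arr = np.asarray(values, dtype=float)
    if arr.size == 0:
        return 0.0
    return float(np.isfinite(arr).mean())


def classify_slice(values, min_fraction=0.01):
    """Label a slice 'valid' if at least min_fraction of its cells are finite.

    min_fraction is an arbitrary illustrative threshold (assumption).
    """
    frac = finite_fraction(values)
    label = "valid" if frac >= min_fraction else "all-NaN"
    return label, frac


# Synthetic 4x8 grid: 8 of 32 cells hold data, the rest are NaN.
grid = np.full((4, 8), np.nan)
grid[1:3, 2:6] = 280.0
label, frac = classify_slice(grid)

# A grid with no data at all, mimicking a NaN-filled array.
empty_label, empty_frac = classify_slice(np.full((4, 8), np.nan))

print(label, frac)
print(empty_label, empty_frac)
```

With one of the stores above opened as `ds`, `classify_slice(ds[var].isel(time=0).values)` would apply the same check to the first time step. Note that this reads a full lat/lon slice rather than a single cell.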
Scope
We audited all 7286 Zarr stores across the 5 daily methods (GARD-SV, GARD-MV, DeepSD, DeepSD-BC,
MACA). Approximate share of stores returning valid (non-NaN) data, by variable:
┌──────────┬───────┐
│ Variable │ Valid │
├──────────┼───────┤
│ pr │ ~82% │
├──────────┼───────┤
│ tasmax │ ~25% │
├──────────┼───────┤
│ tasmin │ ~22% │
└──────────┴───────┘
The pattern does not split cleanly by variable: for example, DeepSD/CanESM5/ssp245/tasmin has
valid data while DeepSD/CanESM5/historical/tasmax does not.
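Because validity varies within a variable, it can help to tally results per (method, variable) pair rather than per variable alone. The sketch below shows one way to do that tally; the function name `valid_rates` is hypothetical and the records are made-up placeholders, not results from the audit.

```python
from collections import defaultdict


def valid_rates(records):
    """Map each (method, variable) pair to the fraction of stores flagged valid.

    records: iterable of (method, variable, is_valid) tuples.
    """
    totals = defaultdict(int)
    valid = defaultdict(int)
    for method, variable, is_valid in records:
        key = (method, variable)
        totals[key] += 1
        valid[key] += int(bool(is_valid))
    return {key: valid[key] / totals[key] for key in totals}


# Placeholder records for illustration only (NOT real audit results).
sample = [
    ("GARD-SV", "pr", True),
    ("GARD-SV", "pr", True),
    ("GARD-SV", "tasmax", False),
    ("DeepSD", "tasmax", False),
    ("DeepSD", "tasmax", True),
]

rates = valid_rates(sample)
for (method, variable), rate in sorted(rates.items()):
    print(f"{method:>8} {variable:>6} {rate:.0%}")
```

Grouping by a tuple key keeps the tally easy to extend, for example by adding the model or scenario to the key.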
Is there another way to download the datasets, or are they stored elsewhere?