Many rechunked Zarr stores contain metadata but no data (NaN-filled arrays) #339

A significant fraction of the `rechunked_data/` Zarr stores on OSN contain correct metadata (coordinates, dimensions, attributes) but NaN-filled data arrays. This primarily affects the temperature variables (`tasmax`, `tasmin`) across multiple methods.

```python
import numpy as np
import xarray as xr

DATASETS = [
    # Temperature -- return NaN
    "https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/rechunked_data/GARD-SV/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.GARD-SV.tasmax.zarr",
    "https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/rechunked_data/GARD-SV/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.GARD-SV.tasmin.zarr",
    "https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/rechunked_data/DeepSD/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.DeepSD.tasmax.zarr",
    # Precipitation -- return valid data
    "https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/rechunked_data/GARD-SV/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.GARD-SV.pr.zarr",
]

for url in DATASETS:
    # Open lazily; data is only fetched when a value is requested
    ds = xr.open_zarr(url, chunks={})
    var = list(ds.data_vars)[0]
    # Read one value: first time step, centre of the lat/lon grid
    val = float(ds[var].isel(time=0, lat=len(ds.lat) // 2, lon=len(ds.lon) // 2).values)
    status = "NaN" if np.isnan(val) else f"{val:.2f}"
    print(f"{var:>6}  {status:>8}  {dict(ds.sizes)}  {url.split('rechunked_data/')[1]}")
    ds.close()
```
Output

    tasmax       NaN  {'time': 23741, 'lat': 721, 'lon': 1440}  GARD-SV/CMIP.CCCma.CanESM5.historical...tasmax.zarr
    tasmin       NaN  {'time': 23741, 'lat': 721, 'lon': 1440}  GARD-SV/CMIP.CCCma.CanESM5.historical...tasmin.zarr
    tasmax       NaN  {'time': 23741, 'lat': 720, 'lon': 1440}  DeepSD/CMIP.CCCma.CanESM5.historical...tasmax.zarr
        pr     55.12  {'time': 23741, 'lat': 721, 'lon': 1440}  GARD-SV/CMIP.CCCma.CanESM5.historical...pr.zarr
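
The check above reads a single grid cell, which could in principle be NaN for reasons unrelated to a missing array (a masked cell, for instance). A minimal sketch of a slightly more robust check, sampling a few evenly spread positions along each axis, is below. The helper names `spread_indices` and `all_samples_nan` are hypothetical, and `read_point` stands in for whatever function reads one value from the store; the commented xarray adapter assumes the dimension order is `(time, lat, lon)`.

```python
import itertools
import math
from typing import Callable, Sequence


def spread_indices(size: int, count: int) -> list[int]:
    """Return up to `count` distinct indices spread evenly over range(size)."""
    if size <= 0 or count <= 0:
        return []
    count = min(size, count)
    if count == 1:
        return [size // 2]
    return sorted({round(k * (size - 1) / (count - 1)) for k in range(count)})


def all_samples_nan(
    read_point: Callable[..., float],
    shape: Sequence[int],
    per_axis: int = 3,
) -> bool:
    """True if every sampled position is NaN; stops at the first finite value."""
    axes = [spread_indices(n, per_axis) for n in shape]
    for idx in itertools.product(*axes):
        if not math.isnan(read_point(*idx)):
            return False
    return True


# Hypothetical xarray adapter (not executed here; assumes dims are time, lat, lon):
#   da = ds[var]
#   read = lambda t, i, j: float(da.isel(time=t, lat=i, lon=j).values)
#   all_samples_nan(read, da.shape)

if __name__ == "__main__":
    # Stand-in reader that always returns NaN, to exercise the helper offline
    print(all_samples_nan(lambda t, i, j: float("nan"), (4, 5, 6)))
```

With `per_axis=3` this reads at most 27 values per store, so it stays cheap over HTTP while being less sensitive to one unlucky cell.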

Scope

We audited all 7286 Zarr stores across the five daily methods (GARD-SV, GARD-MV, DeepSD, DeepSD-BC, MACA). Approximate share of stores returning valid (non-NaN) data, by variable:

| Variable | Valid |
|----------|-------|
| pr       | ~82%  |
| tasmax   | ~25%  |
| tasmin   | ~22%  |
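
For anyone wanting to reproduce a tally like the one in the table above, here is a rough sketch. It assumes every store name follows the nine-field pattern seen in the URLs in this issue (`activity.institution.model.scenario.member.frequency.method.variable.zarr`); `StoreId`, `parse_store_path`, and `valid_rates` are hypothetical names. The sample input is just the four stores from the reproduction, not the full audit.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, NamedTuple


class StoreId(NamedTuple):
    method: str
    model: str
    scenario: str
    member: str
    variable: str


def parse_store_path(path: str) -> StoreId:
    """Parse '<...>/<activity>.<inst>.<model>.<scenario>.<member>.<freq>.<method>.<var>.zarr'."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    parts = name.split(".")
    if len(parts) != 9 or parts[-1] != "zarr":
        raise ValueError(f"unexpected store name: {name!r}")
    _activity, _inst, model, scenario, member, _freq, method, variable, _ext = parts
    return StoreId(method, model, scenario, member, variable)


def valid_rates(
    results: Iterable[tuple[str, bool]],
    key: Callable[[StoreId], Hashable] = lambda sid: sid.variable,
) -> dict:
    """Fraction of valid stores per group; `results` holds (path, is_valid) pairs."""
    counts = defaultdict(lambda: [0, 0])  # group -> [valid, total]
    for path, is_valid in results:
        group = key(parse_store_path(path))
        counts[group][0] += bool(is_valid)
        counts[group][1] += 1
    return {group: valid / total for group, (valid, total) in counts.items()}


# The four stores from the reproduction (three returned NaN, one returned data)
OBSERVED = [
    ("GARD-SV/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.GARD-SV.tasmax.zarr", False),
    ("GARD-SV/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.GARD-SV.tasmin.zarr", False),
    ("DeepSD/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.DeepSD.tasmax.zarr", False),
    ("GARD-SV/CMIP.CCCma.CanESM5.historical.r1i1p1f1.day.GARD-SV.pr.zarr", True),
]

if __name__ == "__main__":
    print(valid_rates(OBSERVED))
    # Group by (method, scenario, variable) to look for finer-grained patterns
    print(valid_rates(OBSERVED, key=lambda s: (s.method, s.scenario, s.variable)))
```

Passing a different `key` makes it easy to slice the same results by method, scenario, or model.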
                                                            
The pattern does not split cleanly by variable: for example, DeepSD/CanESM5/ssp245/tasmin has valid data, while DeepSD/CanESM5/historical/tasmax does not.
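
One possible explanation, which is a hypothesis and not something confirmed here, is that the affected stores have their metadata objects but are missing chunk objects; Zarr returns the array's fill value for any chunk that is absent. The sketch below assumes a Zarr v2 layout with the default `.` chunk-key separator (Zarr v3 names chunk keys differently) and compares the chunk keys implied by an array's `shape` and `chunks` (both recorded in its `.zarray` file) against a set of keys actually present. The function names are hypothetical, and obtaining `present` from a listing of the bucket is left out.

```python
import itertools
import math
from typing import Iterable, Sequence


def expected_chunk_keys(
    shape: Sequence[int], chunks: Sequence[int], separator: str = "."
) -> set[str]:
    """All chunk keys a fully written Zarr v2 array of this shape would have."""
    grid = [math.ceil(n / c) for n, c in zip(shape, chunks)]
    return {
        separator.join(str(i) for i in idx)
        for idx in itertools.product(*(range(g) for g in grid))
    }


def missing_chunk_keys(
    shape: Sequence[int],
    chunks: Sequence[int],
    present: Iterable[str],
    separator: str = ".",
) -> set[str]:
    """Expected chunk keys that do not appear in `present`."""
    return expected_chunk_keys(shape, chunks, separator) - set(present)


if __name__ == "__main__":
    # Toy array: 4 x 3 values in 2 x 2 chunks gives a 2 x 2 chunk grid
    print(sorted(missing_chunk_keys((4, 3), (2, 2), present={"0.0", "1.1"})))
```

Note that a missing chunk is not proof of data loss by itself, since writers can be configured to skip chunks that contain only the fill value; a store with no chunk objects at all for a temperature variable would be a much stronger signal. For arrays with many small chunks the expected-key set can get large, so this is best run on one array at a time.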

Is there another way to download the datasets, or are they stored elsewhere?
