Conversation
There was a problem hiding this comment.
Pull request overview
Adds support for opening combined grid-and-data files via a single-argument ux.open_dataset(file) call, aligning with Issue #345’s request to handle datasets stored in one file.
Changes:
- Made
filename_or_objoptional inuxarray.core.api.open_dataset, defaulting to the grid file path when omitted. - Updated
open_datasetdocstring to document and exemplify the one-file usage. - Added a unit test covering the new one-argument
open_datasetbehavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
uxarray/core/api.py |
Makes filename_or_obj optional and adds logic/docs for single-file grid+data opening. |
test/core/test_api.py |
Adds a regression test for calling ux.open_dataset() with a single argument. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
erogluorhan
left a comment
There was a problem hiding this comment.
Thanks a lot for taking this one. Please find my comments below
uxarray/core/api.py
Outdated
| def open_dataset( | ||
| grid_filename_or_obj: str | os.PathLike[Any] | dict | Dataset, | ||
| filename_or_obj: str | os.PathLike[Any], | ||
| filename_or_obj: str | os.PathLike[Any] | None = None, |
There was a problem hiding this comment.
While technically there is nothing wrong with this, I'd like to share a few thoughts on use-case logic:
- Defaulting
filename_or_objtoNonemight not be the best idea since, in most cases, it is needed, i.e. single file dataset opening is anticipated to occur much less. - Also, more of a terminology thing, but my observation is that when there is only file that has the grid information in it as well, it is referred to as the data or diagnostics file rather than the grid file. That said, I'd recommend to handle the single file case when
grid_filename_or_objis givenNone, andfilename_or_objhas both grid definition and data in it.
There was a problem hiding this comment.
I went with ux.open_dataset(file) as the smallest backward-compatible public API change and avoids open_dataset(None, file) calling pattern. I agree the naming is a little awkward, but my intent was to add the one-file shorthand without changing the existing two-file semantics.
| if isinstance(grid_filename_or_obj, (str, os.PathLike)): | ||
| if os.path.isdir(grid_filename_or_obj): | ||
| raise ValueError( | ||
| "Directory-based grids require a separate data file when calling ux.open_dataset()." |
There was a problem hiding this comment.
Is this to cover the multiple grid files case?
There was a problem hiding this comment.
Yes, I added this to explicit error to avoid a confusing downstream xarray failure
uxarray/core/api.py
Outdated
| ds = _open_dataset_with_fallback(filename_or_obj, chunks=corrected_chunks, **kwargs) | ||
| filename_or_obj = grid_filename_or_obj | ||
|
|
||
| if "latlon" in kwargs: |
There was a problem hiding this comment.
Good catch, I can fold that back into a cleaner shared path.
| grid_kwargs["latlon"] = kwargs["latlon"] | ||
|
|
||
| corrected_chunks = match_chunks_to_ugrid(grid_filename_or_obj, chunks) | ||
| ds = _open_dataset_with_fallback( |
There was a problem hiding this comment.
I am a bit confused about _get_grid vs. match_chunks_to_ugrid plus _open_dataset_with_fallback. Please clarify why this switch.
There was a problem hiding this comment.
this was to avoid opening the same file twice, I will refactor this part of the code to be cleaner.
| grid_filename_or_obj, chunks, chunk_grid, use_dual, grid_kwargs, **kwargs | ||
| ) | ||
| if filename_or_obj is None: | ||
| if isinstance(grid_filename_or_obj, (str, os.PathLike)): |
There was a problem hiding this comment.
Limiting what was originally possible (dict, xr.dataset) doesn't look like a great idea. I am not sure if we could support dict with this, but I am pretty sure we should for xr.dataset. I think the main check should be on whether a good Grid object could be opened out of the file object here or not.
There was a problem hiding this comment.
xr.Dataset should also work in the one-input case. I’ll update the implementation
| raise ValueError( | ||
| "If filename_or_obj is omitted, grid_filename_or_obj must be a file path." | ||
| ) | ||
| else: |
There was a problem hiding this comment.
Maybe rather than splitting the whole logic into two parts as it is in this if-else block now, we'd only need to split the grid opening task as follows:
- Check if I have only one file path object given
- If so, try & catch opening a grid object from that one
- Else, try opening the grid from the separate grid file
- In both cases, open dataset from the same file
| nt.assert_equal(len(uxds_var2_ne30.uxgrid._ds.data_vars), mesh_constants['DATAVARS_outCSne30']) | ||
| nt.assert_equal(uxds_var2_ne30.source_datasets, str(data_path)) | ||
|
|
||
|
|
There was a problem hiding this comment.
We need another test case to make sure the single file has proper grid definitions in it (i.e. If not throw exception)
Closes #345 by allowing ux.open_dataset(file) for combined grid-and-data files.