Skip to content

style: make_units, launch_runs#29

Open
7PintsOfCherryGarcia wants to merge 6 commits into
GeoGenetics:mainfrom
7PintsOfCherryGarcia:julian_dev
Open

style: make_units, launch_runs#29
7PintsOfCherryGarcia wants to merge 6 commits into
GeoGenetics:mainfrom
7PintsOfCherryGarcia:julian_dev

Conversation

@7PintsOfCherryGarcia

Copy link
Copy Markdown

This pull request makes several improvements and refactors to the make_units.py and launch_runs.py scripts, focusing on workflow configuration flexibility, error handling, and robustness in metadata processing. The changes enhance user experience, make the scripts more adaptable to different environments, and improve maintainability.

Major improvements and new features:

Workflow configuration and flexibility:

  • Refactored launch_runs.py to add new arguments for workflow and environment selection (--pixi-path, --workflow-path, --scheduler, --partition, --account, --jobs). The script now builds commands more flexibly based on these options, supporting both local and Slurm-based execution and allowing users to override paths and scheduler settings. [1] [2] [3]
  • Reworked how workflow and environment paths are determined, removing hardcoded host detection and supporting user overrides via command-line arguments.

Metadata and output handling:

  • In make_units.py, added logic to inject default values for missing metadata fields required by output path formatting, ensuring that all required fields are present and logging warnings when defaults are used. [1] [2]
  • Improved the extraction of workflow_ver (workflow version): now prefers GitPython but gracefully falls back to using git describe via subprocess if unavailable, making the script more robust and portable.

Error handling and user feedback:

  • Enhanced error reporting for missing extra files in make_units.py by using parser.error with a clear message instead of an assertion.
  • Improved logging output for DataFrames and debug information, providing more readable and informative logs. [1] [2] [3]

Other improvements:

  • Updated the default input and metadata regex patterns to better match expected file naming conventions.
  • Changed NumPy print options to use a more recent legacy version for compatibility.
  • Ensured that output file writing is skipped if --dryrun is set, making dry runs safer and clearer.

These changes collectively make the scripts more user-friendly, configurable, and robust for diverse workflow execution environments.

…endencies

add fallback defaults for missing out-path wildcards (project, library, flowcell_pos, flowcell, lane, date, workflow_ver, extra_file_md5, sample)
warn when out-path fields are missing from parsed metadata and defaults are applied
normalize date after metadata injection and replace invalid dates with today to keep formatting stable
make workflow version detection resilient when GitPython is unavailable by falling back to git describe (and finally unknown)
replace assert on missing --extra-file with argparse parser error for a clean, user-friendly failure message
@7PintsOfCherryGarcia 7PintsOfCherryGarcia changed the title Julian dev style - make_units, launch_runs Jun 1, 2026
@7PintsOfCherryGarcia 7PintsOfCherryGarcia changed the title style - make_units, launch_runs style: make_units, launch_runs Jun 1, 2026
Comment thread scripts/make_units.py Outdated
Comment thread scripts/make_units.py Outdated
Comment thread scripts/make_units.py Outdated
Comment thread scripts/launch_runs.py Outdated
@fgvieira

fgvieira commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

It might also be nice to support running snakemake locally, but submit jobs to slurm.

@7PintsOfCherryGarcia

Copy link
Copy Markdown
Author

Good point. I agree that running Snakemake locally while submitting jobs to Slurm would be useful. I’d prefer to keep this PR focused, so I can follow up with a separate change for that if that sounds good

Comment thread scripts/make_units.py Outdated
Comment thread scripts/make_units.py Outdated
Comment thread scripts/launch_runs.py Outdated
Comment thread scripts/make_units.py Outdated
Comment thread scripts/make_units.py Outdated
Comment thread scripts/launch_runs.py Outdated
Comment thread scripts/launch_runs.py
f"Jobs will be submitted to the {args.run} HPC, on account '{hpc_job_account}' and partition '{hpc_job_partition}'."
)
extra_args += (
f" --executor {args.run} --default-resources slurm_account={hpc_job_account} slurm_partition={hpc_job_partition}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is not kept.

Comment thread scripts/make_units.py Outdated
Comment thread scripts/launch_runs.py
logging.info(f"Running jobs locally")
elif args.run == "slurm":
logging.info(
f"Jobs will be submitted to the {args.run} HPC, on account '{hpc_job_account}' and partition '{hpc_job_partition}'."

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hpc_job_account and hpc_job_partition are not defined.

Comment thread scripts/make_units.py
out_path.mkdir(parents=True, exist_ok=args.force)
# Save units.tsv file
units.drop(["extra_file_md5"], axis=1).dropna(axis=1, how="all").to_csv(
units.dropna(axis=1, how="all").to_csv(

@fgvieira fgvieira Jun 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why include the config md5 in units.tsv?

Comment thread scripts/make_units.py
logging.warning(
"Some date values could not be parsed. Using today's date for invalid rows."
)
units.loc[units["date"].isna(), "date"] = out_path_defaults["date"]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit dangerous.

Suggested change
units.loc[units["date"].isna(), "date"] = out_path_defaults["date"]

Comment thread scripts/make_units.py
Comment on lines +271 to +277
missing_out_path_fields = [
key for key in out_path_wildcards if key not in units.columns.values
]
if missing_out_path_fields:
logging.warning(
"Missing out-path fields in parsed metadata: %s. Applying defaults.",
", ".join(sorted(missing_out_path_fields)),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won;t this clash with lines 263-268?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants