feat: Add dagrun status progress to backfills#64089
feat: Add dagrun status progress to backfills#64089shivaam wants to merge 2 commits intoapache:mainfrom
Conversation
| updated_at: datetime | ||
| dag_display_name: str = Field(validation_alias=AliasPath("dag_model", "dag_display_name")) | ||
| num_runs: int = 0 | ||
| dag_run_state_counts: dict[str, int] = {} |
There was a problem hiding this comment.
This probably should not be = {}
There was a problem hiding this comment.
Thanks for the review! I used = {} to match the existing pattern in the same file Happy to change if you'd prefer a different approach, did you have something specific in mind?
Also curious about your thoughts on the broader design question: would you prefer these fields on a separate endpoint (similar to dagStats) rather than enriching BackfillResponse? I am thinking I should create a new API that lists all the dagRuns associated with a backfill instead
There was a problem hiding this comment.
I guess it depends on whether other information (on dag runs) would be potentially useful. Attaching them on the same model is better if it’s just the count and state. (Not necessarily enriching BackfillResponse directly though; arguably this should be a different Pydantic model.)
There was a problem hiding this comment.
Pull request overview
Adds aggregated DagRun-state progress data to backfills and displays it in the UI (banner and backfills list) as a segmented progress bar / completion indicator.
Changes:
- Extend
BackfillResponsewithnum_runsanddag_run_state_countsand enrich multiple backfill endpoints via a grouped aggregate query. - Add a reusable
BackfillProgressBarcomponent and wire it into the Dag Backfills page and the backfill banner. - Update OpenAPI-generated artifacts and adjust/add unit tests (including query-count assertions).
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| airflow-ctl/src/airflowctl/api/datamodels/generated.py | Updates generated client-side Pydantic models to include new backfill fields. |
| airflow-core/src/airflow/api_fastapi/core_api/datamodels/backfills.py | Adds new response fields to the Backfill Pydantic model. |
| airflow-core/src/airflow/api_fastapi/core_api/routes/public/backfills.py | Adds _enrich_backfill_responses() and applies it to public backfill endpoints. |
| airflow-core/src/airflow/api_fastapi/core_api/routes/ui/backfills.py | Applies enrichment to UI backfill listing endpoint. |
| airflow-core/src/airflow/api_fastapi/core_api/openapi/v2-rest-api-generated.yaml | Updates generated OpenAPI schema for BackfillResponse. |
| airflow-core/src/airflow/api_fastapi/core_api/openapi/_private_ui.yaml | Updates generated UI OpenAPI schema for BackfillResponse. |
| airflow-core/src/airflow/ui/openapi-gen/requests/types.gen.ts | Updates generated TS types for the new backfill response fields. |
| airflow-core/src/airflow/ui/openapi-gen/requests/schemas.gen.ts | Updates generated TS schema metadata for the new fields. |
| airflow-core/src/airflow/ui/src/components/BackfillProgressBar.tsx | Introduces the segmented progress bar UI component. |
| airflow-core/src/airflow/ui/src/components/Banner/BackfillBanner.tsx | Replaces indeterminate progress with backfill progress segments/count when available. |
| airflow-core/src/airflow/ui/src/pages/Dag/Backfills/Backfills.tsx | Adds a “Progress” column rendering progress or “Completed”. |
| airflow-core/src/airflow/ui/public/i18n/locales/en/common.json | Adds completed and table.progress i18n strings. |
| airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_backfills.py | Updates expected responses; adds a test for mixed DagRun state counts. |
| airflow-core/tests/unit/api_fastapi/core_api/routes/ui/test_backfills.py | Updates expected responses and query-count assertions for UI backfills listing. |
| from airflow.api_fastapi.core_api.openapi.exceptions import ( | ||
| create_openapi_http_exception_doc, | ||
| ) | ||
| from airflow.api_fastapi.core_api.routes.public.backfills import _enrich_backfill_responses |
There was a problem hiding this comment.
The UI router is importing _enrich_backfill_responses from the public router module. Sharing logic via a private helper in another router creates a brittle cross-route dependency (and makes refactors/routing changes riskier). Consider moving the enrichment function into a shared module (e.g. a services/backfills.py or api_fastapi/core_api/common/backfills.py) and importing it from both routers.
| export const BackfillProgressBar = ({ stateCounts, total, trackColor = "bg.emphasized" }: Props) => { | ||
| const successCount = stateCounts.success ?? 0; | ||
| const failedCount = stateCounts.failed ?? 0; | ||
| const successPct = (successCount / total) * 100; | ||
| const failedPct = (failedCount / total) * 100; | ||
| const remainingPct = 100 - successPct - failedPct; | ||
|
|
There was a problem hiding this comment.
successPct/failedPct divide by total without guarding against total === 0. Current call sites check for total === 0, but this component is exported and could be reused elsewhere; with total=0 it will yield Infinity/NaN widths. Add an internal guard (e.g., treat total <= 0 as 0%/0%/100% and render 0/0).
| type: integer | ||
| type: object | ||
| title: Dag Run State Counts | ||
| default: {} |
There was a problem hiding this comment.
This PR updates v2-rest-api-generated.yaml directly. In Airflow these OpenAPI specs are generated artifacts; please ensure they’re produced via the repo’s OpenAPI generation workflow rather than hand-edited, so the change is reproducible and stays in sync with the source models.
| default: {} |
| dag_display_name: string; | ||
| num_runs?: number; | ||
| dag_run_state_counts?: { | ||
| [key: string]: (number); |
There was a problem hiding this comment.
This file is under openapi-gen/ and appears to be generated. Please confirm it was regenerated using the project’s OpenAPI client generation tooling (not manually edited), and consider documenting the regen command in the PR description if it’s not already included.
| [key: string]: (number); | |
| [key: string]: number; |
| num_runs: | ||
| type: integer | ||
| title: Num Runs | ||
| default: 0 | ||
| dag_run_state_counts: | ||
| additionalProperties: | ||
| type: integer | ||
| type: object | ||
| title: Dag Run State Counts | ||
| default: {} |
There was a problem hiding this comment.
This PR updates _private_ui.yaml (an OpenAPI artifact) directly. Please ensure this file is regenerated via the repo’s OpenAPI generation process rather than hand-edited, to keep generated specs consistent and reproducible.
| num_runs: Annotated[int | None, Field(title="Num Runs")] = 0 | ||
| dag_run_state_counts: Annotated[dict[str, int] | None, Field(title="Dag Run State Counts")] = {} |
There was a problem hiding this comment.
In this generated model, num_runs/dag_run_state_counts are typed as optional but default to non-None values, and dag_run_state_counts uses a mutable default {}. This pattern is repeated elsewhere in this file (e.g. dag_run_conf = {}), and can lead to confusing typing and shared mutable state. If the generator can be configured, prefer generating non-optional fields with default_factory for dict/list defaults to avoid these issues.
| dag_display_name: str = Field(validation_alias=AliasPath("dag_model", "dag_display_name")) | ||
| num_runs: int = 0 | ||
| dag_run_state_counts: dict[str, int] = {} |
There was a problem hiding this comment.
dag_run_state_counts is initialized with a mutable default {}. This can lead to shared state between BackfillResponse instances if the dict is ever mutated. Use Field(default_factory=dict) for this field (and similarly consider BackfillPostBody.dag_run_conf: dict = {} which has the same issue).
| rows = session.execute( | ||
| select( | ||
| BackfillDagRun.backfill_id, | ||
| DagRun.state, | ||
| func.count().label("count"), | ||
| ) | ||
| .join(DagRun, BackfillDagRun.dag_run_id == DagRun.id) | ||
| .where( | ||
| BackfillDagRun.backfill_id.in_(ids), | ||
| DagRun.backfill_id == BackfillDagRun.backfill_id, | ||
| ) | ||
| .group_by(BackfillDagRun.backfill_id, DagRun.state) | ||
| ).all() | ||
| counts: dict[int, dict[str, int]] = defaultdict(dict) | ||
| num_runs: dict[int, int] = defaultdict(int) | ||
| for backfill_id, state, count in rows: | ||
| counts[backfill_id][state] = count | ||
| num_runs[backfill_id] += count |
There was a problem hiding this comment.
DagRun.state is nullable (stored in _state with nullable=True), so this aggregation can produce a NULL state group. That would populate dag_run_state_counts with a None key (and potentially break JSON serialization or UI expectations). Consider filtering out NULL states (DagRun.state.is_not(None)) or mapping them to a stable string using func.coalesce in the SELECT.
…ed module, filter NULL states, remove UI for phase 2
2c9d95b to
afc67bc
Compare
|
Closing in favor of a new approach: a dedicated Backup of the full work (including UI progress bar) is at |
Adds progress tracking to the backfill UI — the banner and Backfills list page now
show a progress bar (green for success, red for failed, gray for remaining) with a
completion count (e.g., "3/6"). Completed backfills show "Completed" text.
Changes
Backend:
num_runsanddag_run_state_countstoBackfillResponse(Open question: I think it might be better to create a new API instead of changing the existing API. See open questions below as well.backfill_dag_run+dag_run, GROUP BY state)num_runsderived by summing state counts (inner join excludes NULL dag_run_id rows)Frontend:
BackfillProgressBarshared component (used by both banner and table)Tests:
test_get_backfill_with_dag_run_state_countscovering mixed statesOpen design questions — requesting reviewer input
Should this be a new API endpoint instead of enriching BackfillResponse?
I considered a new APU
GET /backfills/{id}/dagRunsreturning the linked dag runs with state,letting the UI compute counts client-side. Reasons it may be better:
BackfillResponsestable (no new fields on the public v2 API)airflow backfill statususe caseThe current approach (enriching BackfillResponse) was simpler to ship but couples
progress data to every backfill fetch. Happy to refactor if reviewers prefer a
separate endpoint.
Other questions for reviewers:
Should
num_runsinclude skippedBackfillDagRunrows? Currently it onlycounts rows with an actual DagRun (inner join). Skipped slots are excluded as they will never run so counting them in completion does not make sense.
Should completed backfills show final stats? Currently shows "Completed" text
and hides the bar. A backfill that finished 95 success + 5 failed looks the same
as 100 success. Showing the final breakdown would be more informative but the
underlying data has a limitation: when a newer backfill reprocesses the same dates,
DagRun.backfill_idgets reassigned, so old backfills lose their state counts."Completed" text avoids exposing this stale data.
Should running/queued be visually distinct? The bar currently has three
segments: green (success), red (failed), gray (remaining). Running and queued are
lumped into gray. Splitting them out adds information but also visual complexity. I tried that but the UI becomes harder to read.
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (claude-opus-4-6) following the guidelines