Fix Fleet API protection bugs and add two-phase secret deletion #1
Open
prashanthd409 wants to merge 5 commits into main
During Redis restarts or maintenance, Fleet API occasionally returns incomplete responses, causing ArgoCD to delete applications.

Solution adds three-layer protection:
- Detection: Identifies transient issues by pattern analysis
- Retry: 3 attempts with exponential backoff
- Cache: Falls back to last known good response

Changes:
- Add protection/ package for detection and caching
- Update fleetclient.go with retry logic
- Add configuration via environment variables
- Add comprehensive test coverage (29 tests passing)

Fixes application deletions during Redis HA failover
9725aa5 to 4e02c1b
Fix fallthrough bug where suspicious data was cached when transient detection fired with an expired cache, fix data race on map caches, add two-phase deletion with grace period to prevent secret loss from transient Fleet API partial responses, and add race-safe tests.
Pin all action references to full-length commit SHAs to prevent supply-chain attacks via mutable tags. Add dependabot.yml for automated dependency updates on Go modules and GitHub Actions.
danbustillos-ab approved these changes on Mar 16, 2026
Summary
Fixes critical bugs in the Fleet API transient issue protection and adds a two-phase deletion mechanism with a configurable grace period, preventing ArgoCD cluster secrets from being deleted when the Fleet API temporarily returns partial data.
Changes
Fixes the fallthrough bug where suspicious data was cached, adds mutex protection for concurrent map access, introduces annotation-based two-phase deletion with a 60s grace period, replaces time.Sleep with context-aware retries, reuses the Kubernetes clientset, and adds race-safe tests including an incident replay scenario.
Test plan
- go test -v -race -count=1 ./... (all tests pass, race detector clean)
- go vet ./... (no issues)
- -race
Detailed changes
Bug fixes
- fleetclient.go: When a transient issue was detected on the final retry with an expired cache, execution fell through and cached and returned the suspicious data. It now returns an error.
- fleetclient.go: MembershipTenancyMapCache and ScopeTenancyMapCache were written by Refresh() with no lock while being read by HTTP handlers. Added sync.RWMutex.
- main.go: After a PluginResults error, the HTTP handler wrote a 500 but continued on to write a second response.
Safety features
- fleetclient.go: pruneSecrets no longer deletes immediately. Absent secrets get a fleet.gke.io/absent-since annotation and are deleted only after a configurable grace period (default 60s). Recovered memberships have the annotation removed.
- fleetclient.go: time.Sleep replaced with select on time.After/ctx.Done() so retries respect context cancellation.
- fleetclient.go: startReconcile now uses time.Ticker + ctx.Done() instead of an infinite for/time.Sleep loop.
Performance
- fleetclient.go: Kubernetes clientset created once in NewFleetSync() instead of on every reconciliation cycle.
- fleetclient.go: Template parsed once at package level with template.Must instead of on every iteration.
CI & config
- golangci-lint.yml: Added go test -v -race -count=1 ./... step.
- applicationset-demo.yaml: Updated to applicationsSync: create-update and preserveResourcesOnDeletion: true.
- DELETION_GRACE_PERIOD_SECONDS (default 60).
Backward compatibility
- fleet.gke.io/absent-since annotation: clean rollback