Add error_message field to bundle deploy telemetry#4793
Add error_message field to bundle deploy telemetry#4793shreyas-goenka wants to merge 14 commits intomainfrom
Conversation
c7ee5c8 to
cb59f1d
Compare
|
Commit: 968e9fb
19 interesting tests: 10 SKIP, 7 RECOVERED, 2 flaky
Top 20 slowest tests (at least 2 minutes):
|
Track the first error diagnostic summary encountered during bundle deploy in telemetry. Move telemetry logging into a defer so it's always captured, even when deploy fails. Co-authored-by: Isaac
cb59f1d to
1a8ff40
Compare
Suggested reviewersBased on git history of the changed files, these people are best suited to review:
Confidence: high Eligible reviewersBased on CODEOWNERS, these people or teams could also review: @andrewnester, @anton-107, @pietern, @simonfaltum Suggestions based on git history of 5 changed files (5 scored). See CODEOWNERS for path-specific ownership rules. |
simonfaltum
left a comment
There was a problem hiding this comment.
Review Swarm Summary (2 independent reviewers + cross-review)
Verdict: REQUEST CHANGES - The defer-based approach is correct in principle, but has 3 issues that need attention before merge.
- [IMPORTANT] Potential panic on auth/config failures - The deferred
LogDeployTelemetry()runs wheneverb != nil, even beforecmdctx.SetConfigUsed()succeeds. On auth/config failures, telemetry upload will callcmdctx.ConfigUsed(ctx)and panic. - [IMPORTANT] Double deploy events on failure - Failed deploys will emit two
BundleDeployEvents: one from the new defer and one from the root-level fallback incmd/root/root.go:185. - [IMPORTANT] PII risk in error messages -
ErrorMessageis sent verbatim with no sanitization or size bound. Many deploy errors interpolate local filesystem paths, resource names, and workspace URLs.
| // Log deploy telemetry on all exit paths. This is a defer to ensure | ||
| // telemetry is logged even when the deploy command fails, for both | ||
| // diagnostic errors and regular Go errors. | ||
| if opts.Deploy { | ||
| defer func() { | ||
| if b == nil { | ||
| return | ||
| } | ||
| errMsg := logdiag.GetFirstErrorSummary(ctx) | ||
| if errMsg == "" && retErr != nil && !errors.Is(retErr, root.ErrAlreadyPrinted) { | ||
| errMsg = retErr.Error() | ||
| } | ||
| phases.LogDeployTelemetry(ctx, b, errMsg) | ||
| }() |
There was a problem hiding this comment.
[IMPORTANT] This defer runs whenever b != nil, but cmdctx.SetConfigUsed() may not have been called yet (e.g., if configureBundle fails on auth/profile errors before reaching SetConfigUsed in cmd/root/bundle.go:187). When telemetry upload later calls cmdctx.ConfigUsed(ctx) in libs/telemetry/logger.go, it will panic.
Fix: guard with if !cmdctx.HasConfigUsed(ctx) { return } inside the defer, or move the defer setup to after SetConfigUsed() has succeeded.
| // Log deploy telemetry on all exit paths. This is a defer to ensure | ||
| // telemetry is logged even when the deploy command fails, for both | ||
| // diagnostic errors and regular Go errors. | ||
| if opts.Deploy { | ||
| defer func() { | ||
| if b == nil { | ||
| return | ||
| } | ||
| errMsg := logdiag.GetFirstErrorSummary(ctx) | ||
| if errMsg == "" && retErr != nil && !errors.Is(retErr, root.ErrAlreadyPrinted) { | ||
| errMsg = retErr.Error() | ||
| } | ||
| phases.LogDeployTelemetry(ctx, b, errMsg) | ||
| }() |
There was a problem hiding this comment.
[IMPORTANT] This defer will now log a full BundleDeployEvent on deploy failure. But cmd/root/root.go:182-189 still appends a legacy empty BundleDeployEvent on every nonzero bundle_deploy exit. Result: two deploy events per failed deploy, which will skew failure counts and error-rate dashboards.
Fix: remove the root-level failure fallback, or gate it on "no deploy event was already logged".
| BundleDeployEvent: &protos.BundleDeployEvent{ | ||
| BundleUuid: bundleUuid, | ||
| DeploymentId: b.Metrics.DeploymentId.String(), | ||
| ErrorMessage: errMsg, |
There was a problem hiding this comment.
[IMPORTANT] ErrorMessage is sent verbatim from logdiag.GetFirstErrorSummary() / retErr.Error(). Many deploy errors interpolate local filesystem paths or user-controlled config values (e.g., from statemgmt/state_pull.go, config/mutator/translate_paths.go). This starts shipping raw PII/workspace details to telemetry with no sanitization or size bound.
Fix: emit a sanitized error code/category, or at least scrub paths and cap length (e.g., 500 chars).
| if b == nil { | ||
| return | ||
| } | ||
| errMsg := logdiag.GetFirstErrorSummary(ctx) | ||
| if errMsg == "" && retErr != nil && !errors.Is(retErr, root.ErrAlreadyPrinted) { | ||
| errMsg = retErr.Error() |
There was a problem hiding this comment.
[SUGGESTION] No new unit tests for the defer + error capture logic. The core behavior change (telemetry always fires, error message captured from logdiag or retErr) should have test coverage. Consider testing:
- Telemetry fires on deploy failure with error message
ErrAlreadyPrintederrors fall through toGetFirstErrorSummary- Successful deploy passes empty error message
Two fixes: 1. Remove redundant empty BundleDeployEvent logging in root.go Execute. The defer in process.go already logs detailed deploy telemetry via LogDeployTelemetry on all exit paths. The root.go code caused duplicate events and Upload retries (mock returns numProtoSuccess=1). 2. Add HasConfigUsed guard in telemetry.Upload before calling ConfigUsed. When bundle config fails early (e.g., restricted script execution blocking preinit), ConfigUsed is never set on the context. The LogDeployTelemetry defer still logs an event, so Upload would panic trying to create an API client without auth config. Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
93f4e62 to
a2fb5dd
Compare
426f35a to
e50f480
Compare
e50f480 to
b6d60a8
Compare
b6d60a8 to
36d67ed
Compare
36d67ed to
7b76e5b
Compare
Adds a scrubber that runs before error messages are sent to telemetry: 1. Replaces the bundle root path with "." to avoid leaking local paths 2. Redacts remaining home directory paths (/Users/..., /home/..., C:\Users\...) 3. Redacts email addresses (e.g., in workspace paths) Inspired by VS Code's telemetry path scrubbing and Sentry's @userpath rule. Co-authored-by: Isaac
7b76e5b to
ca229c4
Compare
… path redaction - Replace windowsHomeDirRegexp with general windowsAbsPathRegexp - Add workspacePathRegexp for [REDACTED_WORKSPACE_PATH] label - Use capturing groups instead of replaceDelimitedMatch helper - Merge HomeDirRegexFallback tests into AbsolutePaths test Co-authored-by: Isaac
… test Co-authored-by: Isaac
The replacePath function converted bundle root to "." and home dir to "~" before regex scrubbing. Since absolute and relative paths are handled by separate regex patterns anyway, this added complexity for no benefit. Absolute paths now consistently get [REDACTED_PATH]. Co-authored-by: Isaac
~/... expands to an absolute path, so it should match [REDACTED_PATH] rather than [REDACTED_REL_PATH]. Co-authored-by: Isaac
Redacted paths now include the file extension for a known set of types, e.g. [REDACTED_PATH](yml), [REDACTED_REL_PATH](json). This helps with debugging without leaking sensitive information. Co-authored-by: Isaac
Add .r, .scala, .sh, .hcl, .ini, .zip, .tar, .csv for better coverage of file types common in the Databricks CLI ecosystem. Co-authored-by: Isaac
Add .js, .ts, .jsx, .tsx, .html, .css (Databricks Apps / Vite), .env, and .md. Co-authored-by: Isaac
Add .xml, .properties, .conf, .tfstate, .tfvars, .egg, .gz, .tgz, .dbc, .parquet, .avro, .log, .lock, .pem, .crt. Co-authored-by: Isaac
Changes
Adds support for logging error messages encountered during
bundle deployin telemetry. This gives the developer ecosystem team visibility into user-facing errors.What changed:
libs/telemetry/protos/bundle_deploy.go: AddedErrorMessagefield toBundleDeployEventstructlibs/logdiag/logdiag.go: AddedFirstErrorSummaryfield toLogDiagDatato capture the summary of the first error diagnostic. AddedGetFirstErrorSummary()gettercmd/bundle/utils/process.go: MovedLogDeployTelemetryinto adeferso telemetry is always logged, even when the deploy command fails, for both diagnostic errors and regular Go errorsbundle/phases/telemetry.go: PopulatesErrorMessagefromlogdiag.GetFirstErrorSummary(ctx). Error messages are capped at 500 charactersbundle/phases/telemetry_scrub.go: Best-effort regex scrubber that runs before telemetry upload. The error message is treated as PII by the logging infrastructure; we scrub to avoid collecting more information than necessary. Scrubs:[REDACTED_PATH]/Workspace/...) →[REDACTED_WORKSPACE_PATH][REDACTED_PATH]./...,../...,.databricks/...,~/...) →[REDACTED_REL_PATH]resources/file.yml) →[REDACTED_REL_PATH][REDACTED_EMAIL]@userpathrulelibs/telemetry/logger.go: AddedHasConfigUsedguard before accessing auth config inUpload, preventing panics when config setup fails earlyTests:
bundle/phases/telemetry_scrub_test.gobundle/telemetry/deploy-error-messageverifies the error message field is populated and scrubbed in telemetry