
improvement(gmail): replace custom html-to-text regex with library#4613

Merged
waleedlatif1 merged 2 commits into staging from waleedlatif1/ottawa-v5
May 15, 2026

@waleedlatif1
Collaborator

Summary

Type of Change

  • Improvement

Testing

  • 18 existing tests in `utils.test.ts` pass (2 updated for the library's whitespace handling)

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…xt library

Resolves 4 CodeQL alerts on htmlToPlainText (incomplete tag/entity handling,
unsafe regex backtracking). Delegates to the html-to-text npm package already
used by the outlook polling trigger and the mail/send route.
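
To illustrate the class of problem such alerts usually describe, here is a minimal sketch of how chained `.replace()` calls can mishandle tags and entities. The helpers `stripScriptTags`, `decodeAmpFirst`, and `decodeAmpLast` are hypothetical; they are not the code removed by this PR, which is not shown in this thread.

```typescript
// Hypothetical helpers for illustration only; not the implementation removed by this PR.

// Single-pass removal: deleting one match can splice a new tag together from the leftovers.
function stripScriptTags(html: string): string {
  return html.replace(/<script[^>]*>/gi, '')
}

// Decoding &amp; first lets its output combine with the following text into a second entity.
function decodeAmpFirst(text: string): string {
  return text.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>')
}

// Decoding &amp; last avoids that double-decode, but still covers only three entities.
function decodeAmpLast(text: string): string {
  return text.replace(/&lt;/g, '<').replace(/&gt;/g, '>').replace(/&amp;/g, '&')
}

const spliced = stripScriptTags('<scr<script>ipt>')
const doubleDecoded = decodeAmpFirst('&amp;lt;b&amp;gt;')
const singleDecoded = decodeAmpLast('&amp;lt;b&amp;gt;')

console.log(spliced)       // <script>
console.log(doubleDecoded) // <b>
console.log(singleDecoded) // &lt;b&gt;
```

A parser-based library sidesteps both issues because it tokenizes the markup once rather than rewriting the string in several passes.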
@vercel

vercel Bot commented May 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 15, 2026 2:44am


@cursor

cursor Bot commented May 15, 2026

PR Summary

Medium Risk
Changes the plain-text fallback generation for HTML Gmail messages, which can subtly affect formatting/whitespace and link rendering in outgoing emails. Risk is mitigated by targeted selector configuration and updated unit tests but still impacts user-visible email content.

Overview
Replaces the custom regex-based htmlToPlainText implementation in apps/sim/tools/gmail/utils.ts with html-to-text’s convert() for more robust HTML stripping and entity decoding.

Configures conversion behavior (no word-wrapping, skip img/script/style, and suppress redundant anchor URLs / bare # anchors) and updates utils.test.ts expectations to match the library’s newline and NBSP handling, adding coverage for link elision and `&nbsp;` fidelity.
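
The configuration described above can be sketched as an options object. The option names follow the ones listed in this PR's flowchart; the `SelectorSketch` and `ConvertOptionsSketch` types and the `plainTextOptions` name are local stand-ins assumed for this example, since the actual contents of `utils.ts` are not shown here.

```typescript
// Local stand-in types for the subset of html-to-text options described in this PR.
// These are assumptions for the sketch; the library ships its own typings.
interface SelectorSketch {
  selector: string
  format?: string
  options?: Record<string, boolean>
}

interface ConvertOptionsSketch {
  wordwrap: number | false
  selectors: SelectorSketch[]
}

const plainTextOptions: ConvertOptionsSketch = {
  // Leave long lines unwrapped.
  wordwrap: false,
  selectors: [
    // Omit the URL when it equals the link text, and omit "#..." fragment hrefs.
    { selector: 'a', options: { hideLinkHrefIfSameAsText: true, noAnchorUrl: true } },
    // Drop these elements and their content from the output.
    { selector: 'img', format: 'skip' },
    { selector: 'script', format: 'skip' },
    { selector: 'style', format: 'skip' },
  ],
}

// With the package installed, the call would look like:
//   import { convert } from 'html-to-text'
//   const text = convert(html, plainTextOptions)
const skipped = plainTextOptions.selectors
  .filter((s) => s.format === 'skip')
  .map((s) => s.selector)
console.log(skipped) // [ 'img', 'script', 'style' ]
```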

Reviewed by Cursor Bugbot for commit 7cbf02a.

@greptile-apps
Contributor

greptile-apps Bot commented May 15, 2026

Greptile Summary

Replaces the custom regex-based htmlToPlainText in apps/sim/tools/gmail/utils.ts with the html-to-text npm package (already a repo dependency), resolving CodeQL alerts around incomplete entity handling and regex backtracking.

  • utils.ts: Swaps 12 lines of chained .replace() calls for a single convert() call with a selectors config that matches the pattern already used in outlook.ts (suppress link URLs, skip img/script/style).
  • utils.test.ts: Updates two assertions to match the library's paragraph-separation behavior, adds explicit coverage for `&nbsp;` preservation as U+00A0, and adds anchor-URL-elision assertions.
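
Because U+00A0 renders like an ordinary space, assertions on `&nbsp;` handling are easy to get subtly wrong. The snippet below demonstrates only standard JavaScript string behavior; the value `converted` is a hand-written stand-in, not actual library output.

```typescript
// Hand-written stand-in for a converted string containing a non-breaking space.
const converted = 'Hello\u00A0World'

// U+00A0 is a different code point from the ASCII space (U+0020).
const equalsAsciiSpaceVersion = converted === 'Hello World'
const nbspCodePoint = converted.codePointAt(5)

// Regex \s matches U+00A0, so whitespace normalization would erase the distinction.
const normalized = converted.replace(/\s+/g, ' ')

// trim() also treats U+00A0 as whitespace.
const trimmed = '\u00A0padded\u00A0'.trim()

console.log(equalsAsciiSpaceVersion)      // false
console.log(nbspCodePoint)                // 160
console.log(normalized === 'Hello World') // true
console.log(trimmed)                      // padded
```

A test that wants to pin down NBSP preservation therefore needs to compare against `'\u00A0'` explicitly rather than a typed space.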

Confidence Score: 5/5

Safe to merge — the change is a straightforward library substitution with no new dependencies, matching an existing pattern in the repo.

The swap touches only one function in one file, reuses a library and configuration already proven elsewhere in the codebase, and is backed by an updated test suite that covers the behavioral differences introduced by the library (paragraph spacing, `&nbsp;` handling, anchor URL suppression).

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/tools/gmail/utils.ts | Replaces regex-based htmlToPlainText with html-to-text's convert(); configuration mirrors the existing outlook.ts usage with anchor/img/script/style selectors. |
| apps/sim/tools/gmail/utils.test.ts | Two assertions updated for library whitespace semantics; new tests added for `&nbsp;` preservation and anchor URL suppression. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["htmlToPlainText(html)"] --> B["convert(html, options)\nhtml-to-text library"]
    B --> C["wordwrap: false"]
    B --> D["selectors config"]
    D --> E["a: hideLinkHrefIfSameAsText\n+ noAnchorUrl"]
    D --> F["img: skip"]
    D --> G["script: skip"]
    D --> H["style: skip"]
    B --> I["Plain-text string\n(trimmed, entities decoded)"]

Reviews (2): Last reviewed commit: "improvement(gmail): match outlook select..."

Comment thread apps/sim/tools/gmail/utils.ts Outdated
Comment thread apps/sim/tools/gmail/utils.test.ts
…ests

Aligns html-to-text options with apps/sim/lib/webhooks/polling/outlook.ts:
suppress anchor hrefs when identical to text, drop bare # anchors, skip
img/script/style content. Adds tests for nbsp preservation and anchor
behavior.
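
The anchor rule described in this commit message can be modeled in a few lines. This is a simplified sketch of the stated behavior, not html-to-text's implementation; `renderAnchorSketch` is a hypothetical name, and the bracket style used for retained URLs is illustrative.

```typescript
// Simplified model of the anchor rule described above; not html-to-text's implementation.
function renderAnchorSketch(text: string, href?: string): string {
  if (!href) return text                // no href: keep the visible text only
  if (href.startsWith('#')) return text // bare fragment link: drop the URL
  if (href === text) return text        // URL identical to the text: avoid repeating it
  return `${text} [${href}]`            // otherwise keep the URL after the text
}

const sameAsText = renderAnchorSketch('https://example.com', 'https://example.com')
const fragmentOnly = renderAnchorSketch('Back to top', '#')
const distinctUrl = renderAnchorSketch('Docs', 'https://example.com/docs')

console.log(sameAsText)   // https://example.com
console.log(fragmentOnly) // Back to top
console.log(distinctUrl)  // Docs [https://example.com/docs]
```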
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review


@cursor cursor Bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 7cbf02a.

@waleedlatif1 waleedlatif1 merged commit 62636c7 into staging May 15, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/ottawa-v5 branch May 15, 2026 02:49