
Add interactive REPL mode + fix screenshot crawl #19

Open
sahilsunny wants to merge 15 commits into main from REPL

@sahilsunny
Collaborator

Summary

  • Add an interactive REPL mode (scrapingbee with no subcommand) — full-screen UI with banner, fixed live status widget, virtual scrollback, click-to-open paths, multi-line paste preview, and Ctrl+C-safe command interruption.
  • Fix screenshot crawl: --max-pages N --screenshot-full-page true saved only 1 PNG in v1.4.1 (verified against pristine). Now produces exactly N files. Three stacked bugs fixed — _requires_discovery_phase was missing screenshot_full_page/screenshot_selector; scrapy_scrapingbee's default errback crashed on binary 500 responses; the scheduler's LIFO ordering pushed save requests behind a growing pile of follow-discoveries.
  • Pool-based discovery for binary modes: instead of paying HTML discovery + binary save per page (~2× credits), we discover until the pool has ≥ max_pages URLs and then batch-dispatch all saves. ~50% credit reduction on link-rich sites at --max-pages 100.
  • --max-pages N now means N SAVED pages (was N total responses). Backfill on save failure so flaky 5xx errors don't silently shrink the user's budget.
  • --flag true/false syntax accepted across every boolean flag for consistency with the scraping-side options (--render-js true, …). Bare --verbose still works. Applied at both CLI entry and REPL dispatch via an argv preprocessor.
  • commands/crawl.py concurrency warning now shows the underlying error (e.g. HTTP 429) so users can tell rate-limit hiccups from auth/network problems.

CLI behaviour outside the REPL is otherwise preserved — see the most recent commit message for a hunk-by-hunk explanation of each non-REPL change.

Test plan

  • --help exits 0 for all 19 subcommands
  • CLI free reads — usage, docs, auth --show, unsafe --list, schedule --list
  • CLI single-call API — scrape, fast-search, google, chatgpt, amazon-product, amazon-search, walmart-product, walmart-search, youtube-search, youtube-metadata
  • CLI batch — scrape --input-file urls.txt --output-dir batch/
  • CLI export in ndjson / csv / txt formats
  • CLI plain HTML crawl --max-pages 3 produces exactly 3 files
  • CLI screenshot crawl --max-pages 3 --screenshot-full-page true produces exactly 3 PNGs (vs pristine v1.4.1 which produces 1)
  • CLI --save-pattern "blog" filters correctly
  • CLI unsafe-mode full gate: env var + verified flag required, whitelist enforcement, $()/backtick/pipe/&& injection blocking, audit log captures actions, unsafe --disable re-locks the gate
  • CLI --verbose true, --verbose false, bare --verbose all work; same for other bool flags
  • REPL full pass: 17 test scenarios covering startup chrome, meta-commands (:help/:set/:show/:list/:view), Tab completion (single-match inline + multi-match popup + ghost-text-word fallback), shell !cmd, single API call with preview, crawl status widget (banner shrinks, URL line, honeycomb), screenshot crawl bug fix verification (exactly N PNGs), multiple consecutive crawls in one session (subprocess-per-crawl), Ctrl+C SIGTERM→SIGKILL escalation, batch progress widget unified with crawl, :view smart routing + HTML pretty-print, unsafe-mode --on-complete hook, multi-line paste preview + edit-before-execute, click-to-open paths, exit/restart history persistence

sahilsunny added 13 commits May 7, 2026 13:40
Bare `scrapingbee` (no subcommand) now drops into a themed REPL with
tab completion, history, and inline command help.

Adds:
- src/scrapingbee_cli/interactive.py — REPL loop, splash, completer
- src/scrapingbee_cli/theme.py — ScrapingBee brand theme + spinner
- src/scrapingbee_cli/help_formatter.py — Rich-styled click help
- pyproject.toml: prompt_toolkit>=3.0, rich>=13.0

Hooks `cli.py` so the click group is invoke_without_command=True
and falls into run_repl() when no subcommand is given. Schedule
hint is suppressed inside the REPL to avoid per-command noise.

Phase 2 (theme integration in command files for inline spinners
during command runs) will follow as a separate commit.
Inside the REPL, commands now show a MiniBeeSpinner during the
API call, batches show a live honeycomb credit meter via
LiveCreditTracker, and verbose / completion output is rendered
with rich-styled helpers from theme.py.

Changes:
- batch.py: new _batch_done helper; honeycomb-trail progress;
  styled batch-start banner; LiveCreditTracker wrap around the
  batch run; usage_info kwarg on run_api_batch
- cli_utils.py: REPL-mode branches in _validate_range,
  check_api_response, scrape_with_escalation, and write_output
  verbose section
- client.py: parse_usage now exposes max_api_credit (needed by
  LiveCreditTracker)
- commands/{amazon,chatgpt,fast_search,google,walmart,youtube}.py:
  MiniBeeSpinner around single API calls; usage_info pass-through
  to run_api_batch
- commands/scrape.py: spinner around single scrape; LiveCreditTracker
  around batch; REPL-styled error on HTTP 4xx/5xx
- commands/usage.py: full styled dashboard (honeycomb meter,
  credits used/remaining/total, concurrency, renewal date) when
  invoked from the REPL; plain JSON kept for non-REPL
- commands/crawl.py: LiveCreditTracker wrap around run_urls_spider
  so credit drain during long crawls is visible

Plain (non-REPL) output is unchanged for every code path. All 653
existing unit tests still pass.
Bug fix: remove the outer REPL spinner that wrapped every command.
It blocked interactive commands (`tutorial`, `auth`) from prompting
the user, masked their output, and double-stacked with the inner
MiniBeeSpinner already added in Phase 2 for network commands.

Add `tutorial` and `unsafe` to the REPL command list and tab
completion (introduced in v1.4.0/v1.4.1, were missing).

Prompt: drop the Powerline-arrow protrusion in favour of a single
unified yellow tag — ` ScrapingBee ❯ ` — with the chevron inside
the tag. Renders identically in every terminal/font (Mac Terminal,
Warp, iTerm2, etc.) since it uses only standard BMP glyphs. Set
SCRAPINGBEE_POWERLINE=1 to opt back into the Powerline arrow if
you have a patched font (Nerd Font / Powerline-patched).
Treat the REPL as a tool, not a mascot. The previous version
prioritised personality (splash, ASCII logo, bee emoticons,
rotating fun facts, cute exit) over getting out of the user's
way. This rewrite swaps that for psql/redis-cli/gh-style
density and consistency.

interactive.py — full rewrite:
- Remove the bee splash animation, ASCII-art logos, repeated
  hint line on every prompt.
- One-line banner on startup, then prompt.
- Colon-prefixed REPL meta-commands (`:help`, `:q`, `:clear`,
  `:set`, `:unset`, `:show`) so they don't collide with click
  commands. Bare aliases (`help`, `exit`, `quit`, `q`, `clear`)
  still work for muscle memory.
- Per-command tab completion driven by walking the click tree
  at startup — `youtube-search --<TAB>` now shows YouTube flags,
  `scrape --<TAB>` shows scrape flags. Bool/Choice flags are
  auto-detected from click param types (no more flat `_COMMON_FLAGS`
  list that drifts from reality).
- Uniform output frame around every command: `─── cmd ─── ` divider
  on top, `[ok]/[fail]   1.23s` line on the bottom.
- Bottom toolbar with live state: credits remaining (read from
  the existing usage cache), last command name + status + duration,
  active session settings.
- "Did you mean?" suggestions on unknown commands and on click
  "no such option" errors (Levenshtein distance, threshold 2).
- Multi-line input via trailing backslash continuation.
- Session settings via `:set country-code=fr`, applied as default
  flags to subsequent commands when not explicitly overridden.
- `:clear` uses standard `\033[2J\033[H` instead of the previous
  scroll-and-jump heuristic.
- Silent exit (no "Buzz off!" message).
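
The "Did you mean?" rule above (Levenshtein distance, threshold 2) can be sketched as follows. This is a minimal illustration, not the code in interactive.py: the function names `levenshtein` and `suggest` and the candidate list are assumptions made for the example.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from `a`
                curr[j - 1] + 1,     # insert a character into `a`
                prev[j - 1] + cost,  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, candidates: List[str], threshold: int = 2) -> Optional[str]:
    """Return the closest candidate within `threshold` edits, else None."""
    best = min(candidates, key=lambda c: levenshtein(word, c), default=None)
    if best is not None and levenshtein(word, best) <= threshold:
        return best
    return None


# Hypothetical command list, for illustration only.
print(suggest("scarpe", ["scrape", "google", "usage"]))
```

A threshold of 2 catches a transposed pair of letters (which plain Levenshtein counts as two substitutions) while rejecting unrelated words.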

theme.py:
- Replace MiniBeeSpinner's emoticon flap frames + rotating "Bee
  facts" + time-of-day flavour messages with a single line:
  ten braille-dot frames + the command name. Same API
  (`with MiniBeeSpinner("scrape"):`) — call sites unchanged.
- Drop dead module-level state: MESSAGES, _BEE_FACTS,
  _MSG_ROTATE_TICKS, _time_flavor.

All 653 unit tests still pass. SCRAPINGBEE_POWERLINE=1 still
opts into the protruding Powerline arrow for users with patched
fonts.
…eview

This is the working "non-TUI" iteration before switching to a true
full-screen TUI. Captures every fix in this round — keep it as a
checkpoint to fall back to if the TUI rewrite needs to be reverted.

interactive.py:
- Bordered input: dropped the Frame widget (rendering artifacts) and
  the horizontal rules (yellow trails on resize) — input is now just
  a chevron prompt + lexer-highlighted buffer + adaptive bottom toolbar.
- Tab completion: re-bind Tab/Shift-Tab/Esc on the custom KeyBindings
  (the previous version overrode prompt_toolkit defaults).
- erase_when_done=True on the Application + manual `❯ <cmd>` echo into
  scrollback after submit — fewer stale-render artifacts on resize.
- :set overhaul: validate keys against the click flag list, accept
  "k=v ..." and "--k v ..." mixed forms, suggest on typo, validate
  choice/bool values where known.
- :unset accepts space- or comma-separated keys; :unset *, :unset all,
  :reset all clear every setting.
- :view meta-command: cross-platform pager built on prompt_toolkit
  (no `less` dependency on Windows). Arrow keys / PgUp/PgDn / Home/End
  / mouse wheel to scroll, q / Esc to exit.
- Toolbar adapts to width: chips truncate to "+N more" when narrow.
- Per-command tab completion driven by walking the click tree (already
  in the previous commit, retained).

theme.py:
- Hex bloom spinner: 3-cell radial composition (centre + halo) so the
  bloom radiates symmetrically instead of growing rightward. Frames
  cycle dust → speck → outline → honeycomb → ✦ sparkle peak → drain,
  paired with a dim→bright→warm colour gradient.
- White-glim shimmer sweeps across the verb ("Fetching", "Rendering")
  in time with the bloom.
- Elapsed-time counter once an op runs > 0.5s.
- Per-command verb rotation (no bee facts).

cli_utils.py:
- Output preview in REPL mode: large text dumps (>30 lines OR >4 KB)
  get truncated to a 30-line / 4 KB preview. Single-line minified HTML
  is detected by byte threshold so it doesn't slip through.
- Full payload auto-saved to ~/.cache/scrapingbee-cli/last-output so
  the user can :view / cat / less it.
- Binary output (PNG, PDF, etc.) is never truncated.
- Non-REPL invocations are unchanged so pipes/redirects keep working.
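
A rough sketch of the preview rule described above (more than 30 lines OR more than 4 KB triggers truncation). The function name `preview` and its return shape are assumptions for illustration; the real logic lives in cli_utils.py and may differ.

```python
from typing import Tuple

MAX_LINES = 30
MAX_BYTES = 4 * 1024


def preview(text: str, max_lines: int = MAX_LINES,
            max_bytes: int = MAX_BYTES) -> Tuple[str, bool]:
    """Return (preview_text, was_truncated)."""
    lines = text.splitlines()
    size = len(text.encode("utf-8"))
    if len(lines) <= max_lines and size <= max_bytes:
        return text, False
    # Clip by line count first, then by bytes, so a single minified
    # line (one line, many kilobytes) is still caught by the byte cap.
    clipped = "\n".join(lines[:max_lines]).encode("utf-8")[:max_bytes]
    # errors="ignore" drops a multi-byte character cut at the boundary.
    return clipped.decode("utf-8", errors="ignore"), True


short_text, cut = preview("a\nb")
print(cut)
```

Checking the byte size independently of the line count is what keeps single-line minified HTML from slipping through.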

All 653 unit tests still pass.
Switches the REPL from prompt_toolkit's full_screen=False (inline) mode
to full_screen=True with an in-memory ScrollbackBuffer. Eliminates the
wrap-fragment / orphan-toolbar artifacts that bled into terminal
scrollback on resize, and gives us full control over rendering for
shimmer animations, mouse handling, and pagination.

Layout
- Pinned banner Window at the top: compact smblock "ScrapingBee" + version
  + tagline + ":help / :q" hint. Stays visible during long scrapes.
- Scrollback Window below the banner; spacer rows + horizontal separator
  between scrollback and the input area (Claude-CLI style).
- Toolbar at the bottom with paginated fields that rotate every 5s
  (Available Credits / Used Session / Concurrency / Next Update); the
  mode hint is pinned on every page so it's always visible.
- Running-state toolbar pins "running · Xs" on the left, rotates a
  stat in the middle (so credits consumed are visible during long
  crawls), and pins "Ctrl+C to stop" on the right.

Output handling
- ScrollbackBuffer + ScrollbackWriter pipe stdout / stderr / err_console
  through ANSI-parsing into an in-memory line list rendered by a
  scrollable Window. 10K-line ring buffer.
- Visual-row scroll (not logical-line): scroll_offset measured in
  terminal rows with width-aware line splitting, so long single-line
  output (huge JSON, etc.) scrolls one terminal row per wheel tick.
- Command echo splices into scrollback at the position where output
  started, on completion — no echo during execution (only the shimmer
  is the live indicator), echo appears right above output when done.
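
The ring buffer and visual-row counting above might look roughly like this. It is a simplified sketch: the class name `Scrollback` is hypothetical, and `len(line)` is used as an approximation of display width (it ignores ANSI codes and double-width characters, which the real ScrollbackBuffer presumably handles).

```python
import math
from collections import deque


class Scrollback:
    """Bounded line store that can report its height in terminal rows."""

    def __init__(self, max_lines: int = 10_000) -> None:
        # deque(maxlen=...) silently discards the oldest line when full.
        self.lines = deque(maxlen=max_lines)

    def write(self, text: str) -> None:
        for line in text.splitlines():
            self.lines.append(line)

    def visual_rows(self, width: int) -> int:
        # A logical line wraps onto ceil(len / width) rows; an empty
        # line still occupies one row.
        return sum(max(1, math.ceil(len(line) / width)) for line in self.lines)


sb = Scrollback(max_lines=3)
sb.write("a\nb\nc\nd")
print(list(sb.lines))
```

Measuring scroll offsets in visual rows rather than logical lines is what lets one wheel tick move one terminal row even inside a very long JSON line.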

Input / interaction
- Mouse mode 1000 captures wheel/trackpad scroll; native drag-select
  still works because the terminal owns motion events. Tab toggles
  Scroll vs Select mode at runtime; toolbar hint shows current mode.
- Up/Down arrow keys navigate command history; explicit
  history.store_string() per submit since the custom Enter binding
  bypasses Buffer.validate_and_handle().
- Tab completion opens a popup via FloatContainer + CompletionsMenu
  (was silently entering completion state with no UI). Up/Down
  navigate, Enter picks, Esc dismisses.
- Pager (:view) wraps long lines, defaults to pretty-printed JSON
  with "r" to toggle raw, runs in a worker thread to avoid
  asyncio.run() conflict with the outer loop, re-enters alt buffer
  on exit so the outer REPL doesn't bleed into the main screen.
- Resize detection in the ticker triggers app.invalidate() so the
  layout adapts cleanly.

State / usage
- SessionState gains api_key_hash + per-session "used_credits_at_start"
  so re-auth with the same key preserves the session counter; a
  different key resets it.
- Background usage refresher polls /usage every 30s; "usage" command
  completion + auth completion trigger an immediate refresh via a
  thread-safe event.
- Banner shows "API key not set — type auth" when no key is configured.
- :help wrapping with a proper hanging indent (Text objects, not Rich
  markup, so leading whitespace is preserved); blank row between
  categories.

Crawl
- Skip Twisted signal-handler installation in REPL mode (signal.signal
  requires the main thread, but commands run in worker threads).
- Wire LOG_FILE to ~/.cache/scrapingbee-cli/crawl.log in REPL mode so
  the full crawl log is preserved beyond scrollback's MAX_LINES.
- Initialise usage_info to None before the batch-usage try block to
  prevent UnboundLocalError when the initial fetch raises.

Misc
- cli_utils: always overwrite the last-output cache for text responses
  in REPL mode (not just truncated ones) so :view never shows stale
  output from a previous command.
- Ctrl+C while running injects KeyboardInterrupt into the worker via
  PyThreadState_SetAsyncExc; surfaces as "stopped" in the footer.
- Reverted earlier experiments with Braille / PIL-rendered logos.
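
For readers unfamiliar with the Ctrl+C mechanism mentioned above, here is a minimal sketch of injecting an exception into a worker thread through CPython's `PyThreadState_SetAsyncExc`. The helper name `inject_exception` and the toy worker are assumptions for illustration; this is CPython-specific, and the exception is only delivered when the target thread next executes Python bytecode.

```python
import ctypes
import threading
import time


def inject_exception(thread: threading.Thread, exc_type: type) -> bool:
    """Ask CPython to raise `exc_type` inside `thread`."""
    tid = ctypes.c_ulong(thread.ident)
    affected = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        tid, ctypes.py_object(exc_type)
    )
    if affected > 1:
        # More than one thread state matched: undo the request.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        return False
    return affected == 1


result = {}


def worker() -> None:
    try:
        while True:
            time.sleep(0.01)  # stands in for a long-running command
    except KeyboardInterrupt:
        result["status"] = "stopped"


t = threading.Thread(target=worker, daemon=True)
t.start()
time.sleep(0.05)
inject_exception(t, KeyboardInterrupt)
t.join(timeout=2)
print(result.get("status"))
```

A thread blocked in a long C-level call will not see the exception until that call returns, which is one reason a separate cancellation path for in-flight HTTP requests can still be useful.
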
History navigation
- _submit now calls ``input_buffer.reset()`` instead of
  ``set_document(Document(""))`` so the history-navigation cursor
  (``working_index``) is also reset. Without this, after submitting a
  command the next Up press could continue browsing from wherever the
  user had last left off in history.
- Up handler synchronously loads history strings into ``_working_lines``
  when the buffer is fresh (len == 1). prompt_toolkit's
  ``load_history_if_not_yet_loaded`` schedules an *async* task that
  doesn't run before the first keypress, so without this the first Up
  after submit was a no-op and required two presses.
- Up handler also jumps ``working_index`` to the end when the buffer is
  empty after browsing, so Up restarts from the newest entry rather than
  walking further back from the previous browse position.

Esc latency
- Drop ``ttimeoutlen`` (parser-level escape-sequence wait, default 0.5s)
  to 0.05s on both the main REPL Application and the :view pager
  Application. Modern terminals deliver escape sequences as one read so
  50ms is plenty.
- Drop ``timeoutlen`` (key-processor multi-key-binding wait, default
  1.0s) to 0.05s on the pager — this was the main culprit behind the
  2-3 second Esc delay there.
- Bind ``escape`` in the pager with ``eager=True`` so it fires the
  moment the key processor sees it, bypassing partial-match search.

Both attributes are set on the Application instance after construction
because they aren't constructor parameters in this prompt_toolkit
version (passing them to __init__ raises TypeError).
…, fast Ctrl+C

- API key entry now lives inside the REPL UI: prompt flips to `API key:`
  with a masked input on startup or after `logout` / `auth`. No more
  pre-app getpass; no more `run_in_terminal` suspend/resume jolt.
- `!cmd` runs a shell command in a worker thread, gated by the existing
  unsafe-mode check. Output streams into scrollback; Ctrl+C terminates
  the child.
- Ctrl+C during a scrape stops in a frame instead of waiting for the
  HTTP request: tracks the worker's asyncio loop via a monkey-patched
  `asyncio.run` and cancels in-flight tasks via `call_soon_threadsafe`.
  CancelledError is caught alongside KeyboardInterrupt.
- Submitted command stays in the buffer if the run fails or is
  cancelled; only successful runs clear it.
- Batch progress: brand-yellow honeycomb hexes that fill as you go,
  with a shimmering boundary cell driven by the REPL's 10 Hz ticker.
  Single live-updating line via `replace_last_n_lines` instead of one
  appended row per completion. Usage credit meter mirrors the
  brand-yellow filled/outline palette.
- `:view` now also accepts `:view crawl` (alias for the crawl log) and
  `:view <path>` for arbitrary files. Meta-command echo is spliced
  ABOVE the meta's output, matching click-command echo order.
- History Up after submit no longer inverts oldest/newest order.
- `_validate_api_key` detects a running loop and offloads to a worker
  thread so REPL-mode `auth` no longer hits "asyncio.run cannot be
  called from a running event loop".
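
The running-loop detection in the last bullet can be sketched like this. The coroutine `_check_key` is a hypothetical stand-in for the real network check, and `validate_api_key` here is an illustration of the pattern rather than the project's `_validate_api_key`.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def _check_key(api_key: str) -> bool:
    # Hypothetical stand-in for the real API request.
    await asyncio.sleep(0)
    return bool(api_key)


def validate_api_key(api_key: str) -> bool:
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(_check_key(api_key))
    # A loop is already running here; asyncio.run() would raise, so run
    # a fresh loop on a worker thread and wait for its result.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, _check_key(api_key)).result()


async def called_from_a_running_loop() -> bool:
    return validate_api_key("")


print(validate_api_key("abc"))
print(asyncio.run(called_from_a_running_loop()))
```

Note that `.result()` blocks the calling thread, so in this sketch the outer loop is paused until the check finishes; that is acceptable for a short validation call but not for long operations.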
Replaces the compact smblock SCRAPING + stacked BEE block (10 logo
rows total) with a single 6-row "SCRAPING BEE" wordmark in ANSI
Shadow — yellow SCRAPING beside white BEE, mirroring the brand
wordmark. Same letterforms as the legacy logo, just stitched onto
one line of text instead of two so the banner takes less vertical
space.

SCRAPING rows are now padded to a uniform 62-column width so BEE
starts at the same column on every row. Without the padding G's
natural shape leaves a trailing space on rows 1, 2, 6 only — that
shifted BEE one column right on rows 3, 4, 5 and the bottom of B /
last E read as misaligned.
Best-effort XTERM Window Manipulation ("CSI 8 ; H ; W t") to bump the
window to 100 cols × 30 rows when the current size is below that. Fits
the 90-col banner with room for the toolbar + input. Only fires when
the window is actually too small, so users on a large terminal aren't
disrupted. Apple Terminal.app and SSH / tmux sessions ignore the
sequence and the REPL silently proceeds.
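
A small sketch of the resize request described above. The helper names are hypothetical, and keeping a dimension that is already larger than the minimum is an assumption of this example rather than something the commit states.

```python
import shutil
import sys

MIN_COLS, MIN_ROWS = 100, 30


def resize_sequence(cols: int, rows: int,
                    min_cols: int = MIN_COLS, min_rows: int = MIN_ROWS) -> str:
    """Build the XTerm 'CSI 8 ; rows ; cols t' request, or '' if not needed."""
    if cols >= min_cols and rows >= min_rows:
        return ""
    # Assumption: never shrink a dimension that is already big enough.
    return "\x1b[8;%d;%dt" % (max(rows, min_rows), max(cols, min_cols))


def maybe_resize(stream=sys.stdout) -> None:
    size = shutil.get_terminal_size()
    seq = resize_sequence(size.columns, size.lines)
    # Only emit the escape sequence when writing to a real terminal.
    if seq and stream.isatty():
        stream.write(seq)
        stream.flush()


print(repr(resize_sequence(80, 24)))
```

Terminals that do not implement the sequence simply ignore it, so no fallback path is needed.
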
Release covers the REPL overhaul series: in-place API key prompt,
!shell exec, fast Ctrl+C cancellation, honeycomb batch progress with
shimmering boundary cell, single-line SCRAPING BEE banner, terminal
auto-resize at startup, and assorted polish.
…-flag consistency

Non-REPL changes (affect `scrapingbee crawl` and the CLI outside the REPL too):

  crawl.py — screenshot crawl actually produces N files for --max-pages N
    Pristine v1.4.1 with `--max-pages 5 --screenshot-full-page true` saved
    only 1 PNG. Side-by-side test against pristine confirmed three stacked
    upstream/spider bugs:
      1. `_requires_discovery_phase` only checked `screenshot`, missing
         `screenshot_full_page` and `screenshot_selector`. Those modes
         silently fell into the same-mode `parse()` path that runs link
         extraction on PNG bytes — yielding garbage URLs that crashed
         on dispatch.
      2. `scrapy_scrapingbee`'s default errback calls `response.text` on
         binary 500 responses → `AttributeError` → killed the spider.
         Every `ScrapingBeeRequest` is now wired to our `_on_request_error`
         which logs the URL and continues.
      3. The scheduler's LIFO ordering popped follow-discovery requests
         before the save requests yielded alongside them. With ~100
         follow URLs per page, saves were never dequeued before
         `CLOSESPIDER_PAGECOUNT` bailed. Fix: `priority=10` on save
         requests + raise `CloseSpider` from `_push_saved_status` when
         `_save_count >= max_pages`, so the engine drops the rest of
         the queue immediately.
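
The starvation in item 3 can be reproduced with a toy model. This is not Scrapy's scheduler, only a heap-based imitation of "higher priority first, newest first within a priority"; the function `crawl` and its parameters are made up for the example, and the counts it prints are illustrative rather than the exact numbers seen in v1.4.1.

```python
import heapq
from itertools import count


def crawl(max_fetches: int, save_priority: int, links_per_page: int = 100) -> int:
    """Return how many save requests get dequeued within the fetch budget."""
    seq = count()
    heap = []

    def push(kind: str, priority: int) -> None:
        # heapq is a min-heap: negate priority so higher runs first, and
        # negate the sequence number so the newest entry wins ties (LIFO).
        heapq.heappush(heap, (-priority, -next(seq), kind))

    push("discover", 0)
    saves = 0
    for _ in range(max_fetches):
        if not heap:
            break
        _, _, kind = heapq.heappop(heap)
        if kind == "save":
            saves += 1
            continue
        # A discovery response yields one save for this page, followed
        # by a pile of follow-up discovery requests.
        push("save", save_priority)
        for _ in range(links_per_page):
            push("discover", 0)
    return saves


print(crawl(10, save_priority=0), crawl(10, save_priority=10))
```

With equal priorities the newest discovery requests always sit on top of the saves, so the budget runs out before any save is dequeued; raising the save priority interleaves them.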

  crawl.py — pool-based discovery (binary / extract modes)
    Old flow paid one HTML discovery + one save per saved page ≈ 2× credits.
    New flow accumulates URLs into `_save_queue` while discovering; once
    the pool reaches `max_pages` we flip `_discovery_done`, dispatch one
    save per pooled URL in priority order, and stop discovering. For a
    `--max-pages 100 --screenshot-full-page true` run on a link-rich site
    that previously cost ~1000 credits, this is closer to ~510.
    A `spider_idle` handler flushes the pool when the site is smaller
    than the cap so small-site crawls still produce output.

  crawl.py — `--max-pages N` now means N SAVED pages
    Replaces the older `_fetch_count` cap with `_save_count` +
    `_save_pending`. `--max-pages N` previously could stop early when
    discovery requests counted against the cap; now it counts only
    successful saves and matches the help text. Includes save-failure
    backfill from the queue so flaky 5xx errors don't silently shrink
    the user's effective budget.

  crawl.py — `errback` + non-printable URL filter on every yielded request
    `scrapy_scrapingbee`'s default errback is the binary-500 landmine
    above. Our `_on_request_error` is now attached to every
    `ScrapingBeeRequest` in the spider. Additionally, links whose
    decoded path/query contains non-ASCII bytes (common when discovery
    extracts hrefs from a corrupted PNG response on crawler-test.com
    fixtures) are dropped at iteration time so they can't trip the
    upstream errback in the first place.

  commands/crawl.py — concurrency-warning shows the actual reason
    Was: `Warning: could not check plan concurrency. Defaulting to 1…`
    Now: `Warning: could not check plan concurrency (HTTP 429). Defaulting…`
    The `/usage` endpoint is rate-limited; without the reason, users
    couldn't distinguish a transient 429 from a real auth/network
    problem and would default to concurrency=1 unnecessarily.

  cli.py + cli_utils.py — `--flag true|false` accepted for every bool flag
    Scraping-side options (`--render-js true`, `--premium-proxy true`, …)
    already took explicit `true`/`false` while Click flags (`--verbose`,
    `--resume`, `--escalate-proxy`, etc.) were bare-only — inconsistent
    UX. An argv preprocessor (`normalize_bool_flag_args`) collects all
    `is_flag=True` option names from the click tree and rewrites
    `--verbose true` → `--verbose`, `--verbose false` → (dropped, default
    applies). Bare `--verbose` still works. Applied at both `cli.main()`
    entry and REPL dispatch so behaviour matches everywhere.
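
    A minimal sketch of the rewrite rule described above. The function
    `normalize_bool_flags` and its explicit `bool_flags` argument are
    assumptions for illustration; the real `normalize_bool_flag_args`
    collects the flag names from the click tree and may handle more cases.

```python
from typing import Iterable, List


def normalize_bool_flags(argv: List[str], bool_flags: Iterable[str]) -> List[str]:
    """Rewrite '--flag true' to '--flag' and drop '--flag false'."""
    flags = set(bool_flags)
    out = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        nxt = argv[i + 1].lower() if i + 1 < len(argv) else None
        if arg in flags and nxt in ("true", "false"):
            if nxt == "true":
                out.append(arg)  # keep the bare flag
            # 'false' drops the flag entirely so the default applies
            i += 2
            continue
        out.append(arg)
        i += 1
    return out


print(normalize_bool_flags(["scrape", "--verbose", "true", "URL"], {"--verbose"}))
```

    Options that are not in the bool-flag set (for example value-taking
    options such as `--render-js true`) pass through untouched.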

REPL (new interactive mode — large but mostly self-contained):

  interactive.py / theme.py / batch.py / commands/*.py
  - Full-screen alt-buffer with banner, fixed status widget, virtual
    scrollback, bottom toolbar with live credit honeycomb.
  - Subprocess-per-crawl: Twisted's reactor is a process singleton and
    can't be reused; running crawls in a child process lets the REPL
    handle multiple consecutive crawls per session.
  - Crawl + batch share a unified fixed widget that shows banner-compact
    + honeycomb progress + URL line (crawl only). No more honeycomb
    rows leaking into scrollback.
  - Click-to-open paths: existing paths in scrollback are underlined
    brand-yellow; click opens in Finder / xdg-open / os.startfile.
    Detection handles paths with spaces, `:line:col` suffixes, and
    rejects URL `://path` false positives.
  - `:view` pager pretty-prints JSON (existing) and HTML (new via lxml);
    `r` toggles raw.
  - Multi-line paste preview: bracketed paste with newlines puts the
    pasted lines in a multi-line editable buffer (Up/Down navigate
    lines, Ctrl+J / Alt+Enter insert newline). Enter submits all,
    queueing rest via `_pending_commands`. Esc / Ctrl+C clear.
  - Tab completion: single-match inline-completes (bash-style),
    multi-match opens popup, ghost-text-word fallback when nothing to
    complete. Right accepts the next word of the ghost suggestion;
    End accepts the whole ghost suggestion.
  - Ctrl+C escalation: first press sends SIGTERM (graceful), second
    within 2 s sends SIGKILL — useful when Twisted is parked in a
    long screenshot fetch and SIGTERM lags.
  - Ctrl+R / Ctrl+S explicitly disabled (their default reverse-i-search
    writes into a hidden buffer we don't render — typing went to a
    black hole).
  - `auth --unsafe` intercepted in REPL with a "run outside" message;
    its multi-step disclaimer + masked-getpass fights our termios.
  - Bee facts list audited (9 corrected: Einstein quote,
    honey-as-sustenance myth, etc.) and rotation starts with a verb so
    quick commands don't flash trivia.

  commands/amazon.py, chatgpt.py, fast_search.py, google.py, scrape.py,
  usage.py, walmart.py, youtube.py — no net behavioural change vs main.
  The `LiveCreditTracker` / `MiniBeeSpinner` wrappers that were added
  earlier in the REPL branch have been removed (they were dead code),
  leaving only REPL-gated paths (`if is_repl_mode()` branches).
- Rename _CrawlerReactorAlreadyUsed → _CrawlerReactorAlreadyUsedError (N818)
- Drop CamelCase import aliases and lowercase in-function constants
  flagged by N806/N813/N814 across interactive.py, cli_utils.py, theme.py
- Route dynamic attribute access through getattr/setattr for twisted's
  reactor (callFromThread/stop), scrapy Spider._crawler, sys.stdout/err
  .buffer adapter install, and rich Console.file rebind so ty stops
  flagging unresolved-attribute / invalid-assignment
- Import lxml.etree / lxml.html via importlib so ty resolves the
  compiled submodules
- Pass loop_factory to asyncio.run via **kwargs (3.12+ signature) and
  install the wrapper via setattr to satisfy ty
- Guard sys.__stdout__ None case and tighten _set_text null-check
- Remove unused type:ignore comments and the now-unused
  shutil.get_terminal_size assignment in interactive.py
- Delete stale tests in test_crawl.py that referenced helpers removed
  by the pool-based screenshot crawl rewrite
  (_parse_discovery_links_only, _NON_HTML_URL_EXTENSIONS)
CI runs `ruff format --check src tests` in addition to `ruff check`;
8 files were flagged. Apply ruff format so the Lint job passes.
confirm_overwrite() called click.confirm() when the target file
existed, which reads from sys.stdin directly. In REPL mode
prompt_toolkit owns the TTY (full-screen / alt-buffer) and never
forwards keystrokes to stdin, so the prompt blocked forever and the
REPL appeared frozen.

When is_repl_mode() is true, raise a UsageError telling the user to
re-run with --overwrite instead of attempting to prompt.
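
The guard described above could be sketched as follows. To keep the example free of third-party imports, `UsageError` is a local stand-in for `click.UsageError` and the prompt is an injectable `ask` callable; the signature is an assumption, not the project's actual `confirm_overwrite`.

```python
import os


class UsageError(Exception):
    """Stand-in for click.UsageError so the sketch runs without click."""


def confirm_overwrite(path: str, overwrite: bool, repl_mode: bool,
                      ask=input) -> bool:
    if overwrite or not os.path.exists(path):
        return True
    if repl_mode:
        # The REPL owns the TTY, so a stdin prompt would block forever.
        raise UsageError("%s already exists; re-run with --overwrite" % path)
    return ask("Overwrite %s? [y/N] " % path).strip().lower() in ("y", "yes")


try:
    confirm_overwrite(".", overwrite=False, repl_mode=True)
except UsageError as exc:
    print(exc)
```

Failing fast with an actionable message avoids the frozen-looking REPL while leaving the interactive prompt intact for plain CLI use.
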