
feat(ollama): add generateWithUsage() and export Usage/GenerateResult types #9

Closed
pyramation wants to merge 3 commits into main from feat/generate-with-usage

@pyramation
Contributor

Summary

Adds generateWithUsage() to OllamaClient, which returns token usage metadata alongside the generated content. The Ollama API returns prompt_eval_count and eval_count in its responses, and the OllamaAdapter already captures them internally, but the legacy generate() API discarded them and returned only the text string.

Changes:

  • New generateWithUsage(input, onChunk?) method returns GenerateResult { content, usage, model, stopReason }
  • generate() now delegates to generateWithUsage() internally and remains fully backward compatible
  • Exports Usage and GenerateResult interfaces for consumer type safety
  • Works in both streaming mode (usage comes from the final done: true chunk) and non-streaming mode
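A minimal sketch of the new surface, not the actual OllamaClient source. It assumes `Usage` has the `input`, `output` and `totalTokens` fields named in the testing checklist and smoke tests, and that `totalTokens` is their sum; the `StopReason` members other than `'toolUse'`, the `Transport` type and the `ClientSketch` class are hypothetical stand-ins for the real adapter plumbing.

```typescript
// Token counts reported by the provider (field names assumed from the checklist).
export interface Usage {
  input: number;       // from Ollama's prompt_eval_count
  output: number;      // from Ollama's eval_count
  totalTokens: number; // assumed to be input + output
}

// Assumed union; the PR only states that it includes 'toolUse'.
export type StopReason = "stop" | "length" | "toolUse" | "error";

export interface GenerateResult {
  content: string;
  usage: Usage;
  model: string;
  stopReason: StopReason;
}

// Hypothetical transport: runs the model and reports text deltas through onDelta.
type Transport = (
  input: string,
  onDelta: (text: string) => void,
) => Promise<{ model: string; promptEvalCount: number; evalCount: number; stopReason: StopReason }>;

export class ClientSketch {
  private readonly transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  // New API: returns content plus usage in both streaming and batch use.
  async generateWithUsage(input: string, onChunk?: (text: string) => void): Promise<GenerateResult> {
    let content = "";
    const meta = await this.transport(input, (text) => {
      content += text;
      onChunk?.(text); // forward deltas only when a callback was supplied
    });
    return {
      content,
      usage: {
        input: meta.promptEvalCount,
        output: meta.evalCount,
        totalTokens: meta.promptEvalCount + meta.evalCount,
      },
      model: meta.model,
      stopReason: meta.stopReason,
    };
  }

  // Legacy API: string without a callback, void with one; delegates to generateWithUsage().
  generate(input: string): Promise<string>;
  generate(input: string, onChunk: (text: string) => void): Promise<void>;
  async generate(input: string, onChunk?: (text: string) => void): Promise<string | void> {
    const result = await this.generateWithUsage(input, onChunk);
    return onChunk ? undefined : result.content;
  }
}
```

The overload signatures keep `generate()` typed as `Promise<string>` without a callback and `Promise<void>` with one, which is one way the delegation can stay source-compatible for existing callers.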

Motivation: graphile-llm needs actual provider token counts for billing metering (check_billing_quota → LLM call → record_usage with real token counts). Previously, it had to estimate token counts from text length (~4 characters per token).
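One way a consumer such as graphile-llm might use the new counts: prefer the provider's `usage` and fall back to the old length-based estimate only when it is absent. This is a sketch; `billableTokens` and `estimateTokens` are hypothetical helpers, and the quota check and `record_usage` call are not modeled.

```typescript
interface Usage {
  input: number;
  output: number;
  totalTokens: number;
}

// Hypothetical fallback mirroring the old ~4 characters per token heuristic.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Prefer the provider's real counts; estimate only when usage is missing.
function billableTokens(prompt: string, content: string, usage?: Usage): Usage {
  if (usage) return usage;
  const input = estimateTokens(prompt);
  const output = estimateTokens(content);
  return { input, output, totalTokens: input + output };
}
```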

Review & Testing Checklist for Human

  • Verify that generate() still returns a string (non-streaming) and void (streaming), preserving backward compatibility
  • Test generateWithUsage() with a local Ollama instance: confirm usage.input and usage.output are populated from the provider response
  • Check that streaming mode with onChunk callback still delivers text deltas AND returns usage in the GenerateResult
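The checklist items above can also be written as plain assertions over a `GenerateResult` returned from a local run. In this sketch, `checkGenerateResult` is a hypothetical helper; it treats any count of zero or less as not populated and assumes `totalTokens` equals `input + output`, which the PR does not state.

```typescript
interface Usage {
  input: number;
  output: number;
  totalTokens: number;
}

interface GenerateResult {
  content: string;
  usage: Usage;
  model: string;
  stopReason: string;
}

// Returns a list of problems; an empty list means the result passes the checks.
function checkGenerateResult(result: GenerateResult, streamedChunks?: string[]): string[] {
  const problems: string[] = [];
  if (result.usage.input <= 0) problems.push("usage.input not populated");
  if (result.usage.output <= 0) problems.push("usage.output not populated");
  if (result.usage.totalTokens !== result.usage.input + result.usage.output) {
    problems.push("usage.totalTokens != input + output");
  }
  // In streaming mode, the deltas passed to onChunk should reassemble the content.
  if (streamedChunks !== undefined) {
    if (streamedChunks.length === 0) problems.push("no chunks delivered to onChunk");
    else if (streamedChunks.join("") !== result.content) problems.push("chunks do not match content");
  }
  return problems;
}
```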

Notes

  • OllamaAdapter.stream() was already capturing prompt_eval_count/eval_count from the done: true chunk (lines 464-466). This change only surfaces that data through the public API.
  • Usage was changed from a plain interface to an export interface so consumers can import it.
  • GenerateResult.stopReason includes 'toolUse' to match the full AssistantMessage stopReason union.
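A sketch of the mapping the notes describe, reading newline-delimited JSON lines from Ollama's `/api/chat` streaming endpoint: text deltas arrive in `message.content`, and only the final `done: true` line carries `prompt_eval_count` and `eval_count`. This is not the OllamaAdapter.stream() code; `collectStream`, the `StopReason` union and the `done_reason` mapping are assumptions.

```typescript
interface Usage {
  input: number;
  output: number;
  totalTokens: number;
}

type StopReason = "stop" | "length" | "toolUse" | "error"; // assumed union

// Subset of an Ollama /api/chat streaming line that this sketch reads.
interface OllamaChatChunk {
  model: string;
  message?: { content?: string };
  done: boolean;
  done_reason?: string;
  prompt_eval_count?: number;
  eval_count?: number;
}

function collectStream(lines: Iterable<string>, onChunk?: (text: string) => void) {
  let content = "";
  let model = "";
  let usage: Usage = { input: 0, output: 0, totalTokens: 0 };
  let stopReason: StopReason = "stop";
  for (const line of lines) {
    if (line.trim() === "") continue; // skip blank lines between NDJSON records
    const chunk = JSON.parse(line) as OllamaChatChunk;
    model = chunk.model;
    const delta = chunk.message?.content ?? "";
    if (delta) {
      content += delta;
      onChunk?.(delta);
    }
    if (chunk.done) {
      // Only the final chunk carries the token counts.
      const input = chunk.prompt_eval_count ?? 0;
      const output = chunk.eval_count ?? 0;
      usage = { input, output, totalTokens: input + output };
      stopReason = chunk.done_reason === "length" ? "length" : "stop";
    }
  }
  return { content, usage, model, stopReason };
}
```

A non-streaming `/api/chat` response is a single object with `done: true`, so passing a one-element array yields the same result shape.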

Link to Devin session: https://app.devin.ai/sessions/2b5a29d83d3f478e8d3d972653b4879c
Requested by: @pyramation

… types

Adds generateWithUsage() method to OllamaClient that returns token usage
metadata (prompt_tokens, completion_tokens, total_tokens) alongside content.
The existing generate() method delegates to generateWithUsage() internally,
maintaining full backward compatibility.

Also exports the Usage and GenerateResult interfaces so consumers can
type their metering/billing integrations.
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


Three smoke tests covering:
- Batch mode: content + non-zero usage (input, output, totalTokens)
- Streaming mode: chunks received + usage returned after completion
- Multi-turn chat: token counts for conversation context
pyramation closed this May 21, 2026
