idea: Support for Configurable Input and Output Token Limits for Custom Model Providers #7792

@kkailaasa

Description

Problem Statement

When I use custom or cloud model providers (like OpenAI-compatible APIs, Groq, or self-hosted ones), Jan doesn't let me set separate limits for:

How many tokens the prompt/input can use
How many tokens the output/response can generate (max_tokens)

Right now, the Context Size setting mostly works for local models (like llama.cpp), but not for remote ones.

This causes errors like "max_tokens too large" or unexpected behavior with cloud models that have different limits.
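One way to avoid these errors would be to clamp the requested output length against a per-provider limit before building the request body. A minimal sketch, assuming hypothetical option names (`requestedMaxTokens`, `providerMaxOutputTokens`, `buildBody`) that are illustrative and not Jan's actual API:

```typescript
// Hypothetical per-provider settings; field names are assumptions, not Jan's API.
interface RequestOptions {
  model: string;
  requestedMaxTokens: number;       // what the user asked for
  providerMaxOutputTokens: number;  // the provider's documented cap, e.g. 4096
}

function buildBody(opts: RequestOptions): { model: string; max_tokens: number } {
  // Clamp so the provider never sees a value above its limit,
  // preventing "max_tokens too large" rejections.
  const max_tokens = Math.min(opts.requestedMaxTokens, opts.providerMaxOutputTokens);
  return { model: opts.model, max_tokens };
}
```

With a setting like this, a user could request 8192 output tokens globally while a provider capped at 4096 would still receive a valid request.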

Feature Idea

In the model settings (gear icon) or provider page, add simple fields for custom/remote models:

Max Context Tokens (total input + output allowed)
Max Output Tokens (how long replies can be, like 4096 or 8192)
Auto Trim Input Tokens: drop older conversation history so the prompt stays within the context limit
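The auto-trim field above could be sketched roughly as follows: drop the oldest conversation turns (keeping the system prompt) until the estimated prompt size plus the reserved output budget fits the context window. The ~4 characters/token estimate and all names here are assumptions for illustration, not Jan's implementation:

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string }

// Rough heuristic: ~4 characters per token (an assumption; a real
// implementation would use the provider's tokenizer).
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function trimToContext(
  messages: Message[],
  maxContextTokens: number,
  maxOutputTokens: number,
): Message[] {
  // Reserve room for the reply; the rest is the input budget.
  const inputBudget = maxContextTokens - maxOutputTokens;
  const trimmed = [...messages];
  const total = () => trimmed.reduce((n, m) => n + estimateTokens(m.content), 0);
  // Keep the system prompt at index 0; drop the oldest turn after it.
  while (total() > inputBudget && trimmed.length > 1) {
    trimmed.splice(1, 1);
  }
  return trimmed;
}
```

This keeps recent turns intact; an "auto-compact" approach would instead summarize the dropped turns rather than discarding them.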

It would be even better if we could implement an "auto-compact" conversation feature like the one in Claude Code or Codex.

This would prevent errors, help control costs, and make Jan work better with cloud/remote providers.
