Problem Statement
When I use custom or cloud model providers (like OpenAI-compatible APIs, Groq, or self-hosted ones), Jan doesn't let me set separate limits for:
How many tokens the prompt/input can use
How many tokens the output/response can generate (max_tokens)
Right now, the Context Size setting only works reliably for local models (like llama.cpp), but has no effect on remote ones.
This causes errors like "max_tokens too large" or unexpected behavior with cloud models that have different limits.
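To illustrate the kind of guard the missing setting would enable, here is a minimal sketch that clamps a request's max_tokens to a per-model output limit before sending. The limit table and function name are illustrative assumptions, not Jan's actual provider data or code.

```python
# Hypothetical per-model output limits; the values are illustrative
# assumptions, not data that Jan ships with.
PROVIDER_MAX_OUTPUT = {
    "gpt-4o-mini": 16384,
    "llama-3.1-8b-instant": 8192,
}

def clamp_max_tokens(model: str, requested: int) -> int:
    """Cap the requested max_tokens at the model's known output limit.

    Unknown models pass through unchanged, which reproduces today's
    "max_tokens too large" failure mode for providers Jan knows
    nothing about.
    """
    limit = PROVIDER_MAX_OUTPUT.get(model)
    return min(requested, limit) if limit is not None else requested
```

With a user-configurable Max Output Tokens field, this lookup table would simply come from the model settings instead of being hard-coded.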
Feature Idea
In the model settings (gear icon) or provider page, add simple fields for custom/remote models:
Max Context Tokens (total input + output allowed)
Max Output Tokens (how long replies can be, like 4096 or 8192)
Auto Trim Input: automatically trim older conversation history so the input stays within the Max Context Tokens limit
Even better would be an "auto-compact" conversation feature like the one in Claude Code or Codex
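The auto-trim idea above could work roughly like this: drop the oldest messages until the estimated input size fits within Max Context Tokens minus Max Output Tokens. The 4-characters-per-token heuristic and the function names below are illustrative assumptions, not Jan's implementation.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # A real implementation would use the model's own tokenizer.
    return max(1, len(text) // 4)

def auto_trim(messages, max_context_tokens, max_output_tokens):
    """Drop the oldest messages until the input fits the token budget.

    The budget reserves max_output_tokens out of the total context so
    the reply has room to generate.
    """
    budget = max_context_tokens - max_output_tokens
    trimmed = list(messages)
    while trimmed and sum(estimate_tokens(m["content"]) for m in trimmed) > budget:
        trimmed.pop(0)  # discard the oldest message first
    return trimmed
```

An "auto-compact" variant would instead summarize the dropped messages into a single message rather than discarding them outright.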
This would fix errors, help control costs, and make Jan work better with cloud/remote models.