Problem Statement
When I use custom or cloud model providers (like OpenAI-compatible APIs, Groq, or self-hosted ones), Jan doesn't let me set separate limits for:
How many tokens the prompt/input can use
How many tokens the output/response can generate (max_tokens)
Right now, the Context Size setting only works reliably for local models (like llama.cpp), but has no effect on remote ones.
This causes errors like "max_tokens too large" or unexpected behavior with cloud models that have different limits.
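To illustrate the kind of guard the missing setting would enable, here is a minimal sketch that clamps a request's max_tokens to a per-model output limit before sending. The limit table and function name are illustrative assumptions, not Jan's actual provider data or code.

```python
# Hypothetical per-model output limits; the values are illustrative
# assumptions, not data that Jan ships with.
PROVIDER_MAX_OUTPUT = {
    "gpt-4o-mini": 16384,
    "llama-3.1-8b-instant": 8192,
}

def clamp_max_tokens(model: str, requested: int) -> int:
    """Cap the requested max_tokens at the model's known output limit.

    Unknown models pass through unchanged, which reproduces today's
    "max_tokens too large" failure mode for providers Jan knows
    nothing about.
    """
    limit = PROVIDER_MAX_OUTPUT.get(model)
    return min(requested, limit) if limit is not None else requested
```

With a user-configurable Max Output Tokens field, this lookup table would simply come from the model settings instead of being hard-coded.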
Feature Idea
In the model settings (gear icon) or provider page, add simple fields for custom/remote models:
Max Context Tokens (total input + output allowed)
Max Output Tokens (how long replies can be, like 4096 or 8192)
Auto Trim Input: automatically trim older conversation history so the input stays within the Max Context Tokens limit
Even better would be an "auto-compact" conversation feature like the one in Claude Code or Codex
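The auto-trim idea above could work roughly like this: drop the oldest messages until the estimated input size fits within Max Context Tokens minus Max Output Tokens. The 4-characters-per-token heuristic and the function names below are illustrative assumptions, not Jan's implementation.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # A real implementation would use the model's own tokenizer.
    return max(1, len(text) // 4)

def auto_trim(messages, max_context_tokens, max_output_tokens):
    """Drop the oldest messages until the input fits the token budget.

    The budget reserves max_output_tokens out of the total context so
    the reply has room to generate.
    """
    budget = max_context_tokens - max_output_tokens
    trimmed = list(messages)
    while trimmed and sum(estimate_tokens(m["content"]) for m in trimmed) > budget:
        trimmed.pop(0)  # discard the oldest message first
    return trimmed
```

An "auto-compact" variant would instead summarize the dropped messages into a single message rather than discarding them outright.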
This would fix errors, help control costs, and make Jan work better with cloud/remote models.