Scope check
Due diligence
What problem does this solve?
When building applications that repeatedly send large static context (system prompts, RAG documents, tool definitions) alongside dynamic user queries, every request re-sends the full token count. Both Anthropic and Gemini offer provider-level caching mechanisms to avoid this, but ruby_llm has no way to express cache points today.
Proposed solution
Add a `cache_point: true` keyword to `Chat#with_instructions` and `Chat#ask` that marks the static portion of a prompt as cacheable (see the usage sketch after the list below). The gem handles the provider-specific implementation transparently:
- Anthropic — injects `cache_control: { type: 'ephemeral' }` on the last content block of cache-pointed messages (up to 4 breakpoints per request)
- Gemini — uploads static messages to the Context Caching API on first call, stores the `cachedContent` name on the chat object, and references it in subsequent `generateContent` requests instead of re-sending inline.
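A sketch of how this might look from application code, assuming the `cache_point:` keyword lands as proposed (the model name and file path are illustrative):

```ruby
require 'ruby_llm'

chat = RubyLLM.chat(model: 'claude-3-5-sonnet')

# Static portion: marked as a cache point so the provider can reuse it.
chat.with_instructions(File.read('system_prompt.md'), cache_point: true)

# Dynamic portion: re-sent fresh on every request as usual.
response = chat.ask('Summarize the latest support ticket')
puts response.content
```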
Anthropic input token usage:

- First call -
- Next call -
Why this belongs in RubyLLM
Prompt caching cannot be implemented in application code using existing RubyLLM primitives. Here's why:
For example, Anthropic requires `cache_control: { type: 'ephemeral' }` to be injected into specific content blocks inside the formatted message payload. The payload structure is entirely internal to `Anthropic::Chat#render_payload` — application code has no way to reach inside and modify individual content blocks after formatting. It also requires the `anthropic-beta: prompt-caching-2024-07-31` request header, which can't be conditionally added based on message content from outside the gem.
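To make the constraint concrete, here is a rough sketch of the kind of mutation the provider layer would need to perform after rendering. The helper name, the `cache_point_indexes` parameter, and the payload shape are assumptions for illustration, not the gem's actual internals:

```ruby
# Hypothetical sketch: after the Anthropic payload is rendered, tag the
# last content block of each cache-pointed message with cache_control.
def apply_cache_points!(payload, cache_point_indexes)
  cache_point_indexes.first(4).each do |i| # Anthropic allows up to 4 breakpoints
    blocks = payload[:messages][i][:content]
    next unless blocks.is_a?(Array) && blocks.any? # plain-string content has no blocks

    blocks.last[:cache_control] = { type: 'ephemeral' }
  end
end
```

The point is that this mutation has to happen after formatting, inside the provider layer; application code never sees the rendered content blocks, so it has nowhere to hook this in.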
The user-facing API is a single keyword: `cache_point: true` on `with_instructions` or `ask`. That's it. The complexity is hidden entirely inside the provider layer where it belongs, consistent with how RubyLLM already abstracts streaming, tool calls, thinking tokens, and structured output across providers. Application code stays clean and provider-agnostic.
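Under this design, the same application code could target either provider with only the model identifier changing (model names and `big_static_context` below are illustrative):

```ruby
%w[claude-3-5-sonnet gemini-1.5-pro].each do |model|
  chat = RubyLLM.chat(model: model)
  chat.with_instructions(big_static_context, cache_point: true)

  chat.ask('First question') # Anthropic: cache write; Gemini: cachedContent upload
  chat.ask('Follow-up')      # both providers reuse the cached static context
end
```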