
feat: add natural-language tool execution#3781

Open
AbdulR11 wants to merge 2 commits into IBM:main from AbdulR11:natural-language-tool

Conversation

@AbdulR11

🔗 Related Issue

Closes #2248

📝 Summary

This PR adds a Natural Language Tool Execution API so users can run tools through conversational queries. The gateway classifies intent, matches tools, extracts parameters, confirms risky actions, executes tools, and formats responses back into natural language.

Problem Statement

Tool execution currently requires:

  • Knowing tool IDs ahead of time
  • Manual JSON parameter input
  • Schema memorization
  • Context switching away from natural conversation

Solution

  • NL execution flow: intent → tool matching → slot filling → confirmation → execution → response
  • Conversation context for follow-up clarification
  • Safety checks for destructive or sensitive operations
  • Follow-up suggestions for next actions
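
A minimal sketch of that flow as a single pass, with the classifier, matcher, slot filler, and executor stubbed in as callables (all names here are illustrative, not the PR's actual API):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class NLResult:
    status: str  # "clarify", "confirm", or "done"
    message: str
    tool_id: Optional[str] = None
    params: Dict[str, Any] = field(default_factory=dict)


def run_nl_query(
    query: str,
    match_tool: Callable[[str], Optional[str]],
    fill_slots: Callable[[str, str], Dict[str, Any]],
    required_params: Dict[str, List[str]],
    is_high_risk: Callable[[str], bool],
    execute: Callable[[str, Dict[str, Any]], str],
) -> NLResult:
    """Intent -> tool match -> slot fill -> confirm/execute, as one pass."""
    tool_id = match_tool(query)
    if tool_id is None:
        return NLResult("clarify", "No matching tool found; please rephrase.")
    params = fill_slots(tool_id, query)
    missing = [p for p in required_params.get(tool_id, []) if p not in params]
    if missing:
        # Ask a follow-up question instead of executing with incomplete input.
        return NLResult("clarify", f"Missing parameters: {missing}", tool_id, params)
    if is_high_risk(tool_id):
        # Destructive/sensitive tools pause here until the user confirms.
        return NLResult("confirm", f"Confirm running {tool_id}?", tool_id, params)
    return NLResult("done", execute(tool_id, params), tool_id, params)
```

Each early return corresponds to one of the states the API can hand back to the caller; only the "done" path actually invokes a tool.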

Key Components

  • Service Layer: NL orchestration + LLM prompts
  • Router Endpoints: /api/v1/nl/*
  • Configuration: feature flags + safety controls
  • Tests: service and router coverage

🏷️ Type of Change

  • Bug fix
  • Feature / Enhancement
  • Documentation
  • Refactor
  • Chore (deps, CI, tooling)
  • Other (describe below)

Test Coverage

  • ✅ NL execution service flow (no tool, clarification, confirmation, success)
  • ✅ Router execution path and config guard

📓 Notes

Files Changed

Core Implementation

  • mcpgateway/services/nl_execution_service.py - NL execution orchestration
  • mcpgateway/routers/nl_router.py - NL endpoints
  • mcpgateway/config.py - NL settings and feature flags
  • mcpgateway/main.py - Router registration

Tests

  • tests/unit/mcpgateway/services/test_nl_execution_service.py - Service coverage
  • tests/unit/mcpgateway/routers/test_nl_router.py - Router coverage
  • tests/unit/mcpgateway/routers/__init__.py - Package marker for router tests

Usage Example

Execute with natural language:

POST /api/v1/nl/execute
{
  "query": "What's the weather in San Francisco?"
}

Confirm risky action:

POST /api/v1/nl/confirm
{
  "session_id": "<session>",
  "confirm": true
}
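
The confirm step implies a short-lived store of pending actions keyed by session_id. A minimal in-memory sketch of that idea (illustrative names only; the PR's actual implementation may back this with Redis and the configured context TTL):

```python
import uuid
from typing import Any, Callable, Dict, Tuple


class PendingActionStore:
    """Holds tool calls awaiting user confirmation, keyed by session id."""

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, Dict[str, Any]]] = {}

    def stage(self, tool_id: str, params: Dict[str, Any]) -> str:
        """Park a risky call and return the session id the client must echo back."""
        session_id = str(uuid.uuid4())
        self._pending[session_id] = (tool_id, params)
        return session_id

    def confirm(
        self,
        session_id: str,
        confirm: bool,
        execute: Callable[[str, Dict[str, Any]], str],
    ) -> str:
        """Resolve a staged call exactly once: execute on yes, discard on no."""
        staged = self._pending.pop(session_id, None)
        if staged is None:
            return "unknown or expired session"
        if not confirm:
            return "action cancelled"
        tool_id, params = staged
        return execute(tool_id, params)
```

Popping the entry on first use means a session id cannot be replayed to execute the same risky action twice.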

Benefits

For Users:

  • Execute tools conversationally without remembering schemas
  • Multi-turn clarification for missing details

For Operators:

  • Safety confirmations for high-risk tools
  • Configurable model and rate limits

Configuration

MCPGATEWAY_NL_EXECUTION_ENABLED=true
MCPGATEWAY_NL_EXECUTION_MODEL=gpt-4o

Copilot AI review requested due to automatic review settings March 21, 2026 18:17

Copilot AI left a comment


Pull request overview

This PR introduces a Natural Language (NL) tool execution API in the MCP Gateway, enabling users to execute tools via conversational queries with intent classification, tool matching, slot filling, optional confirmation, and natural-language response formatting.

Changes:

  • Added NLExecutionService to orchestrate NL intent → match → slot-fill → confirm → execute → respond flow.
  • Added /api/v1/nl/* FastAPI router endpoints for execute/parse/confirm/context/feedback.
  • Added unit tests covering core service flow and router behavior, plus new config flags for NL execution.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 13 comments.

Summary per file:

  • mcpgateway/services/nl_execution_service.py - Implements NL orchestration, context, safeguards, and tool invocation/formatting logic
  • mcpgateway/routers/nl_router.py - Adds NL endpoints and basic rate limiting/feature gating
  • mcpgateway/config.py - Adds NL execution settings (and also includes several unrelated security/default changes)
  • tests/unit/mcpgateway/services/test_nl_execution_service.py - Unit tests for NL execution service paths
  • tests/unit/mcpgateway/routers/test_nl_router.py - Unit tests for NL router gating and service invocation
  • tests/unit/mcpgateway/routers/__init__.py - Marks router tests as a package
Comments suppressed due to low confidence (1)

mcpgateway/config.py:1000

  • get_security_status() compares self.jwt_secret_key directly to a string. Since jwt_secret_key is a SecretStr, this comparison is unreliable and can report secure_secrets=True even when the default secret is configured. Extract the actual secret value (get_secret_value()) before comparing.
        return {
            "secure_secrets": self.jwt_secret_key != "my-test-key",  # nosec B105 - checking for default value
            "auth_enabled": self.auth_required,
            "ssl_verification": not self.skip_ssl_verify,
            "debug_disabled": not self.debug,
            "cors_restricted": "*" not in self.allowed_origins if self.cors_enabled else True,
            "ui_protected": not self.mcpgateway_ui_enabled or self.auth_required,
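
The pitfall is easy to reproduce: pydantic's SecretStr never compares equal to a plain str, so the != check above is always True and the default secret goes undetected. A stdlib-only stand-in that mirrors pydantic's equality semantics (SecretStr here is a simplified stand-in, not pydantic's actual class):

```python
class SecretStr:
    """Simplified stand-in mirroring pydantic's SecretStr equality semantics."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __eq__(self, other: object) -> bool:
        # Like pydantic's SecretStr: only equal to another SecretStr
        # holding the same value, never to a plain str.
        return isinstance(other, SecretStr) and self._value == other._value


jwt_secret_key = SecretStr("my-test-key")  # the insecure default value

# Broken check: a str is never equal to a SecretStr, so this is always True.
broken = jwt_secret_key != "my-test-key"

# Fixed check: unwrap the secret first, so the default is actually detected.
fixed = jwt_secret_key.get_secret_value() != "my-test-key"
```

With the fixed form, secure_secrets correctly reports False while the default key is still configured.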


Comment on lines +28 to +31
from mcpgateway.services.semantic_search_service import get_semantic_search_service
from mcpgateway.services.tool_service import ToolService
from mcpgateway.utils.redis_client import get_redis_client


Copilot AI Mar 21, 2026


NLExecutionService imports mcpgateway.services.semantic_search_service, but that module/file does not exist in the repo (no mcpgateway/services/semantic_search_service.py or package). This will raise ImportError at import time and break the API when the NL router is included. Either add the missing semantic search service module or change the import to the correct existing service implementation.

Suggested change
-from mcpgateway.services.semantic_search_service import get_semantic_search_service
 from mcpgateway.services.tool_service import ToolService
 from mcpgateway.utils.redis_client import get_redis_client
+
+try:
+    # Optional semantic search integration; may not be present in all deployments.
+    from mcpgateway.services.semantic_search_service import get_semantic_search_service
+except ImportError:  # pragma: no cover - defensive fallback when module is absent
+
+    def get_semantic_search_service(*args: Any, **kwargs: Any) -> Any:
+        """Fallback when semantic_search_service is not available.
+
+        This preserves importability of nl_execution_service even if the
+        semantic search module is not present in the environment, and fails
+        explicitly when semantic search is actually requested.
+        """
+        raise RuntimeError(
+            "Semantic search service is not available: "
+            "missing 'mcpgateway.services.semantic_search_service'."
+        )

Copilot uses AI. Check for mistakes.
Comment on lines +968 to +972
return {
"is_error": bool(getattr(tool_result, "isError", False)),
"content": content_text,
"summary": content_text,
}

Copilot AI Mar 21, 2026


_tool_result_to_dict() checks tool_result.isError, but the ToolResult schema uses is_error. As written, errors will be silently treated as successes and the LLM formatting step won’t be told the result is an error. Read is_error (and ideally propagate error_message into the dict).

Suggested change
-        return {
-            "is_error": bool(getattr(tool_result, "isError", False)),
-            "content": content_text,
-            "summary": content_text,
-        }
+        is_error = bool(getattr(tool_result, "is_error", getattr(tool_result, "isError", False)))
+        error_message = getattr(tool_result, "error_message", None)
+        result: Dict[str, Any] = {
+            "is_error": is_error,
+            "content": content_text,
+            "summary": content_text,
+        }
+        if error_message is not None:
+            result["error_message"] = error_message
+        return result

inferred_params=inferred,
validation_errors=errors,
confidence=confidence,
needs_clarification=bool(missing),

Copilot AI Mar 21, 2026


Slot filling validation errors are collected (errors) but never influence control flow: needs_clarification is based only on missing required params. This can proceed to tool execution with schema-invalid arguments. Consider treating non-empty validation_errors as either needs_clarification (ask user to correct) or an error response before invoking the tool.

Suggested change
-            needs_clarification=bool(missing),
+            needs_clarification=bool(missing or errors),

Comment on lines +549 to +554
result = await self._llm.generate_json(
db,
self._resolve_model(None),
prompt,
temperature=settings.nl_execution_temperature,
max_tokens=settings.nl_execution_max_tokens,

Copilot AI Mar 21, 2026


Model/temperature/max_tokens overrides from the request aren’t applied to tool matching: _match_tools() always uses self._resolve_model(None) and settings.nl_execution_temperature/max_tokens. If the API allows per-request overrides, thread the resolved model and generation params into _match_tools() so intent classification, matching, and slot filling use a consistent configuration.

Suggested change
-        result = await self._llm.generate_json(
-            db,
-            self._resolve_model(None),
-            prompt,
-            temperature=settings.nl_execution_temperature,
-            max_tokens=settings.nl_execution_max_tokens,
+        # Apply per-request overrides; assumes _match_tools() now receives the
+        # request's model/temperature/max_tokens arguments (None when not provided),
+        # so matching uses the same configuration as the rest of the pipeline.
+        effective_model = self._resolve_model(model)
+        effective_temperature = settings.nl_execution_temperature if temperature is None else temperature
+        effective_max_tokens = settings.nl_execution_max_tokens if max_tokens is None else max_tokens
+        result = await self._llm.generate_json(
+            db,
+            effective_model,
+            prompt,
+            temperature=effective_temperature,
+            max_tokens=effective_max_tokens,

Comment on lines +844 to +853
select(DbTool).where(
DbTool.enabled.is_(True),
or_(DbTool.name.ilike(pattern), DbTool.description.ilike(pattern), DbTool.original_description.ilike(pattern)),
)
)
.scalars()
.all()
)
results = []
for tool in rows[:limit]:

Copilot AI Mar 21, 2026


_keyword_search() loads all matching tools (.all()) and only then slices to limit, which can cause unnecessary full scans and large result materialization. Apply a SQL LIMIT at the query level to bound work (and consider escaping %/_ in user input if needed).

Suggested change
-            select(DbTool).where(
-                DbTool.enabled.is_(True),
-                or_(DbTool.name.ilike(pattern), DbTool.description.ilike(pattern), DbTool.original_description.ilike(pattern)),
-            )
-        )
-        .scalars()
-        .all()
-    )
-    results = []
-    for tool in rows[:limit]:
+            select(DbTool)
+            .where(
+                DbTool.enabled.is_(True),
+                or_(
+                    DbTool.name.ilike(pattern),
+                    DbTool.description.ilike(pattern),
+                    DbTool.original_description.ilike(pattern),
+                ),
+            )
+            .limit(limit)
+        )
+        .scalars()
+        .all()
+    )
+    results = []
+    for tool in rows:

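On the escaping point: % and _ are wildcards in (I)LIKE, so raw user input can match far more rows than intended. A sketch of escaping them before building the pattern (escape_like and contains_pattern are illustrative helpers, not code from the PR):

```python
def escape_like(user_input: str, escape_char: str = "\\") -> str:
    """Escape LIKE/ILIKE wildcards so user input matches literally."""
    return (
        user_input.replace(escape_char, escape_char * 2)  # escape the escape char first
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")
    )


def contains_pattern(user_input: str) -> str:
    """Build a contains-style pattern from the escaped input."""
    return f"%{escape_like(user_input)}%"
```

With SQLAlchemy, the escaped pattern would then be paired with the matching escape character, e.g. `DbTool.name.ilike(contains_pattern(q), escape="\\")`.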
Comment thread mcpgateway/config.py
Comment on lines 606 to +621
     x_frame_options: Optional[str] = Field(default="DENY")

     @field_validator("x_frame_options")
     @classmethod
     def normalize_x_frame_options(cls, v: Optional[str]) -> Optional[str]:
-        """Convert string 'null', 'none', or empty/whitespace-only string to Python None to disable iframe restrictions.
+        """Convert string 'null' or 'none' to Python None to disable iframe restrictions.

         Args:
-            v: The X-Frame-Options value to normalize.
+            v: The x_frame_options value from environment/config

         Returns:
-            None if v is None, an empty/whitespace-only string, or case-insensitive 'null'/'none';
-            otherwise returns the stripped string value.
+            None if v is "null" or "none" (case-insensitive), otherwise returns v unchanged
         """
-        if v is None:
-            return None
-        val = v.strip()
-        if val == "" or val.lower() in ("null", "none"):
-            return None
-        return val
+        if isinstance(v, str) and v.lower() in ("null", "none"):
+            return None
+        return v

Copilot AI Mar 21, 2026


normalize_x_frame_options() no longer strips whitespace / treats empty strings as None. If an env var is set to an empty or whitespace-only value, it will now be returned unchanged and may produce an invalid X-Frame-Options header. Consider restoring the previous normalization for empty/whitespace-only strings.

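A restored normalization that keeps the whitespace/empty handling could look like this (shown as a plain function for clarity; in config.py it would remain a @field_validator classmethod):

```python
from typing import Optional


def normalize_x_frame_options(v: Optional[str]) -> Optional[str]:
    """Treat None, empty/whitespace-only, 'null', and 'none' as disabled.

    Returns the stripped header value otherwise, so an env var like
    ' DENY ' does not produce an invalid X-Frame-Options header.
    """
    if v is None:
        return None
    val = v.strip()
    if val == "" or val.lower() in ("null", "none"):
        return None
    return val
```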
Comment thread mcpgateway/config.py
default_factory=list,
description="CSV/JSON list of header items to hide. Valid values: logout, team_selector, user_identity, theme_toggle",
)
plugins_can_override_rbac: bool = Field(default=True, description="Allow plugin HTTP auth hook grant decisions to override RBAC checks")

Copilot AI Mar 21, 2026


plugins_can_override_rbac now defaults to True, allowing plugin auth hooks to override RBAC by default. This is a high-impact security posture change and is risky as a default (especially for deployments that enable plugins without realizing this). Default should be False (audit-only) with an explicit opt-in to override RBAC.

Suggested change
-    plugins_can_override_rbac: bool = Field(default=True, description="Allow plugin HTTP auth hook grant decisions to override RBAC checks")
+    plugins_can_override_rbac: bool = Field(default=False, description="Allow plugin HTTP auth hook grant decisions to override RBAC checks")

Comment thread mcpgateway/config.py
Comment on lines 318 to 323
     # MCP Client Authentication
     mcp_client_auth_enabled: bool = Field(default=True, description="Enable JWT authentication for MCP client operations")
-    mcp_require_auth: Optional[bool] = Field(
-        default=None,
-        description=(
-            "Require authentication for /mcp endpoints. "
-            "When unset, inherits AUTH_REQUIRED. "
-            "Set false explicitly to allow unauthenticated access to public items only; "
-            "set true to require a valid Bearer token for all /mcp requests."
-        ),
-    )
+    mcp_require_auth: bool = Field(
+        default=False,
+        description="Require authentication for /mcp endpoints. If false, unauthenticated requests can access public items only. "
+        "If true, all /mcp requests must include a valid Bearer token.",
+    )

Copilot AI Mar 21, 2026


mcp_require_auth is now a non-optional bool defaulting to False, and the previous logic that derived it from auth_required was removed. This changes the default behavior to allow unauthenticated access to MCP endpoints (public-only) even when AUTH_REQUIRED=true. If that behavior change isn’t intentional and explicitly documented/migrated, consider restoring the old defaulting behavior or default mcp_require_auth to True when auth_required is enabled.

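One way to preserve the old inheritance behavior is to keep the field Optional and resolve the effective value in one place (a sketch; the names mirror the settings fields but this is not the PR's code):

```python
from typing import Optional


def effective_mcp_require_auth(mcp_require_auth: Optional[bool], auth_required: bool) -> bool:
    """When mcp_require_auth is unset (None), inherit the global AUTH_REQUIRED flag.

    An explicit False still allows public-only unauthenticated access,
    and an explicit True always requires a valid Bearer token.
    """
    if mcp_require_auth is None:
        return auth_required
    return mcp_require_auth
```

This keeps AUTH_REQUIRED=true deployments secure by default while still allowing an explicit opt-out.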
Comment on lines +10 to +18
# First-Party
from mcpgateway.config import settings
from mcpgateway.services.nl_execution_service import (
IntentClassification,
NLExecutionService,
SlotFillingResult,
ToolCandidate,
ToolMatch,
)

Copilot AI Mar 21, 2026


settings is imported but never used in this test file, which will fail linting in setups that enforce unused imports. Remove the unused import or use it in an assertion.

Comment thread mcpgateway/config.py
Comment on lines +650 to +674
llmchat_enabled: bool = Field(default=False, description="Enable LLM Chat feature")
toolops_enabled: bool = Field(default=False, description="Enable ToolOps feature")
plugins_can_override_rbac: bool = Field(
default=False,
description=("Allow HTTP_AUTH_CHECK_PERMISSION plugins to short-circuit built-in RBAC grants. " "Disabled by default so plugin grant decisions are audit-only unless explicitly enabled."),
)
plugins_can_override_auth_headers: bool = Field(

# Natural language tool execution settings
nl_execution_enabled: bool = Field(default=False, description="Enable natural language tool execution endpoints")
nl_execution_model: str = Field(default="", description="LLM model ID for NL execution")
nl_execution_temperature: float = Field(default=0.3, ge=0.0, le=2.0, description="Default temperature for NL execution")
nl_execution_max_tokens: int = Field(default=1000, ge=1, description="Maximum tokens for NL execution prompts")
nl_execution_min_confidence: float = Field(default=0.6, ge=0.0, le=1.0, description="Minimum tool match confidence threshold")
nl_execution_max_tool_candidates: int = Field(default=5, ge=1, le=50, description="Maximum tool candidates for NL matching")
nl_execution_semantic_threshold: Optional[float] = Field(default=None, ge=0.0, le=1.0, description="Optional semantic similarity threshold")
nl_execution_allow_inference: bool = Field(default=True, description="Allow inferred parameters during slot filling")
nl_execution_max_clarification_rounds: int = Field(default=3, ge=1, description="Maximum clarification turns per session")
nl_execution_confirm_high_risk: bool = Field(default=True, description="Require confirmation for high-risk tools")
nl_execution_confirm_destructive: bool = Field(default=True, description="Require confirmation for destructive tools")
nl_execution_sensitive_param_patterns: List[str] = Field(
default_factory=lambda: ["password", "secret", "token", "production"],
description="Regex patterns for sensitive parameters",
)
nl_execution_followups_enabled: bool = Field(default=True, description="Include follow-up suggestions in NL responses")
nl_execution_max_followups: int = Field(default=3, ge=1, description="Maximum follow-up suggestions")
nl_execution_context_ttl: int = Field(default=3600, description="TTL in seconds for NL conversation context")
nl_execution_rate_limit: int = Field(default=30, ge=0, description="Requests per minute limit for NL endpoints")
nl_execution_max_context_messages: int = Field(default=20, ge=1, description="Maximum messages retained in NL context")


Copilot AI Mar 21, 2026


This PR is described as adding NL tool execution, but config.py also changes several unrelated defaults/flags (e.g., llmchat_enabled default, SSRF defaults, mcp_require_auth, plugin override behavior) and adds large new config sections. If these are intentional, they should be called out explicitly in the PR description (and potentially split into a separate PR) to reduce rollout risk and make review/approval clearer.

@crivetimihai crivetimihai added enhancement New feature or request COULD P3: Nice-to-have features with minimal impact if left out; included if time permits labels Mar 29, 2026
@crivetimihai crivetimihai added this to the Release 1.1.0 milestone Mar 29, 2026
@crivetimihai

Thanks @AbdulR11. Ambitious feature; however, there are two blockers:

  1. All commits are missing the DCO Signed-off-by line (required). Please add it via git rebase --signoff HEAD~N and force-push.
  2. There are merge conflicts — please rebase onto main.

Signed-off-by: AbdulR11 <147985851+AbdulR11@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

[FEATURE][AI]: Natural language direct tool execution

4 participants