
feat: add natural-language tool execution#3781

Open
AbdulR11 wants to merge 2 commits into IBM:main from AbdulR11:natural-language-tool

Conversation

@AbdulR11

🔗 Related Issue

Closes #2248

📝 Summary

This PR adds a Natural Language Tool Execution API so users can run tools through conversational queries. The gateway classifies intent, matches tools, extracts parameters, confirms risky actions, executes tools, and formats responses back into natural language.

Problem Statement

Tool execution currently requires:

  • Knowing tool IDs ahead of time
  • Manual JSON parameter input
  • Schema memorization
  • Context switching away from natural conversation

Solution

  • NL execution flow: intent → tool matching → slot filling → confirmation → execution → response
  • Conversation context for follow-up clarification
  • Safety checks for destructive or sensitive operations
  • Follow-up suggestions for next actions
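
A minimal sketch of that flow as a single pass, with the classifier, matcher, slot filler, and executor stubbed in as callables (all names here are illustrative, not the PR's actual API):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class NLResult:
    status: str  # "clarify", "confirm", or "done"
    message: str
    tool_id: Optional[str] = None
    params: Dict[str, Any] = field(default_factory=dict)


def run_nl_query(
    query: str,
    match_tool: Callable[[str], Optional[str]],
    fill_slots: Callable[[str, str], Dict[str, Any]],
    required_params: Dict[str, List[str]],
    is_high_risk: Callable[[str], bool],
    execute: Callable[[str, Dict[str, Any]], str],
) -> NLResult:
    """Intent -> tool match -> slot fill -> confirm/execute, as one pass."""
    tool_id = match_tool(query)
    if tool_id is None:
        return NLResult("clarify", "No matching tool found; please rephrase.")
    params = fill_slots(tool_id, query)
    missing = [p for p in required_params.get(tool_id, []) if p not in params]
    if missing:
        # Ask a follow-up question instead of executing with incomplete input.
        return NLResult("clarify", f"Missing parameters: {missing}", tool_id, params)
    if is_high_risk(tool_id):
        # Destructive/sensitive tools pause here until the user confirms.
        return NLResult("confirm", f"Confirm running {tool_id}?", tool_id, params)
    return NLResult("done", execute(tool_id, params), tool_id, params)
```

Each early return corresponds to one of the states the API can hand back to the caller; only the "done" path actually invokes a tool.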

Key Components

  • Service Layer: NL orchestration + LLM prompts
  • Router Endpoints: /api/v1/nl/*
  • Configuration: feature flags + safety controls
  • Tests: service and router coverage

🏷️ Type of Change

  • Bug fix
  • Feature / Enhancement
  • Documentation
  • Refactor
  • Chore (deps, CI, tooling)
  • Other (describe below)

Test Coverage

  • ✅ NL execution service flow (no tool, clarification, confirmation, success)
  • ✅ Router execution path and config guard

📓 Notes

Files Changed

Core Implementation

  • mcpgateway/services/nl_execution_service.py - NL execution orchestration
  • mcpgateway/routers/nl_router.py - NL endpoints
  • mcpgateway/config.py - NL settings and feature flags
  • mcpgateway/main.py - Router registration

Tests

  • tests/unit/mcpgateway/services/test_nl_execution_service.py - Service coverage
  • tests/unit/mcpgateway/routers/test_nl_router.py - Router coverage
  • tests/unit/mcpgateway/routers/__init__.py - Package marker for router tests

Usage Example

Execute with natural language:

POST /api/v1/nl/execute
{
  "query": "What's the weather in San Francisco?"
}

Confirm risky action:

POST /api/v1/nl/confirm
{
  "session_id": "<session>",
  "confirm": true
}
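
The confirm step implies a short-lived store of pending actions keyed by session_id. A minimal in-memory sketch of that idea (illustrative names only; the PR's actual implementation may back this with Redis and the configured context TTL):

```python
import uuid
from typing import Any, Callable, Dict, Tuple


class PendingActionStore:
    """Holds tool calls awaiting user confirmation, keyed by session id."""

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, Dict[str, Any]]] = {}

    def stage(self, tool_id: str, params: Dict[str, Any]) -> str:
        """Park a risky call and return the session id the client must echo back."""
        session_id = str(uuid.uuid4())
        self._pending[session_id] = (tool_id, params)
        return session_id

    def confirm(
        self,
        session_id: str,
        confirm: bool,
        execute: Callable[[str, Dict[str, Any]], str],
    ) -> str:
        """Resolve a staged call exactly once: execute on yes, discard on no."""
        staged = self._pending.pop(session_id, None)
        if staged is None:
            return "unknown or expired session"
        if not confirm:
            return "action cancelled"
        tool_id, params = staged
        return execute(tool_id, params)
```

Popping the entry on first use means a session id cannot be replayed to execute the same risky action twice.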

Benefits

For Users:

  • Execute tools conversationally without remembering schemas
  • Multi-turn clarification for missing details

For Operators:

  • Safety confirmations for high-risk tools
  • Configurable model and rate limits

Configuration

MCPGATEWAY_NL_EXECUTION_ENABLED=true
MCPGATEWAY_NL_EXECUTION_MODEL=gpt-4o

Copilot AI review requested due to automatic review settings March 21, 2026 18:17

Copilot AI left a comment


Pull request overview

This PR introduces a Natural Language (NL) tool execution API in the MCP Gateway, enabling users to execute tools via conversational queries with intent classification, tool matching, slot filling, optional confirmation, and natural-language response formatting.

Changes:

  • Added NLExecutionService to orchestrate NL intent → match → slot-fill → confirm → execute → respond flow.
  • Added /api/v1/nl/* FastAPI router endpoints for execute/parse/confirm/context/feedback.
  • Added unit tests covering core service flow and router behavior, plus new config flags for NL execution.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 13 comments.

Summary per file:

  • mcpgateway/services/nl_execution_service.py - Implements NL orchestration, context, safeguards, and tool invocation/formatting logic
  • mcpgateway/routers/nl_router.py - Adds NL endpoints and basic rate limiting/feature gating
  • mcpgateway/config.py - Adds NL execution settings (and also includes several unrelated security/default changes)
  • tests/unit/mcpgateway/services/test_nl_execution_service.py - Unit tests for NL execution service paths
  • tests/unit/mcpgateway/routers/test_nl_router.py - Unit tests for NL router gating and service invocation
  • tests/unit/mcpgateway/routers/__init__.py - Marks router tests as a package
Comments suppressed due to low confidence (1)

mcpgateway/config.py:1000

  • get_security_status() compares self.jwt_secret_key directly to a string. Since jwt_secret_key is a SecretStr, this comparison is unreliable and can report secure_secrets=True even when the default secret is configured. Extract the actual secret value (get_secret_value()) before comparing.
        return {
            "secure_secrets": self.jwt_secret_key != "my-test-key",  # nosec B105 - checking for default value
            "auth_enabled": self.auth_required,
            "ssl_verification": not self.skip_ssl_verify,
            "debug_disabled": not self.debug,
            "cors_restricted": "*" not in self.allowed_origins if self.cors_enabled else True,
            "ui_protected": not self.mcpgateway_ui_enabled or self.auth_required,
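
The pitfall is easy to reproduce: pydantic's SecretStr never compares equal to a plain str, so the != check above is always True and the default secret goes undetected. A stdlib-only stand-in that mirrors pydantic's equality semantics (SecretStr here is a simplified stand-in, not pydantic's actual class):

```python
class SecretStr:
    """Simplified stand-in mirroring pydantic's SecretStr equality semantics."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        return self._value

    def __eq__(self, other: object) -> bool:
        # Like pydantic's SecretStr: only equal to another SecretStr
        # holding the same value, never to a plain str.
        return isinstance(other, SecretStr) and self._value == other._value


jwt_secret_key = SecretStr("my-test-key")  # the insecure default value

# Broken check: a str is never equal to a SecretStr, so this is always True.
broken = jwt_secret_key != "my-test-key"

# Fixed check: unwrap the secret first, so the default is actually detected.
fixed = jwt_secret_key.get_secret_value() != "my-test-key"
```

With the fixed form, secure_secrets correctly reports False while the default key is still configured.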


Comment on lines +28 to +31
from mcpgateway.services.semantic_search_service import get_semantic_search_service
from mcpgateway.services.tool_service import ToolService
from mcpgateway.utils.redis_client import get_redis_client


Copilot AI Mar 21, 2026


NLExecutionService imports mcpgateway.services.semantic_search_service, but that module/file does not exist in the repo (no mcpgateway/services/semantic_search_service.py or package). This will raise ImportError at import time and break the API when the NL router is included. Either add the missing semantic search service module or change the import to the correct existing service implementation.

Suggested change
-from mcpgateway.services.semantic_search_service import get_semantic_search_service
 from mcpgateway.services.tool_service import ToolService
 from mcpgateway.utils.redis_client import get_redis_client
+
+try:
+    # Optional semantic search integration; may not be present in all deployments.
+    from mcpgateway.services.semantic_search_service import get_semantic_search_service
+except ImportError:  # pragma: no cover - defensive fallback when module is absent
+
+    def get_semantic_search_service(*args: Any, **kwargs: Any) -> Any:
+        """Fallback when semantic_search_service is not available.
+
+        This preserves importability of nl_execution_service even if the
+        semantic search module is not present in the environment, and fails
+        explicitly when semantic search is actually requested.
+        """
+        raise RuntimeError(
+            "Semantic search service is not available: "
+            "missing 'mcpgateway.services.semantic_search_service'."
+        )

Copilot uses AI. Check for mistakes.
Comment on lines +968 to +972
return {
"is_error": bool(getattr(tool_result, "isError", False)),
"content": content_text,
"summary": content_text,
}

Copilot AI Mar 21, 2026


_tool_result_to_dict() checks tool_result.isError, but the ToolResult schema uses is_error. As written, errors will be silently treated as successes and the LLM formatting step won’t be told the result is an error. Read is_error (and ideally propagate error_message into the dict).

Suggested change
-        return {
-            "is_error": bool(getattr(tool_result, "isError", False)),
-            "content": content_text,
-            "summary": content_text,
-        }
+        is_error = bool(getattr(tool_result, "is_error", getattr(tool_result, "isError", False)))
+        error_message = getattr(tool_result, "error_message", None)
+        result: Dict[str, Any] = {
+            "is_error": is_error,
+            "content": content_text,
+            "summary": content_text,
+        }
+        if error_message is not None:
+            result["error_message"] = error_message
+        return result

inferred_params=inferred,
validation_errors=errors,
confidence=confidence,
needs_clarification=bool(missing),

Copilot AI Mar 21, 2026


Slot filling validation errors are collected (errors) but never influence control flow: needs_clarification is based only on missing required params. This can proceed to tool execution with schema-invalid arguments. Consider treating non-empty validation_errors as either needs_clarification (ask user to correct) or an error response before invoking the tool.

Suggested change
-            needs_clarification=bool(missing),
+            needs_clarification=bool(missing or errors),

Comment on lines +549 to +554
result = await self._llm.generate_json(
db,
self._resolve_model(None),
prompt,
temperature=settings.nl_execution_temperature,
max_tokens=settings.nl_execution_max_tokens,

Copilot AI Mar 21, 2026


Model/temperature/max_tokens overrides from the request aren’t applied to tool matching: _match_tools() always uses self._resolve_model(None) and settings.nl_execution_temperature/max_tokens. If the API allows per-request overrides, thread the resolved model and generation params into _match_tools() so intent classification, matching, and slot filling use a consistent configuration.

Suggested change
-        result = await self._llm.generate_json(
-            db,
-            self._resolve_model(None),
-            prompt,
-            temperature=settings.nl_execution_temperature,
-            max_tokens=settings.nl_execution_max_tokens,
+        # Apply per-request overrides; assumes _match_tools() now receives the
+        # request's model/temperature/max_tokens arguments (None when not provided),
+        # so matching uses the same configuration as the rest of the pipeline.
+        effective_model = self._resolve_model(model)
+        effective_temperature = settings.nl_execution_temperature if temperature is None else temperature
+        effective_max_tokens = settings.nl_execution_max_tokens if max_tokens is None else max_tokens
+        result = await self._llm.generate_json(
+            db,
+            effective_model,
+            prompt,
+            temperature=effective_temperature,
+            max_tokens=effective_max_tokens,

Comment on lines +844 to +853
select(DbTool).where(
DbTool.enabled.is_(True),
or_(DbTool.name.ilike(pattern), DbTool.description.ilike(pattern), DbTool.original_description.ilike(pattern)),
)
)
.scalars()
.all()
)
results = []
for tool in rows[:limit]:

Copilot AI Mar 21, 2026


_keyword_search() loads all matching tools (.all()) and only then slices to limit, which can cause unnecessary full scans and large result materialization. Apply a SQL LIMIT at the query level to bound work (and consider escaping %/_ in user input if needed).

Suggested change
-            select(DbTool).where(
-                DbTool.enabled.is_(True),
-                or_(DbTool.name.ilike(pattern), DbTool.description.ilike(pattern), DbTool.original_description.ilike(pattern)),
-            )
-        )
-        .scalars()
-        .all()
-    )
-    results = []
-    for tool in rows[:limit]:
+            select(DbTool)
+            .where(
+                DbTool.enabled.is_(True),
+                or_(
+                    DbTool.name.ilike(pattern),
+                    DbTool.description.ilike(pattern),
+                    DbTool.original_description.ilike(pattern),
+                ),
+            )
+            .limit(limit)
+        )
+        .scalars()
+        .all()
+    )
+    results = []
+    for tool in rows:

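On the escaping point: % and _ are wildcards in (I)LIKE, so raw user input can match far more rows than intended. A sketch of escaping them before building the pattern (escape_like and contains_pattern are illustrative helpers, not code from the PR):

```python
def escape_like(user_input: str, escape_char: str = "\\") -> str:
    """Escape LIKE/ILIKE wildcards so user input matches literally."""
    return (
        user_input.replace(escape_char, escape_char * 2)  # escape the escape char first
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")
    )


def contains_pattern(user_input: str) -> str:
    """Build a contains-style pattern from the escaped input."""
    return f"%{escape_like(user_input)}%"
```

With SQLAlchemy, the escaped pattern would then be paired with the matching escape character, e.g. `DbTool.name.ilike(contains_pattern(q), escape="\\")`.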
Comment thread mcpgateway/config.py
Comment on lines 606 to +621
     x_frame_options: Optional[str] = Field(default="DENY")

     @field_validator("x_frame_options")
     @classmethod
     def normalize_x_frame_options(cls, v: Optional[str]) -> Optional[str]:
-        """Convert string 'null', 'none', or empty/whitespace-only string to Python None to disable iframe restrictions.
+        """Convert string 'null' or 'none' to Python None to disable iframe restrictions.

         Args:
-            v: The X-Frame-Options value to normalize.
+            v: The x_frame_options value from environment/config

         Returns:
-            None if v is None, an empty/whitespace-only string, or case-insensitive 'null'/'none';
-            otherwise returns the stripped string value.
+            None if v is "null" or "none" (case-insensitive), otherwise returns v unchanged
         """
-        if v is None:
-            return None
-        val = v.strip()
-        if val == "" or val.lower() in ("null", "none"):
-            return None
-        return val
+        if isinstance(v, str) and v.lower() in ("null", "none"):
+            return None
+        return v

Copilot AI Mar 21, 2026


normalize_x_frame_options() no longer strips whitespace / treats empty strings as None. If an env var is set to an empty or whitespace-only value, it will now be returned unchanged and may produce an invalid X-Frame-Options header. Consider restoring the previous normalization for empty/whitespace-only strings.

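A restored normalization that keeps the whitespace/empty handling could look like this (shown as a plain function for clarity; in config.py it would remain a @field_validator classmethod):

```python
from typing import Optional


def normalize_x_frame_options(v: Optional[str]) -> Optional[str]:
    """Treat None, empty/whitespace-only, 'null', and 'none' as disabled.

    Returns the stripped header value otherwise, so an env var like
    ' DENY ' does not produce an invalid X-Frame-Options header.
    """
    if v is None:
        return None
    val = v.strip()
    if val == "" or val.lower() in ("null", "none"):
        return None
    return val
```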
Comment thread mcpgateway/config.py
default_factory=list,
description="CSV/JSON list of header items to hide. Valid values: logout, team_selector, user_identity, theme_toggle",
)
plugins_can_override_rbac: bool = Field(default=True, description="Allow plugin HTTP auth hook grant decisions to override RBAC checks")

Copilot AI Mar 21, 2026


plugins_can_override_rbac now defaults to True, allowing plugin auth hooks to override RBAC by default. This is a high-impact security posture change and is risky as a default (especially for deployments that enable plugins without realizing this). Default should be False (audit-only) with an explicit opt-in to override RBAC.

Suggested change
-    plugins_can_override_rbac: bool = Field(default=True, description="Allow plugin HTTP auth hook grant decisions to override RBAC checks")
+    plugins_can_override_rbac: bool = Field(default=False, description="Allow plugin HTTP auth hook grant decisions to override RBAC checks")

Comment thread mcpgateway/config.py
Comment on lines 318 to 323
     # MCP Client Authentication
     mcp_client_auth_enabled: bool = Field(default=True, description="Enable JWT authentication for MCP client operations")
-    mcp_require_auth: Optional[bool] = Field(
-        default=None,
-        description=(
-            "Require authentication for /mcp endpoints. "
-            "When unset, inherits AUTH_REQUIRED. "
-            "Set false explicitly to allow unauthenticated access to public items only; "
-            "set true to require a valid Bearer token for all /mcp requests."
-        ),
-    )
+    mcp_require_auth: bool = Field(
+        default=False,
+        description="Require authentication for /mcp endpoints. If false, unauthenticated requests can access public items only. "
+        "If true, all /mcp requests must include a valid Bearer token.",
+    )

Copilot AI Mar 21, 2026


mcp_require_auth is now a non-optional bool defaulting to False, and the previous logic that derived it from auth_required was removed. This changes the default behavior to allow unauthenticated access to MCP endpoints (public-only) even when AUTH_REQUIRED=true. If that behavior change isn’t intentional and explicitly documented/migrated, consider restoring the old defaulting behavior or default mcp_require_auth to True when auth_required is enabled.

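One way to preserve the old inheritance behavior is to keep the field Optional and resolve the effective value in one place (a sketch; the names mirror the settings fields but this is not the PR's code):

```python
from typing import Optional


def effective_mcp_require_auth(mcp_require_auth: Optional[bool], auth_required: bool) -> bool:
    """When mcp_require_auth is unset (None), inherit the global AUTH_REQUIRED flag.

    An explicit False still allows public-only unauthenticated access,
    and an explicit True always requires a valid Bearer token.
    """
    if mcp_require_auth is None:
        return auth_required
    return mcp_require_auth
```

This keeps AUTH_REQUIRED=true deployments secure by default while still allowing an explicit opt-out.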
Comment on lines +10 to +18
# First-Party
from mcpgateway.config import settings
from mcpgateway.services.nl_execution_service import (
IntentClassification,
NLExecutionService,
SlotFillingResult,
ToolCandidate,
ToolMatch,
)

Copilot AI Mar 21, 2026


settings is imported but never used in this test file, which will fail linting in setups that enforce unused imports. Remove the unused import or use it in an assertion.

Comment thread mcpgateway/config.py
Comment on lines +650 to +674
llmchat_enabled: bool = Field(default=False, description="Enable LLM Chat feature")
toolops_enabled: bool = Field(default=False, description="Enable ToolOps feature")
plugins_can_override_rbac: bool = Field(
default=False,
description=("Allow HTTP_AUTH_CHECK_PERMISSION plugins to short-circuit built-in RBAC grants. " "Disabled by default so plugin grant decisions are audit-only unless explicitly enabled."),
)
plugins_can_override_auth_headers: bool = Field(

# Natural language tool execution settings
nl_execution_enabled: bool = Field(default=False, description="Enable natural language tool execution endpoints")
nl_execution_model: str = Field(default="", description="LLM model ID for NL execution")
nl_execution_temperature: float = Field(default=0.3, ge=0.0, le=2.0, description="Default temperature for NL execution")
nl_execution_max_tokens: int = Field(default=1000, ge=1, description="Maximum tokens for NL execution prompts")
nl_execution_min_confidence: float = Field(default=0.6, ge=0.0, le=1.0, description="Minimum tool match confidence threshold")
nl_execution_max_tool_candidates: int = Field(default=5, ge=1, le=50, description="Maximum tool candidates for NL matching")
nl_execution_semantic_threshold: Optional[float] = Field(default=None, ge=0.0, le=1.0, description="Optional semantic similarity threshold")
nl_execution_allow_inference: bool = Field(default=True, description="Allow inferred parameters during slot filling")
nl_execution_max_clarification_rounds: int = Field(default=3, ge=1, description="Maximum clarification turns per session")
nl_execution_confirm_high_risk: bool = Field(default=True, description="Require confirmation for high-risk tools")
nl_execution_confirm_destructive: bool = Field(default=True, description="Require confirmation for destructive tools")
nl_execution_sensitive_param_patterns: List[str] = Field(
default_factory=lambda: ["password", "secret", "token", "production"],
description="Regex patterns for sensitive parameters",
)
nl_execution_followups_enabled: bool = Field(default=True, description="Include follow-up suggestions in NL responses")
nl_execution_max_followups: int = Field(default=3, ge=1, description="Maximum follow-up suggestions")
nl_execution_context_ttl: int = Field(default=3600, description="TTL in seconds for NL conversation context")
nl_execution_rate_limit: int = Field(default=30, ge=0, description="Requests per minute limit for NL endpoints")
nl_execution_max_context_messages: int = Field(default=20, ge=1, description="Maximum messages retained in NL context")


Copilot AI Mar 21, 2026


This PR is described as adding NL tool execution, but config.py also changes several unrelated defaults/flags (e.g., llmchat_enabled default, SSRF defaults, mcp_require_auth, plugin override behavior) and adds large new config sections. If these are intentional, they should be called out explicitly in the PR description (and potentially split into a separate PR) to reduce rollout risk and make review/approval clearer.

@crivetimihai crivetimihai added enhancement New feature or request COULD P3: Nice-to-have features with minimal impact if left out; included if time permits labels Mar 29, 2026
@crivetimihai crivetimihai added this to the Release 1.1.0 milestone Mar 29, 2026
@crivetimihai

Thanks @AbdulR11. Ambitious feature; however, there are two blockers:

  1. All commits are missing the DCO Signed-off-by line (required). Please add it via git rebase --signoff HEAD~N and force-push.
  2. There are merge conflicts — please rebase onto main.

Signed-off-by: AbdulR11 <147985851+AbdulR11@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

[FEATURE][AI]: Natural language direct tool execution

4 participants