feat: add natural-language tool execution #3781
Conversation
Pull request overview
This PR introduces a Natural Language (NL) tool execution API in the MCP Gateway, enabling users to execute tools via conversational queries with intent classification, tool matching, slot filling, optional confirmation, and natural-language response formatting.
Changes:
- Added `NLExecutionService` to orchestrate the NL intent → match → slot-fill → confirm → execute → respond flow.
- Added `/api/v1/nl/*` FastAPI router endpoints for execute/parse/confirm/context/feedback.
- Added unit tests covering core service flow and router behavior, plus new config flags for NL execution.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| mcpgateway/services/nl_execution_service.py | Implements NL orchestration, context, safeguards, and tool invocation/formatting logic |
| mcpgateway/routers/nl_router.py | Adds NL endpoints and basic rate limiting/feature gating |
| mcpgateway/config.py | Adds NL execution settings (and also includes several unrelated security/default changes) |
| tests/unit/mcpgateway/services/test_nl_execution_service.py | Unit tests for NL execution service paths |
| tests/unit/mcpgateway/routers/test_nl_router.py | Unit tests for NL router gating and service invocation |
| tests/unit/mcpgateway/routers/__init__.py | Marks router tests as a package |
Comments suppressed due to low confidence (1)
mcpgateway/config.py:1000
`get_security_status()` compares `self.jwt_secret_key` directly to a string. Since `jwt_secret_key` is a `SecretStr`, this comparison is unreliable and can report `secure_secrets=True` even when the default secret is configured. Extract the actual secret value (`get_secret_value()`) before comparing.
```python
return {
    "secure_secrets": self.jwt_secret_key != "my-test-key",  # nosec B105 - checking for default value
    "auth_enabled": self.auth_required,
    "ssl_verification": not self.skip_ssl_verify,
    "debug_disabled": not self.debug,
    "cors_restricted": "*" not in self.allowed_origins if self.cors_enabled else True,
    "ui_protected": not self.mcpgateway_ui_enabled or self.auth_required,
```
```python
from mcpgateway.services.semantic_search_service import get_semantic_search_service
from mcpgateway.services.tool_service import ToolService
from mcpgateway.utils.redis_client import get_redis_client
```
`NLExecutionService` imports `mcpgateway.services.semantic_search_service`, but that module/file does not exist in the repo (no `mcpgateway/services/semantic_search_service.py` or package). This will raise `ImportError` at import time and break the API when the NL router is included. Either add the missing semantic search service module or change the import to the correct existing service implementation.
Suggested change:

```python
from mcpgateway.services.tool_service import ToolService
from mcpgateway.utils.redis_client import get_redis_client

try:
    # Optional semantic search integration; may not be present in all deployments.
    from mcpgateway.services.semantic_search_service import get_semantic_search_service
except ImportError:  # pragma: no cover - defensive fallback when module is absent

    def get_semantic_search_service(*args: Any, **kwargs: Any) -> Any:
        """Fallback when semantic_search_service is not available.

        This preserves importability of nl_execution_service even if the
        semantic search module is not present in the environment, and fails
        explicitly when semantic search is actually requested.
        """
        raise RuntimeError(
            "Semantic search service is not available: "
            "missing 'mcpgateway.services.semantic_search_service'."
        )
```
```python
return {
    "is_error": bool(getattr(tool_result, "isError", False)),
    "content": content_text,
    "summary": content_text,
}
```
`_tool_result_to_dict()` checks `tool_result.isError`, but the `ToolResult` schema uses `is_error`. As written, errors will be silently treated as successes and the LLM formatting step won't be told the result is an error. Read `is_error` (and ideally propagate `error_message` into the dict).
Suggested change:

```python
is_error = bool(getattr(tool_result, "is_error", getattr(tool_result, "isError", False)))
error_message = getattr(tool_result, "error_message", None)
result: Dict[str, Any] = {
    "is_error": is_error,
    "content": content_text,
    "summary": content_text,
}
if error_message is not None:
    result["error_message"] = error_message
return result
```
```python
inferred_params=inferred,
validation_errors=errors,
confidence=confidence,
needs_clarification=bool(missing),
```
Slot-filling validation errors are collected (`errors`) but never influence control flow: `needs_clarification` is based only on missing required params. This can proceed to tool execution with schema-invalid arguments. Consider treating non-empty `validation_errors` as either `needs_clarification` (ask the user to correct) or an error response before invoking the tool.
Suggested change:

```python
needs_clarification=bool(missing or errors),
```
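The intent of the suggestion, as a tiny standalone predicate (the function name is illustrative; in the PR this is an inline expression):

```python
from typing import List


def needs_clarification(missing: List[str], errors: List[str]) -> bool:
    # Ask the user again if any required params are missing OR validation failed,
    # instead of silently executing the tool with schema-invalid arguments.
    return bool(missing or errors)
```

With this check, a result with no missing params but a non-empty `validation_errors` list still routes back to the user for correction.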
```python
result = await self._llm.generate_json(
    db,
    self._resolve_model(None),
    prompt,
    temperature=settings.nl_execution_temperature,
    max_tokens=settings.nl_execution_max_tokens,
```
Model/temperature/max_tokens overrides from the request aren't applied to tool matching: `_match_tools()` always uses `self._resolve_model(None)` and `settings.nl_execution_temperature`/`max_tokens`. If the API allows per-request overrides, thread the resolved model and generation params into `_match_tools()` so intent classification, matching, and slot filling use a consistent configuration.
Suggested change (assuming the per-request `model`, `temperature`, and `max_tokens` overrides are passed into `_match_tools()` as explicit parameters, rather than probed via `locals()`):

```python
# Allow per-request overrides for model, temperature, and max_tokens if provided.
effective_model = self._resolve_model(model)
effective_temperature = (
    settings.nl_execution_temperature if temperature is None else temperature
)
effective_max_tokens = (
    settings.nl_execution_max_tokens if max_tokens is None else max_tokens
)
result = await self._llm.generate_json(
    db,
    effective_model,
    prompt,
    temperature=effective_temperature,
    max_tokens=effective_max_tokens,
```
```python
        select(DbTool).where(
            DbTool.enabled.is_(True),
            or_(DbTool.name.ilike(pattern), DbTool.description.ilike(pattern), DbTool.original_description.ilike(pattern)),
        )
    )
    .scalars()
    .all()
)
results = []
for tool in rows[:limit]:
```
`_keyword_search()` loads all matching tools (`.all()`) and only then slices to `limit`, which can cause unnecessary full scans and large result materialization. Apply a SQL `LIMIT` at the query level to bound work (and consider escaping `%`/`_` in user input if needed).
Suggested change:

```python
        select(DbTool)
        .where(
            DbTool.enabled.is_(True),
            or_(
                DbTool.name.ilike(pattern),
                DbTool.description.ilike(pattern),
                DbTool.original_description.ilike(pattern),
            ),
        )
        .limit(limit)
    )
    .scalars()
    .all()
)
results = []
for tool in rows:
```
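For the escaping concern, a sketch of a LIKE/ILIKE wildcard escaper (the helper name is illustrative; SQLAlchemy's `ilike()` also accepts an `escape=` argument matching the chosen escape character):

```python
def escape_like(term: str, escape_char: str = "\\") -> str:
    # Escape the escape character first, then the % and _ wildcards,
    # so user input is matched literally inside an ILIKE pattern.
    return (
        term.replace(escape_char, escape_char * 2)
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")
    )


# Without escaping, '50%_off' would match far more rows than intended.
pattern = f"%{escape_like('50%_off')}%"
```

The escaped term can then be used as `DbTool.name.ilike(pattern, escape="\\")` so the wildcards in user input are treated as literals.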
```diff
 x_frame_options: Optional[str] = Field(default="DENY")

 @field_validator("x_frame_options")
 @classmethod
 def normalize_x_frame_options(cls, v: Optional[str]) -> Optional[str]:
-    """Convert string 'null', 'none', or empty/whitespace-only string to Python None to disable iframe restrictions.
+    """Convert string 'null' or 'none' to Python None to disable iframe restrictions.

     Args:
-        v: The X-Frame-Options value to normalize.
+        v: The x_frame_options value from environment/config

     Returns:
-        None if v is None, an empty/whitespace-only string, or case-insensitive 'null'/'none';
-        otherwise returns the stripped string value.
+        None if v is "null" or "none" (case-insensitive), otherwise returns v unchanged
     """
-    if v is None:
+    if isinstance(v, str) and v.lower() in ("null", "none"):
         return None
-    val = v.strip()
-    if val == "" or val.lower() in ("null", "none"):
-        return None
-    return val
+    return v
```
`normalize_x_frame_options()` no longer strips whitespace or treats empty strings as `None`. If an env var is set to an empty or whitespace-only value, it will now be returned unchanged and may produce an invalid `X-Frame-Options` header. Consider restoring the previous normalization for empty/whitespace-only strings.
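Restoring the earlier normalization could look like this standalone sketch of the validator body (shown as a free function for testability):

```python
from typing import Optional


def normalize_x_frame_options(v: Optional[str]) -> Optional[str]:
    # None, empty/whitespace-only strings, and 'null'/'none' (any case)
    # all disable the header; anything else is returned stripped.
    if v is None:
        return None
    val = v.strip()
    if val == "" or val.lower() in ("null", "none"):
        return None
    return val
```

With this version, an env var like `X_FRAME_OPTIONS="   "` disables the header instead of emitting a whitespace-only (invalid) value.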
```python
    default_factory=list,
    description="CSV/JSON list of header items to hide. Valid values: logout, team_selector, user_identity, theme_toggle",
)
plugins_can_override_rbac: bool = Field(default=True, description="Allow plugin HTTP auth hook grant decisions to override RBAC checks")
```
`plugins_can_override_rbac` now defaults to `True`, allowing plugin auth hooks to override RBAC by default. This is a high-impact security posture change and is risky as a default (especially for deployments that enable plugins without realizing this). The default should be `False` (audit-only) with an explicit opt-in to override RBAC.
Suggested change:

```diff
-plugins_can_override_rbac: bool = Field(default=True, description="Allow plugin HTTP auth hook grant decisions to override RBAC checks")
+plugins_can_override_rbac: bool = Field(default=False, description="Allow plugin HTTP auth hook grant decisions to override RBAC checks")
```
```diff
 # MCP Client Authentication
 mcp_client_auth_enabled: bool = Field(default=True, description="Enable JWT authentication for MCP client operations")
-mcp_require_auth: Optional[bool] = Field(
-    default=None,
-    description=(
-        "Require authentication for /mcp endpoints. "
-        "When unset, inherits AUTH_REQUIRED. "
-        "Set false explicitly to allow unauthenticated access to public items only; "
-        "set true to require a valid Bearer token for all /mcp requests."
-    ),
-)
+mcp_require_auth: bool = Field(
+    default=False,
+    description="Require authentication for /mcp endpoints. If false, unauthenticated requests can access public items only. "
+    "If true, all /mcp requests must include a valid Bearer token.",
+)
```
`mcp_require_auth` is now a non-optional `bool` defaulting to `False`, and the previous logic that derived it from `auth_required` was removed. This changes the default behavior to allow unauthenticated access to MCP endpoints (public-only) even when `AUTH_REQUIRED=true`. If that behavior change isn't intentional and explicitly documented/migrated, consider restoring the old defaulting behavior or defaulting `mcp_require_auth` to `True` when `auth_required` is enabled.
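The removed inheritance logic amounts to a one-line resolver; a sketch (the function name is hypothetical, the behavior mirrors the old `Optional[bool]` semantics described above):

```python
from typing import Optional


def effective_mcp_require_auth(mcp_require_auth: Optional[bool], auth_required: bool) -> bool:
    # When MCP_REQUIRE_AUTH is unset (None), inherit the global AUTH_REQUIRED;
    # an explicit True/False always wins over the inherited value.
    return auth_required if mcp_require_auth is None else mcp_require_auth
```

Under this rule, a deployment with `AUTH_REQUIRED=true` and no explicit `MCP_REQUIRE_AUTH` still requires tokens on `/mcp`, which is exactly the guarantee the new hard `default=False` drops.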
```python
# First-Party
from mcpgateway.config import settings
from mcpgateway.services.nl_execution_service import (
    IntentClassification,
    NLExecutionService,
    SlotFillingResult,
    ToolCandidate,
    ToolMatch,
)
```
`settings` is imported but never used in this test file, which will fail linting in setups that enforce unused imports. Remove the unused import or use it in an assertion.
```python
llmchat_enabled: bool = Field(default=False, description="Enable LLM Chat feature")
toolops_enabled: bool = Field(default=False, description="Enable ToolOps feature")
plugins_can_override_rbac: bool = Field(
    default=False,
    description=(
        "Allow HTTP_AUTH_CHECK_PERMISSION plugins to short-circuit built-in RBAC grants. "
        "Disabled by default so plugin grant decisions are audit-only unless explicitly enabled."
    ),
)
plugins_can_override_auth_headers: bool = Field(

# Natural language tool execution settings
nl_execution_enabled: bool = Field(default=False, description="Enable natural language tool execution endpoints")
nl_execution_model: str = Field(default="", description="LLM model ID for NL execution")
nl_execution_temperature: float = Field(default=0.3, ge=0.0, le=2.0, description="Default temperature for NL execution")
nl_execution_max_tokens: int = Field(default=1000, ge=1, description="Maximum tokens for NL execution prompts")
nl_execution_min_confidence: float = Field(default=0.6, ge=0.0, le=1.0, description="Minimum tool match confidence threshold")
nl_execution_max_tool_candidates: int = Field(default=5, ge=1, le=50, description="Maximum tool candidates for NL matching")
nl_execution_semantic_threshold: Optional[float] = Field(default=None, ge=0.0, le=1.0, description="Optional semantic similarity threshold")
nl_execution_allow_inference: bool = Field(default=True, description="Allow inferred parameters during slot filling")
nl_execution_max_clarification_rounds: int = Field(default=3, ge=1, description="Maximum clarification turns per session")
nl_execution_confirm_high_risk: bool = Field(default=True, description="Require confirmation for high-risk tools")
nl_execution_confirm_destructive: bool = Field(default=True, description="Require confirmation for destructive tools")
nl_execution_sensitive_param_patterns: List[str] = Field(
    default_factory=lambda: ["password", "secret", "token", "production"],
    description="Regex patterns for sensitive parameters",
)
nl_execution_followups_enabled: bool = Field(default=True, description="Include follow-up suggestions in NL responses")
nl_execution_max_followups: int = Field(default=3, ge=1, description="Maximum follow-up suggestions")
nl_execution_context_ttl: int = Field(default=3600, description="TTL in seconds for NL conversation context")
nl_execution_rate_limit: int = Field(default=30, ge=0, description="Requests per minute limit for NL endpoints")
nl_execution_max_context_messages: int = Field(default=20, ge=1, description="Maximum messages retained in NL context")
```
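As a sense check for `nl_execution_rate_limit` (requests per minute), per-client limiting can be sketched as a sliding-window log; the class below is an illustration, not the router's actual limiter:

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowRateLimiter:
    """Hypothetical sketch: allow at most `limit_per_minute` calls per client per 60s."""

    def __init__(self, limit_per_minute: int) -> None:
        self.limit = limit_per_minute
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        # Drop timestamps older than 60 seconds, then admit only if under the limit.
        now = time.monotonic() if now is None else now
        window = self.hits[client_id]
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True
```

Setting `nl_execution_rate_limit=0` in such a scheme rejects every request, which matches the field's `ge=0` lower bound reading as "disabled".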
This PR is described as adding NL tool execution, but `config.py` also changes several unrelated defaults/flags (e.g., the `llmchat_enabled` default, SSRF defaults, `mcp_require_auth`, plugin override behavior) and adds large new config sections. If these are intentional, they should be called out explicitly in the PR description (and potentially split into a separate PR) to reduce rollout risk and make review/approval clearer.
Thanks @AbdulR11. This is an ambitious feature; however, there are two blockers:
Signed-off-by: AbdulR11 <147985851+AbdulR11@users.noreply.github.com>
Force-pushed from 1abc723 to a25d509.
🔗 Related Issue
Closes #2248
📝 Summary
This PR adds a Natural Language Tool Execution API so users can run tools through conversational queries. The gateway classifies intent, matches tools, extracts parameters, confirms risky actions, executes tools, and formats responses back into natural language.
Problem Statement
Tool execution currently requires:
Solution
Key Components
- `/api/v1/nl/*`

🏷️ Type of Change
Test Coverage
📓 Notes
Files Changed
Core Implementation
- `mcpgateway/services/nl_execution_service.py` - NL execution orchestration
- `mcpgateway/routers/nl_router.py` - NL endpoints
- `mcpgateway/config.py` - NL settings and feature flags
- `mcpgateway/main.py` - Router registration

Tests
- `tests/unit/mcpgateway/services/test_nl_execution_service.py` - Service coverage
- `tests/unit/mcpgateway/routers/test_nl_router.py` - Router coverage
- `tests/unit/mcpgateway/routers/__init__.py` - Package marker for router tests

Usage Example
Execute with natural language:

```
POST /api/v1/nl/execute
{ "query": "What's the weather in San Francisco?" }
```

Confirm risky action:

```
POST /api/v1/nl/confirm
{ "session_id": "<session>", "confirm": true }
```

Benefits
For Users:
For Operators:
Configuration