
What Community Data Shows About Building Custom Tools for Gemini CLI

Synthesizing how the ecosystem approaches tool definition, permission gates, and testing for Gemini CLI custom tools and MCP servers — based on open GitHub issues and official documentation.

muzhihao

April 24, 2026 · 13 min read

Introduction

A review of the open issues on google-gemini/gemini-cli shows three recurring failure modes in community reports about custom tool integration: JSON Schema incompatibilities that prevent tool discovery, missing permission gates that trigger policy-engine rejections at runtime, and tool descriptions so vague that the model picks the wrong tool or constructs malformed arguments. These three patterns recur throughout the "my custom tool never gets called" threads in the repository's issue tracker.

This article synthesizes what those failure reports, the official Tools API documentation, and community MCP server implementations collectively reveal about the correct way to define, gate, and validate custom tools for Gemini CLI. The audience is engineers who have already installed Gemini CLI and are now trying to extend it with domain-specific capabilities — whether via the native tool registration interface or via a standalone MCP server process.

The article does not walk through hands-on experiments; every pattern described below is sourced to a public GitHub issue, an official doc page, or a community implementation that any reader can inspect.

TL;DR

Of the schema-related failures catalogued in the issue tracker, a disproportionate share trace to $defs / $ref usage in JSON Schema — the Gemini API rejects schemas with unresolved references, and the CLI's schema sanitizer did not originally handle them (issue #13142, issue #13326).
The policy engine evaluates every tool call against a priority-ordered rule set; tools without explicit allow rules default to ask_user for write operations, which silently blocks automation unless overridden.
Community MCP servers converge on a narrow set of structural patterns: flat (non-nested) inputSchema, explicit required arrays, enum-constrained string fields, and description strings that mention the format of the output — not just what the tool does.
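The modelcontextprotocol/typescript-sdk README treats Zod-to-JSON-Schema as the canonical way to define inputSchema, keeping schema and runtime validation in sync. The official MCP Go codelab follows the same principle of generating the schema from code rather than hand-writing it, though not with Zod, which is a TypeScript library.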
discoveryCommand / callCommand (the shell-based tool discovery path) requires the discovery command to emit a JSON array of FunctionDeclaration objects; any deviation silently drops tools from the model's context.

Problem Domain: When Custom Tools Actually Matter

The default Gemini CLI toolset covers file read/write, shell execution, and web search. The gap that custom tools fill is proprietary integration: internal APIs with non-standard authentication, database schemas the model has never seen, or CI/CD pipelines whose commands are not standard shell idioms.

The official tools API doc distinguishes two integration paths:

Native tool registration — a TypeScript object satisfying the GeminiTool interface, loaded in-process at startup via the config file.
Discovered tools via shell commands — tools.discoveryCommand emits FunctionDeclaration JSON; tools.callCommand receives the tool name as $1 and JSON arguments on stdin, emits results on stdout.

A third path — MCP server — runs as a separate process communicating over stdio or HTTP. This is the path the GitHub MCP server installation guide and the Docker MCP Toolkit guide demonstrate for third-party integrations.

The community data shows that the choice of path matters less than the quality of the schema and the permission configuration. The same failure modes appear across all three paths.

// A single FunctionDeclaration object; discoveryCommand must emit a JSON array of such objects
{
  "name": "query_internal_api",
  "description": "Fetches order records from the internal orders service. Returns JSON array of order objects with fields: id, status, created_at, total_cents.",
  "parameters": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "The numeric order ID as a string, e.g. '10042'."
      },
      "status_filter": {
        "type": "string",
        "enum": ["pending", "fulfilled", "cancelled"],
        "description": "Optional status filter. Omit to return all statuses."
      }
    },
    "required": ["order_id"]
  }
}

The description line is doing critical work here: it names the return shape. Without it, the model must guess what to do with the output — and guesses frequently appear as follow-up bugs in the same GitHub threads where the schema was originally the complaint.
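The two shell-based hooks described above can be sketched as a pair of pure functions: one producing what the discovery command would print, one producing what the call command would print for a given tool name and stdin payload. This is a minimal sketch, not the CLI's own code. The `FunctionDeclaration` interface, the `handlers` table, and the stubbed order record are assumptions made for illustration, and the wiring to the real argv, stdin, and stdout is left out.

```typescript
// Hypothetical types mirroring the FunctionDeclaration JSON shown above.
interface PropertySchema {
  type: "string" | "number" | "integer" | "boolean";
  description: string;
  enum?: string[];
}

interface FunctionDeclaration {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, PropertySchema>;
    required: string[];
  };
}

type Handler = (args: Record<string, unknown>) => unknown;

const declarations: FunctionDeclaration[] = [
  {
    name: "query_internal_api",
    description:
      "Fetches order records from the internal orders service. Returns JSON " +
      "array of order objects with fields: id, status, created_at, total_cents.",
    parameters: {
      type: "object",
      properties: {
        order_id: { type: "string", description: "The numeric order ID as a string, e.g. '10042'." },
      },
      required: ["order_id"],
    },
  },
];

// Stub handler (assumption): a real one would call the orders service.
const handlers: Record<string, Handler> = {
  query_internal_api: (args) => [
    { id: String(args["order_id"]), status: "pending", created_at: "1970-01-01T00:00:00Z", total_cents: 0 },
  ],
};

// What the discovery command would write to stdout: a JSON array, nothing else.
function discoveryOutput(): string {
  return JSON.stringify(declarations);
}

// What the call command would write to stdout, given the tool name ($1)
// and the JSON arguments read from stdin.
function callOutput(toolName: string, stdinJson: string): string {
  if (!Object.prototype.hasOwnProperty.call(handlers, toolName)) {
    return JSON.stringify({ error: `Unknown tool: ${toolName}` });
  }
  let args: Record<string, unknown>;
  try {
    args = JSON.parse(stdinJson) as Record<string, unknown>;
  } catch {
    return JSON.stringify({ error: "Arguments on stdin were not valid JSON" });
  }
  return JSON.stringify(handlers[toolName](args));
}
```

Keeping both functions free of I/O makes the JSON contract easy to unit-test before the script is hooked up to tools.discoveryCommand and tools.callCommand.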

Common Approaches and Why They Fail

Approach 1: Copying Schema Patterns From Other Ecosystems

A significant cluster of failures documented in issue #13326 and issue #13142 originated from MCP servers whose inputSchema was generated by JSON Schema libraries that produce $defs + $ref for reused sub-schemas. The Gemini API — which the CLI routes tool schemas through — returns HTTP 400 on schemas containing unresolved $ref references. The CLI's schema sanitizer at the time of these reports (v0.15.1) stripped $defs during processing but did not inline the referenced definitions before stripping, leaving dangling $ref pointers.

The community workaround, visible across multiple replies in issue #13142, is to flatten all sub-schemas inline and eliminate $defs entirely before registration. Libraries like Zod, when configured with .openapi() or compiled via zod-to-json-schema with $refStrategy: "none", produce flat schemas compatible with the Gemini API.
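The flattening step can be sketched as a small pre-registration pass that inlines every local reference and then drops the root $defs block. This is an illustrative helper, not the CLI's sanitizer. The names `inlineRefs` and `JsonValue` are invented here, and the sketch assumes references are always of the form "#/$defs/Name" with no JSON Pointer escaping; recursive schemas are rejected because they cannot be flattened.

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };
type JsonObject = { [key: string]: JsonValue };

const DEFS_PREFIX = "#/$defs/";

function isJsonObject(value: JsonValue | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a copy of `schema` with every local $ref replaced by the definition
// it points at, and with the root-level $defs block removed.
function inlineRefs(schema: JsonObject): JsonObject {
  const rawDefs = schema["$defs"];
  const defs: JsonObject = isJsonObject(rawDefs) ? rawDefs : {};

  function resolve(node: JsonValue, seen: string[]): JsonValue {
    if (Array.isArray(node)) return node.map((item) => resolve(item, seen));
    if (!isJsonObject(node)) return node;

    // Copy every keyword except $ref (and the root $defs), resolving children.
    const out: JsonObject = {};
    for (const key of Object.keys(node)) {
      if (key === "$ref" || (node === schema && key === "$defs")) continue;
      out[key] = resolve(node[key], seen);
    }

    const ref = node["$ref"];
    if (typeof ref !== "string") return out;

    const name = ref.slice(DEFS_PREFIX.length);
    if (ref.indexOf(DEFS_PREFIX) !== 0 || !Object.prototype.hasOwnProperty.call(defs, name)) {
      throw new Error(`Unresolvable $ref: ${ref}`);
    }
    if (seen.indexOf(name) !== -1) throw new Error(`Recursive $ref: ${ref}`);

    // Keywords written next to the $ref (e.g. description) override the definition's.
    const target = resolve(defs[name], seen.concat(name));
    return isJsonObject(target) ? { ...target, ...out } : target;
  }

  return resolve(schema, []) as JsonObject;
}
```

Throwing on an unresolvable reference is a deliberate choice in this sketch: it surfaces the problem at registration time instead of as a 400 from the API.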

Approach 2: Assuming Default Policy Allows Automation

Issue #18750, a policy-engine documentation review, surfaced that the policy engine reference ships with defaults where write-class tools default to ask_user — meaning unattended scripts that register write tools will pause awaiting confirmation. Engineers who tested their tools interactively then deployed them in non-interactive CI environments reported that the tool calls stalled rather than failing with a clear error.

The fix is explicit: add a rule to the policy configuration (written in TOML, per the policy engine reference) that matches the tool name and sets the decision to allow for automation contexts. The documentation issue also noted that the policy engine page did not previously enumerate valid tool name strings, making it impossible to write correct rules without reading source code.

Evidence-Based Patterns From Community MCP Implementations

Surveying the public MCP server corpus — including centminmod/gemini-cli-mcp-server, angrysky56/gemini-cli-mcp-server, jamubc/gemini-mcp-tool, and the modelcontextprotocol/servers reference collection — reveals convergence on five structural patterns.

Pattern 1: Flat, Self-Contained inputSchema

Every production-quality server in the corpus uses flat inputSchema objects with no $ref. Properties are described inline. The typescript-sdk server documentation makes this the canonical registration path:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const server = new McpServer({ name: "my-server", version: "1.0.0" });

server.registerTool(
  "fetch_order",
  {
    description:
      "Returns an order record by ID. Output is JSON with fields: id (string), " +
      "status (pending|fulfilled|cancelled), total_cents (integer).",
    inputSchema: {
      order_id: z.string().describe("The numeric order ID, e.g. '10042'."),
      include_line_items: z
        .boolean()
        .optional()
        .describe("Set true to include the line_items array in the response."),
    },
  },
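  async ({ order_id, include_line_items }) => {
    // Placeholder record; replace with a call to the real orders service.
    const order = {
      id: order_id,
      status: "pending",
      total_cents: 0,
      ...(include_line_items ? { line_items: [] } : {}),
    };
    // MCP tool results are returned as an array of content items.
    return { content: [{ type: "text", text: JSON.stringify(order) }] };
  }
);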

The Zod schema is compiled to flat JSON Schema by the SDK before being sent to the host client. No $defs are emitted.

Pattern 2: Enum Constraints on Categorical Inputs

Of the servers that documented schema-related failures in community threads, those that subsequently resolved the issues consistently added enum arrays to any string property with a finite domain. The function calling documentation on Google AI for Developers notes that the model uses enum values to populate arguments — without them, the model generates free-text values that often fail downstream validation.

// Before: model generates arbitrary strings
status: z.string().describe("Order status.")

// After: model selects from the enum, validation rate improves
status: z.enum(["pending", "fulfilled", "cancelled"]).describe(
  "Order status filter. Use 'pending' for unprocessed orders."
)

Pattern 3: Output Shape in the Description

The Practical Gemini CLI: Tool Calling article on Google Cloud Community documents that the model's ability to reason over a tool's return value is entirely dependent on the description field. Tools that described only their action ("fetches order data") generated follow-up tool calls trying to re-fetch information that was already returned. Tools that described their output shape ("returns JSON with fields: id, status, total_cents") eliminated that retry pattern.

Pattern 4: Explicit Permission Rules for Write Tools

The policy engine reference defines the TOML rule syntax for pre-approving specific tools. Community servers that target automation workflows include a sample policy rule in their README:

# TOML policy rule (it belongs in a policy file, not in settings.json, which is JSON).
# Verify the key names against the policy engine reference for your CLI version.
[[rule]]
toolName = "write_order_note"
decision = "allow"
priority = 10

Without this, the CLI halts on the first write-class tool call to request confirmation — a behavior that breaks non-interactive pipelines silently.
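The priority-ordered evaluation described in this article can be modelled in a few lines. This is a toy model for reasoning about rule sets, not the CLI's implementation. The `PolicyRule` and `ToolCall` types and the `isWrite` flag are hypothetical, and treating unmatched read-only calls as allowed is an assumption; only the ask_user fallback for write tools comes from the text above.

```typescript
type Decision = "allow" | "deny" | "ask_user";

// Hypothetical rule shape for illustration only.
interface PolicyRule {
  toolName: string;
  decision: Decision;
  priority: number;
}

interface ToolCall {
  toolName: string;
  isWrite: boolean; // assumption: the caller knows whether the tool modifies state
}

// The matching rule with the highest priority wins. With no matching rule,
// write tools fall back to ask_user, which is what stalls unattended runs.
function evaluate(call: ToolCall, rules: PolicyRule[]): Decision {
  let winner: PolicyRule | undefined;
  for (const rule of rules) {
    if (rule.toolName !== call.toolName) continue;
    if (winner === undefined || rule.priority > winner.priority) winner = rule;
  }
  if (winner !== undefined) return winner.decision;
  return call.isWrite ? "ask_user" : "allow";
}
```

Running a CI rule set through a model like this before deployment makes it obvious which write tools would still prompt for confirmation.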

Pattern 5: Three-Layer Error Typing

The freeCodeCamp MCP handbook and the official codelab both distinguish three error classes. All three are returned with isError: true, so the message text has to tell the model which class it is dealing with:

// Validation failure — model can retry with corrected args
if (!isValidOrderId(order_id)) {
  return {
    content: [{ type: "text", text: `Invalid order_id format: ${order_id}. Expected numeric string.` }],
    isError: true,
  };
}

// Transient failure — model may retry
if (networkTimeout) {
  return {
    content: [{ type: "text", text: "Service temporarily unavailable. Retry in 30s." }],
    isError: true,
  };
}

// Permanent failure — model should not retry
return {
  content: [{ type: "text", text: `Order ${order_id} not found. No retry warranted.` }],
  isError: true,
};

Collapsing all three into a generic error string removes the model's ability to self-correct, contributing to the "model keeps retrying the same broken call" failure pattern visible in several GitHub threads.
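One way to keep the three classes from collapsing is to route every failure through a single helper that tags the class and appends a retry hint. The sketch below is an assumption about how a server might organise this; `ErrorKind`, `retryHint`, and `toolError` are invented names, and only the `content` / `isError` result shape comes from the examples above.

```typescript
type ErrorKind = "validation" | "transient" | "permanent";

// Result shape used by the handlers above.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// One retry instruction per class, so the model is told what to do next.
const retryHint: Record<ErrorKind, string> = {
  validation: "Correct the arguments and call again.",
  transient: "Retry after a short delay.",
  permanent: "Do not retry.",
};

function toolError(kind: ErrorKind, message: string): ToolResult {
  return {
    content: [{ type: "text", text: `[${kind}] ${message} ${retryHint[kind]}` }],
    isError: true,
  };
}
```

A handler would then return, for example, toolError("validation", "Invalid order_id format.") rather than assembling the result object by hand at each failure site.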

Quantified Analysis

Among the issues on the google-gemini/gemini-cli repository that are tagged for, or clearly related to, custom tool / MCP integration, the root causes visible in public threads fall into three clusters, listed here from most to least frequent:

Schema incompatibility ($defs/$ref, missing type, anyOf without top-level type): the largest cluster, spanning issues #13142, #13326, #11020, and the Cline cross-project issue cline#7339. The pattern repeats across unrelated MCP servers (Snowflake MCP, BEADS MCP, Basic Memory MCP, Notion MCP), indicating the root cause is in how external schema generators emit JSON Schema rather than in any single server's code.
Permission / policy misconfiguration: the second cluster, documented in issue #18750 and the official troubleshooting guide. Write tools blocked in automation contexts because no allow rule was defined.
Vague or action-only descriptions: the third cluster, described in the tool calling tutorial and the issue #5855 thread about hallucinated tool configuration — the model filled in tool arguments incorrectly because property descriptions did not constrain the expected value format.

The official troubleshooting guide also documents sandbox permission errors as a separate fourth category: tools that attempt to write outside the project directory or system temp directory fail with EPERM when sandboxing is active, regardless of the policy engine configuration.

Edge Cases Documented in Community Reports

The additionalProperties: false pitfall. Issue #8022 on structured JSON output reports that schemas with additionalProperties: false at the top level cause the Gemini API to return a 400 in certain model versions. The workaround is to omit additionalProperties from the schema while relying on the execution layer to ignore unexpected fields.

The anyOf top-level type requirement. Issue #13326 established that schemas using anyOf must include a top-level type: "object" alongside the anyOf array. Schemas generated by tools like Pydantic or TypeScript's ts-json-schema-generator without this top-level type field are rejected by the Gemini API with an undescriptive 400 error, making the root cause difficult to diagnose without inspecting the raw API request.
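Both schema pitfalls above lend themselves to a small clean-up pass applied to an inputSchema before registration. The sketch below is a hedged illustration, not the CLI's sanitizer, and the function names are invented. It removes the additionalProperties keyword at every schema level (the report concerns the top level, so stripping deeper is a conservative assumption) while leaving alone any parameter that happens to be named additionalProperties, and it adds type: "object" only at the root, where an inputSchema is always an object.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObj = { [key: string]: Json };

// Removes the additionalProperties keyword. Keys directly inside a
// "properties" map are parameter names, not keywords, so they are kept.
function stripAdditionalProperties(node: Json, isPropertyMap: boolean = false): Json {
  if (Array.isArray(node)) return node.map((item) => stripAdditionalProperties(item));
  if (typeof node !== "object" || node === null) return node;

  const out: JsonObj = {};
  for (const key of Object.keys(node)) {
    if (!isPropertyMap && key === "additionalProperties") continue;
    out[key] = stripAdditionalProperties(node[key], !isPropertyMap && key === "properties");
  }
  return out;
}

// Adds the top-level type when a root schema uses anyOf without one.
function ensureTopLevelObjectType(inputSchema: JsonObj): JsonObj {
  if ("anyOf" in inputSchema && !("type" in inputSchema)) {
    return { type: "object", ...inputSchema };
  }
  return inputSchema;
}
```

Because additionalProperties is removed from the schema, any rejection of unexpected fields has to happen in the tool's own execution layer.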

Tool name collision across MCP servers. The MCP servers documentation notes that when multiple MCP servers expose a tool with the same name, the CLI's behavior is undefined in older versions. Community workaround: namespace tool names with a server prefix (github__list_prs rather than list_prs).
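The namespacing workaround is simple enough to automate. The following sketch assumes a hypothetical `ServerTools` listing of what each configured server exposes; it reports bare names claimed by more than one server and builds the prefixed form with the double-underscore convention mentioned above.

```typescript
// Hypothetical inventory of the tools each MCP server exposes.
interface ServerTools {
  server: string;
  tools: string[];
}

// Builds the prefixed name, e.g. github__list_prs.
function qualifiedName(server: string, tool: string): string {
  return `${server}__${tool}`;
}

// Returns every bare tool name that more than one server exposes.
function findCollisions(servers: ServerTools[]): string[] {
  // Object.create(null) avoids clashes with inherited keys such as "constructor".
  const owners: Record<string, string[]> = Object.create(null);
  for (const { server, tools } of servers) {
    for (const tool of tools) {
      (owners[tool] = owners[tool] || []).push(server);
    }
  }
  return Object.keys(owners).filter((tool) => owners[tool].length > 1);
}
```

A check like this can run in CI against the combined server configuration, so a new server that reuses an existing tool name is caught before the model ever sees the ambiguity.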

Discovery command exit codes. The discoveryCommand path requires the shell command to exit with code 0 and emit valid JSON to stdout. Commands that write their output to stderr instead of stdout and still exit 0 cause the CLI to silently register zero tools with no error surfaced to the user, producing a confusing "model ignores my tools" symptom.
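Because this failure is silent, it can help to check a discovery command's result in a test before pointing the CLI at it. The checker below is a sketch under stated assumptions: `checkDiscoveryOutput` and `DiscoveryCheck` are invented names, the caller is expected to have captured the exit code and stdout already, and the only per-entry check is that each object has a non-empty string name.

```typescript
interface DiscoveryCheck {
  ok: boolean;
  toolNames: string[];
  problems: string[];
}

// Validates a captured discovery run: exit code 0 and a JSON array on stdout.
function checkDiscoveryOutput(exitCode: number, stdout: string): DiscoveryCheck {
  const problems: string[] = [];
  const toolNames: string[] = [];
  const done = (): DiscoveryCheck => ({ ok: problems.length === 0, toolNames, problems });

  if (exitCode !== 0) problems.push(`exit code ${exitCode}, expected 0`);

  if (stdout.trim() === "") {
    problems.push("stdout is empty (was the JSON written to stderr?)");
    return done();
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    problems.push("stdout is not valid JSON");
    return done();
  }

  if (!Array.isArray(parsed)) {
    problems.push("top-level JSON value is not an array");
    return done();
  }
  if (parsed.length === 0) problems.push("array contains no tool declarations");

  parsed.forEach((entry: unknown, index: number) => {
    const name =
      typeof entry === "object" && entry !== null ? (entry as { name?: unknown }).name : undefined;
    if (typeof name === "string" && name !== "") toolNames.push(name);
    else problems.push(`entry ${index} has no string "name"`);
  });

  return done();
}
```

Each entry in problems maps to one of the silent-drop causes described in this article, which turns "model ignores my tools" into a concrete, testable message.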

Recommendation

Based on the failure distribution above, the most robust approach for new custom tool implementations is to use the modelcontextprotocol/typescript-sdk with Zod schemas compiled to flat JSON Schema, rather than hand-authoring inputSchema JSON or using a schema generator that emits $defs. This eliminates the largest single failure class (schema incompatibility) by construction.

For the permission layer, treat every tool that modifies state as requiring an explicit TOML allow rule if the tool will be called in any non-interactive context. The policy engine documentation now enumerates valid tool name strings following the audit in issue #18750 — consult it before writing rules.

For description quality, apply the test from the function calling guide: a description is sufficient if a developer reading it for the first time knows (a) when to call the tool, (b) what values to pass for each property, and (c) what the return value will look like. If any of those three is missing from the description, the model will make up the missing information, usually incorrectly.
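Parts (b) and (c) of that test can be partly mechanised. The linter below is a crude heuristic and an assumption about how a team might enforce the rule, not something taken from the function calling guide. `ToolSpec` and `lintToolSpec` are hypothetical names; the check for (c) only looks for words such as "returns" or "output", and (a) still needs a human reader.

```typescript
// Hypothetical, simplified view of a tool definition for linting purposes.
interface ToolSpec {
  name: string;
  description: string;
  properties: Record<string, { description?: string }>;
}

function lintToolSpec(tool: ToolSpec): string[] {
  const warnings: string[] = [];

  // (c) The description should say something about what comes back.
  if (!/\b(returns?|outputs?)\b/i.test(tool.description)) {
    warnings.push(`${tool.name}: description does not mention what is returned`);
  }

  // (b) Every property needs its own description of the expected value.
  for (const key of Object.keys(tool.properties)) {
    const text = tool.properties[key].description;
    if (text === undefined || text.trim() === "") {
      warnings.push(`${tool.name}.${key}: missing property description`);
    }
  }

  return warnings;
}
```

A keyword match cannot judge whether the stated return shape is accurate, so a clean lint result is a floor for description quality rather than proof of it.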

FAQ

Q: Why does the Gemini API return 400 on my MCP server's tool schema?

The most common root cause, per issue #13326, is $defs/$ref references that the CLI's schema sanitizer does not inline before forwarding to the API. Flatten all sub-schemas inline and remove $defs. A secondary cause (issue #8022) is additionalProperties: false on certain model versions — omit it and validate unexpected fields in the execution layer instead.

Q: My tool is registered but never called. What is the model seeing?

Per the troubleshooting guide, enable debug logging (GEMINI_DEBUG=1) to inspect the system prompt that reaches the model. If the tool's name appears but the model still avoids it, the description is likely the issue: action-only descriptions ("fetches data") do not give the model enough signal to select the tool over built-in alternatives. Add the output shape and a concrete use-case example to the description string.

Q: How do I allow a write tool to run without confirmation in CI?

Add an explicit allow rule for the tool name in the policy configuration, as documented in the policy engine reference. Per the findings in issue #18750, the default for write-class tools is ask_user — there is no way to suppress confirmation without an explicit rule. The same issue confirmed that the policy engine documentation now lists valid tool name strings, which were previously undocumented.

Q: Should I use an MCP server or native tool registration?

The official MCP server documentation recommends MCP servers when the tool catalogue is large, when multiple AI clients need to share the same tools, or when the tooling team is separate from the Gemini CLI consumer. Native registration is simpler for one-off tools that only one team uses and that do not need to be shared with other AI hosts. From a schema and permission perspective, both paths face identical constraints.

Q: Is there a canonical tutorial for building an MCP server for Gemini CLI?

Google published a Go-based walkthrough on Google Codelabs that covers the full lifecycle from server scaffold to Gemini CLI integration. For TypeScript, the freeCodeCamp MCP server handbook and the official TypeScript SDK README cover equivalent ground with more language-specific detail.
