
Complete Gemini CLI Troubleshooting Guide: The 5-Stage Diagnostic Framework

The structured hub for Gemini CLI troubleshooting — maps the five-stage gap pattern (install → auth → config → runtime → advanced) with analyst synthesis, external citations, and deep-links into 15 diagnostic subpages.

muzhihao

April 24, 2026 · 16 min read

Introduction

A scan of the open issue tracker at github.com/google-gemini/gemini-cli, which holds thousands of filed reports, reveals a clear structural pattern across "Gemini CLI not working" complaints. Problems do not distribute randomly across the tool's surface. They cluster at five distinct transition points: the moment a user tries to get the binary on their PATH, the moment they first hit the API with a key, the moment their local configuration diverges silently from what they intended, the moment a working session degrades at runtime, and the moment they push into advanced integration territory with sandbox, custom tools, or MCP servers.

The community term for this pattern is the stage gap: each transition requires a different mental model, different diagnostic commands, and different fixes. A developer stuck at the MCP integration stage has already cleared installation, authentication, configuration, and basic runtime — yet the error messages from the failing stage offer no memory of that context. The official Gemini CLI troubleshooting guide documents exit codes 41–53, certificate errors, and CI-environment detection quirks, but it is not organized around this stage structure. That gap is what this hub addresses.

This article maps the full five-stage framework, synthesizes the community diagnostic workflows for each stage, and links out to 15 dedicated /qa/* subpages for step-by-step resolution. The companion piece Gemini CLI Not Working? Evidence-Based Fixes covers the same territory from the angle of specific error categories (401, 403, 429, hangs, configuration drift) with granular GitHub issue citations — read that first if you already know your error code. Read this one if you need to figure out which stage you are at.

TL;DR

Of the four diagnostic layers (local install, shell environment, network, Google backend), the shell-environment layer produces the most misleading symptoms — the same 403 error can mean a disabled API, a billing gap, or a backend routing bug, and which fix applies depends entirely on isolating the layer first.
The five-stage gap pattern means that most "generic" troubleshooting advice fails because it addresses the wrong stage; stage identification is the non-optional first step.
Gemini CLI reads configuration from a seven-tier precedence hierarchy: hardcoded defaults → system defaults → user settings → project settings → system settings → environment variables → command-line arguments. Environment variables sit at tier 6 and silently override all file-based settings.
Runtime hangs break into two mechanically unrelated root causes (missing ripgrep binary vs. agent retry loop) that share no fix — the presenting symptom is identical.
MCP integration failures almost always surface at one of three interfaces: server start, tool registration, or tool execution — each has a distinct diagnostic signature accessible via the /mcp command in session.

The Five-Stage Framework

The framework below is a synthesis of the recurring patterns across community reports and the official documentation. Each stage represents a transition point where a new class of problem can emerge — and where clearing that stage is a prerequisite for meaningful debugging of later stages.

Stage 1: Installation  ──►  Stage 2: Authentication  ──►  Stage 3: Configuration
                                                                    │
                                                                    ▼
                                      Stage 5: Advanced  ◄──  Stage 4: Runtime

The critical implication: debugging a Stage 4 runtime hang implicitly assumes that Stages 1–3 are clean, and the diagnostic commands for Stage 4 rely on that assumption. When a Stage 4 command produces a Stage 2 error (e.g., a 401 in --debug output during a runtime hang investigation), Stage 2 has not actually been cleared, even if things seemed to work before.

Stage 1: Installation

Installation problems are the highest-volume first-contact failure mode. The failure surface is the local environment: Node.js version, PATH order, shell configuration, and npm permission model. None of these are specific to Gemini CLI — they are general Node.js global install problems that happen to surface here.

Three sub-causes account for the overwhelming majority of installation reports:

PATH conflict from legacy install location. GitHub issue #5886 documents a structural shift in Gemini CLI's install location across versions. Older versions wrote to /usr/local/bin/gemini; newer versions write to /usr/bin/gemini. Standard Linux $PATH searches /usr/local/bin before /usr/bin, meaning a normal npm install -g update can leave the shell executing a stale binary from the old path. The which gemini test in that issue is the canonical first diagnostic step.

npm permission errors on Linux. On Debian-family systems, running sudo npm install -g creates permission debt that surfaces as broken upgrades later. The correct pattern is to configure a user-level npm prefix or use nvm. Our Linux installation guide covers this for Ubuntu, Debian, Fedora, and Arch.

Node.js version mismatch. The @google/gemini-cli package requires Node.js 20 or higher. The default repositories on Ubuntu 22.04 and 24.04 ship Node releases older than 20 (Node 18 in the case of 24.04). Running on Node 18 produces inconsistent startup failures rather than a clean error message; issue #2264 documents this startup failure pattern.
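The Node.js requirement above can be checked mechanically rather than by eye. The following is a minimal sketch, not part of Gemini CLI: node_major_ok is a hypothetical helper name, and the only fact it encodes is the "20 or higher" minimum stated above.

```shell
# Hypothetical helper: succeeds when a version string such as "v18.19.1"
# has a major component of at least 20 (the minimum stated above).
node_major_ok() {
  local version="${1#v}"        # strip the leading "v" that node prints
  local major="${version%%.*}"  # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) return 2 ;;    # not a number: cannot judge
  esac
  [ "$major" -ge 20 ]
}
```

A typical call would be node_major_ok "$(node --version)" || echo "Node is older than 20".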

# Stage 1 quick triage — run in order
gemini --version          # Is the binary present and executable?
which gemini              # Is PATH resolving to the expected location?
node --version            # Is Node >= 20.0.0?

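If any of those three commands returns an unexpected answer, the fix usually lies in the shell or Node.js environment (PATH order, npm prefix, Node version), not in reinstalling the CLI. The dedicated installation subpages cover the complete step-by-step, including Rosetta-vs-native Apple Silicon considerations.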
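Because which gemini reports only the first match, a stale binary earlier on PATH can hide a newer one further along. The sketch below lists every match in lookup order; list_on_path is a hypothetical helper written for bash, not a Gemini CLI command.

```shell
# Hypothetical helper: print every executable named $1 on PATH, in lookup order.
list_on_path() {
  local name="$1" dir
  local IFS=:
  for dir in $PATH; do
    [ -n "$dir" ] || dir=.      # an empty PATH entry means the current directory
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
  return 0
}
```

Running list_on_path gemini and seeing more than one line is the signature of the PATH conflict described for issue #5886: the first line is the binary the shell actually executes.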

Stage 2: Authentication

Once the CLI binary resolves on PATH and executes cleanly, authentication is the next wall. The three things that can go wrong are mechanically distinct: you have no key yet, you have a key in the wrong location, or you have the right key in the right place but it is rejected by the backend.

No key or wrong key type. Gemini CLI supports two separate authentication paths — a free-tier API key from Google AI Studio and enterprise OAuth through Google Cloud Vertex AI. These are not interchangeable: they have different rate limits, different billing implications, and different error signatures when misconfigured. Getting this choice wrong at the start produces errors that only become visible weeks later when quota walls appear. Our guide How to Get a Gemini API Key covers both paths.

Key in the wrong location. The GEMINI_API_KEY environment variable is the authoritative authentication source. Keys set in a project .env file are loaded by some application frameworks but not by the global CLI. Keys set in one terminal session are not inherited by new terminal windows unless persisted to the shell profile (.zshrc, .bashrc). Not every 403 at this point is a location problem, though: issue #25189 documents a case where OAuth authentication completes successfully but every API call still returns 403 PERMISSION_DENIED with zero tokens consumed. The backend routes the request to an unexpected cloudaicompanionProject, a server-side bug unrelated to the key itself. The workaround is switching from OAuth to explicit API key authentication.

Key rejected by backend. Three distinct backend causes produce 403 PERMISSION_DENIED: the Generative Language API is not enabled for the associated Google Cloud project, an account-tier routing mismatch (documented in issue #24517, priority/p1, 141 comments, open as of April 2026), and the silent AI Studio tier downgrade introduced by Google's January 2026 billing restructure (issue #24396). The distinguishing test: run the same request as a direct curl call. If curl also returns 403, the problem is the Google Cloud project, not the CLI.
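The distinguishing test above reduces to reading one HTTP status code. As a rough sketch, the two hypothetical helpers below fetch that code with curl and turn it into a next step. The 403 branch follows the text above; the 200 branch is an inference from it (if the backend accepts the key directly, the remaining suspect is the CLI side), and other codes are deliberately left unclassified.

```shell
# Hypothetical wrapper: fetch only the HTTP status code from the models endpoint.
direct_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY"
}

# Hypothetical helper: interpret the HTTP status of a direct API call.
interpret_direct_status() {
  case "$1" in
    200) echo "backend accepts the key: investigate the CLI side" ;;
    403) echo "backend rejects the request: check the Google Cloud project" ;;
    *)   echo "status $1: not covered by this sketch" ;;
  esac
}
```

The two combine as interpret_direct_status "$(direct_status)".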

# Stage 2 quick triage
printf '%s' "$GEMINI_API_KEY" | wc -c      # Expected: ~39 chars for AIza... keys
curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY" \
  | python3 -m json.tool | head -10        # Does the backend accept the key directly?
gemini --debug "hello" 2>&1 | head -40     # What does the CLI report for auth?
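The "key in the wrong location" case often comes down to whether the variable was ever written to a shell profile. A minimal sketch of that check follows; key_defined_in is a hypothetical helper, and it only recognizes plain assignments such as export GEMINI_API_KEY=..., so keys injected by a secrets manager or a sourced file will not show up.

```shell
# Hypothetical helper: list the given profile files that assign GEMINI_API_KEY.
key_defined_in() {
  local file
  for file in "$@"; do
    [ -f "$file" ] || continue
    if grep -Eq '^[[:space:]]*(export[[:space:]]+)?GEMINI_API_KEY=' "$file"; then
      printf '%s\n' "$file"
    fi
  done
  return 0
}
```

For example, key_defined_in ~/.zshrc ~/.bashrc prints the files that persist the key; empty output suggests the key exists only in the current session.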


Stage 3: Configuration

With a working binary and a valid key, the next failure class is silent configuration drift — the CLI behaves differently from what the settings files specify, with no error to explain why.

The authoritative source for this is the official configuration reference, which defines a seven-tier precedence hierarchy: hardcoded defaults (lowest) → system defaults file → user settings file (~/.gemini/settings.json) → project settings file (.gemini/settings.json) → system settings file → environment variables → command-line arguments (highest). Environment variables sit at tier 6, above all file-based configuration. This means GEMINI_MODEL overrides the model field in any settings.json, GEMINI_API_KEY overrides any auth configuration, and this override is silent — the CLI does not log that it is ignoring the file.
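To make the ordering concrete, here is a toy model of the precedence rule: candidate values are passed from the lowest tier to the highest, and the last non-empty one wins. This is only an illustration of the order described above, not how the CLI merges settings internally; resolve_setting and the model names in the usage line are hypothetical.

```shell
# Hypothetical model of the precedence order: arguments are candidate values
# listed from lowest tier to highest; the last non-empty one wins.
resolve_setting() {
  local value winner=""
  for value in "$@"; do
    [ -z "$value" ] || winner="$value"
  done
  printf '%s\n' "$winner"
}
```

With the seven tiers in order, resolve_setting "default-model" "" "user-model" "project-model" "" "env-model" "" prints env-model: the environment variable beats both settings files, and only a command-line argument in the last slot would beat it.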

Two specific sub-cases from the official troubleshooting documentation are worth calling out:

CI environment false-positive. The CLI uses the is-in-ci package to detect non-interactive environments. Any environment variable prefixed with CI_ (e.g. CI_TOKEN) triggers non-interactive mode, suppressing prompts. Because the check keys on the variable name alone, a stray CI_-prefixed variable in a local shell or .env file can make the CLI behave as if it were running in CI even when it is not.

DEBUG / DEBUG_MODE variable exclusion. The official docs specifically note that DEBUG and DEBUG_MODE entries in a project .env file are automatically excluded to prevent interference with CLI behavior. Debug variables must be placed in .gemini/.env — not the project root .env — to take effect.

MCP server configuration. MCP servers are declared under mcpServers in settings.json. The official MCP server documentation notes that path values containing ~ or $HOME in the command field are not automatically expanded — the shell does not process these values. This is a silent failure: the server appears configured but never starts. Similarly, credentials needed by an MCP server must be explicitly declared in the server's env block using "KEY": "$MY_ENV_VAR" syntax; they are not inherited from the shell environment due to automatic redaction of sensitive variable names.
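Since unexpanded paths fail silently, a quick text search of settings.json can surface them before any server is started. The sketch below is an assumption-laden shortcut: find_unexpanded_commands is a hypothetical helper, it looks only at "command" values that begin with ~ or $HOME, and it uses a line-based pattern rather than a real JSON parser.

```shell
# Hypothetical check: print settings.json lines whose "command" value starts
# with ~ or $HOME, which the text above says is not expanded.
find_unexpanded_commands() {
  grep -nE '"command"[[:space:]]*:[[:space:]]*"(~|\$HOME)' "$1" || true
}
```

Running find_unexpanded_commands ~/.gemini/settings.json prints the offending lines with their line numbers; replacing each with an absolute path is the fix implied by the documentation note above.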

# Stage 3 quick triage
env | grep -i gemini                        # Which env vars are active?
env -u GEMINI_MODEL -u GEMINI_API_KEY \
  gemini --debug "test" 2>&1 | head -20    # Test settings.json in isolation
cat ~/.gemini/settings.json 2>/dev/null \
  || echo "No user settings file"
cat .gemini/settings.json 2>/dev/null \
  || echo "No project settings file"
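The CI false-positive described earlier in this stage is not caught by env | grep -i gemini, because the offending variable does not contain "gemini" in its name. A small companion check is sketched below; list_ci_vars is a hypothetical helper, and matching a plain CI variable in addition to the CI_ prefix is an assumption of this sketch rather than something stated above.

```shell
# Hypothetical check: list variable names that could trip CI detection.
# Matching plain "CI" as well as the "CI_" prefix is an assumption of this sketch.
list_ci_vars() {
  env | grep -E '^CI(_[A-Za-z0-9_]*)?=' | cut -d= -f1 || true
}
```

Any name printed by list_ci_vars on a developer machine is a candidate for unsetting before retesting interactive mode.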


Stage 4: Runtime Issues

Once the CLI passes Stages 1–3, it will mostly operate correctly — until it does not. Runtime problems split into three shapes: unexplained hangs with no output, rate limit errors under seemingly normal load, and commands that worked yesterday failing today without any local changes.

Hangs

Hang reports on the issue tracker break into two mechanically unrelated root causes that share no fix.

Missing ripgrep binary (Linux / proxy environments). Issue #20433 (closed with workaround) identifies that Gemini CLI internally depends on ripgrep (rg) and on startup looks for it at ~/.gemini/tmp/bin/rg. In proxy-restricted or air-gapped Linux environments, the CLI's background attempt to download this binary fails silently and hits a 300-second network timeout, dead-locking the TUI. The fix is a symlink:

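command -v rg || sudo apt-get install ripgrep    # Ensure ripgrep is installed (Debian/Ubuntu)
mkdir -p ~/.gemini/tmp/bin                       # Create the directory the CLI checks on startup
ln -sf "$(command -v rg)" ~/.gemini/tmp/bin/rg   # Link the system binary into place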
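After creating the link, it is worth confirming that it resolves to a real executable, since a dangling symlink would leave the CLI in the same state as before. A minimal sketch, with check_rg_link as a hypothetical helper and the default path taken from the issue description above:

```shell
# Hypothetical check: report whether the expected ripgrep path is usable.
check_rg_link() {
  local target="${1:-$HOME/.gemini/tmp/bin/rg}"
  if [ -x "$target" ]; then
    echo "ok: $target is executable"
  elif [ -L "$target" ] && [ ! -e "$target" ]; then
    echo "broken: $target is a symlink to a missing file"
  elif [ -e "$target" ]; then
    echo "not executable: $target exists but cannot be run"
  else
    echo "missing: $target does not exist"
  fi
}
```

Calling check_rg_link with no argument inspects the default location; passing a path allows testing another location.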

Agent retry loop (preview model output parsing failure). Issue #22415 documents a different hang: the CLI shows "This is taking a bit longer, we're still on it" indefinitely, and /stats after force-termination reveals dozens of API requests with nearly zero output tokens. The root cause is a model response the agent cannot parse, triggering silent background retries. Switching from a preview model designation to a stable release channel in settings.json eliminates this class of hang.

Rate Limits

Rate limit errors have two distinct root causes that require different responses. Genuine transient rate limiting (exceeding RPM or TPM quotas) responds to exponential backoff and model switching. The billing-restructure false 429 — introduced in January 2026 and documented in issue #24396 — does not respond to backoff because the quota is set to zero. The distinguishing test:

gemini --debug "hello" 2>&1 | grep -A 5 '"limit"'
# "limit": 0  →  billing fix required (link billing account in AI Studio)
# "limit": <positive number>  →  genuine rate limit, use backoff or switch model

Our guide Handle Gemini CLI Rate Limits covers the exponential-backoff configuration and the AI Studio billing-link workaround in full.
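The two cases can be wired together so that backoff is attempted only when it can help. The sketch below is illustrative only: extract_limit, backoff_is_useful, and retry_with_backoff are hypothetical helpers, the "limit" field is parsed as plain text from the debug output, and the retry loop assumes the wrapped command exits non-zero when it is rate limited.

```shell
# Hypothetical helper: read debug output on stdin and print the first
# "limit" value found, or nothing if there is none.
extract_limit() {
  grep -oE '"limit"[[:space:]]*:[[:space:]]*[0-9]+' | head -n 1 | grep -oE '[0-9]+$' || true
}

# Hypothetical helper: decide whether backoff is worth trying for a limit value.
backoff_is_useful() {
  case "$1" in
    ''|0) return 1 ;;   # no limit seen, or a quota of zero: waiting will not help
    *)    return 0 ;;   # positive limit: a genuine, transient rate limit
  esac
}

# Hypothetical helper: retry a command, doubling the delay after each failure.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_with_backoff() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while true; do
    if "$@"; then return 0; fi
    if [ "$attempt" -ge "$max" ]; then return 1; fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

One way to combine them: capture limit=$(gemini --debug "hello" 2>&1 | extract_limit), then run retry_with_backoff 5 2 gemini --debug "hello" only if backoff_is_useful "$limit" succeeds. A limit of 0 skips the retries entirely, which matches the billing case above.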

Silent Regressions

Commands that silently broke overnight — the CLI runs but returns wrong or truncated output — are almost always caused by an upstream model version change, a quota policy update, or a broken MCP server dependency. The diagnostic sequence:

gemini --version          # Confirm which CLI version is running
gemini --debug "hello"    # Check for backend error codes in verbose output
# Then in interactive mode: /mcp   →  check MCP server connection status

Full walkthrough: Debug Gemini CLI Issues and Why Gemini CLI Is Not Responding.

Stage 5: Advanced Integration

Advanced features — sandbox isolation, custom tool definitions, and MCP-mediated integrations — surface failure modes that beginners never encounter. These problems appear when a developer is productive enough with the CLI to push against its integration boundaries.

Sandbox

Sandbox mode runs Gemini CLI's file operations inside a restricted namespace. When path resolution crosses outside the sandbox root, the model reports success but the file does not change — no error is raised. On macOS, the SEATBELT_PROFILE environment variable controls the sandbox policy (options: permissive-open, restrictive-open, strict-open, strict-proxied). The official troubleshooting docs list exit code 44 (FatalSandboxError) for environments where Docker or Podman is expected but unavailable. Full model: Understand Gemini CLI Sandbox.

Custom Tools

Custom tool failures divide at the registration/invocation boundary. A tool can register correctly (the model sees it in the tool list) and still fail to invoke if the parameter schema does not validate against JSON Schema standards, or if execution permissions are not set correctly. Exit code 52 (FatalConfigError) fires when settings.json is invalid — which includes malformed tool definitions. Define Custom Gemini CLI Tools covers schema validation and per-tool permission gates.
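When scripting around the CLI, the exit codes cited in this guide can be turned into readable labels. The sketch below covers only the two codes named here (44 and 52) and the 41–53 range mentioned in the introduction; explain_exit_code is a hypothetical helper, and the meaning of the remaining codes is left to the official troubleshooting guide rather than guessed.

```shell
# Hypothetical helper: translate the exit codes cited in this guide.
explain_exit_code() {
  case "$1" in
    0)  echo "success" ;;
    44) echo "FatalSandboxError" ;;
    52) echo "FatalConfigError" ;;
    4[1-9]|5[0-3]) echo "in the documented 41-53 range: see the official troubleshooting guide" ;;
    *)  echo "not in the documented 41-53 range" ;;
  esac
}
```

A typical use is to run a command and then call explain_exit_code "$?" on the next line.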

MCP Integration

MCP servers are separate processes that Gemini CLI communicates with over stdio (or SSE/HTTP). Issue #1812 illustrates a class of problem specific to MCP debugging: the debug console truncates error context, making it difficult to identify whether the failure is at the handshake, tool-registration, or tool-execution layer. The /mcp command inside an active session displays each server's status (CONNECTED, CONNECTING, DISCONNECTED) and tool list — this is the primary in-session MCP diagnostic.

Issue #10051 documents a "fetch failed" / MCP error -32000: Connection closed pattern where the CLI consistently fails to connect to multiple MCP servers. The suspected root cause is a lower-level networking issue in the undici fetch implementation; the issue was closed as stale without a definitive fix, suggesting the most reliable mitigation is verifying that each MCP server command works independently before debugging the CLI connection.

Three-interface diagnostic sequence for MCP failures:

# 1. Server start: does the server process launch and produce output on its own?
<server-command> <server-args>    # Run the server command directly; check for errors

# 2. Tool registration: does /mcp show the server as CONNECTED with tools listed?
# (In active Gemini CLI session)
/mcp

# 3. Tool execution: does the tool fail when called, or does it never get called?
# Enable verbose mode and watch for MCP request/response pairs
gemini --debug "call the <tool-name> tool with <args>"

The official MCP server documentation notes that gemini mcp list and gemini mcp add are the CLI management commands for server inventory. Full integration reference: Gemini CLI MCP Integration.
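Step 1 of the sequence above (does the server launch on its own?) can be semi-automated. The following is a rough sketch under stated assumptions: probe_server is a hypothetical helper, it relies on the GNU coreutils timeout command, and it treats "still running after a few seconds with stdin held open" as a sign that the process did not crash on startup. It says nothing about whether the MCP handshake itself would succeed.

```shell
# Hypothetical helper: start a server command with an open stdin, wait a few
# seconds, and report whether it was still running or had already exited.
# Assumes GNU coreutils `timeout` is installed.
# Usage: probe_server <seconds> <server-command> [server-args...]
probe_server() {
  local wait_s="$1" status=0
  shift
  # The left-hand sleep keeps the server's stdin open slightly longer than the probe.
  # Server output is sent to stderr so that stdout carries only the verdict.
  sleep "$((wait_s + 1))" | timeout "$wait_s" "$@" 1>&2 || status=$?
  if [ "$status" -eq 124 ]; then
    echo "still running after ${wait_s}s"
  else
    echo "exited early with status $status"
  fi
}
```

An "exited early" verdict points at the server command itself (wrong path, missing dependency, missing credentials) and can be fixed without involving Gemini CLI at all.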

Quantified Analysis

From the GitHub issues reviewed for this article, the distribution of community attention across stages is not even:

| Stage | Top Indicator Issue | Comment Volume | Maintainer Status |
|---|---|---|---|
| Stage 1 (Install) | #5886 PATH conflict | Moderate | Stale / priority/p3 |
| Stage 2 (Auth) | #24517 403 OAuth routing | 141 comments | Open / priority/p1 |
| Stage 2 (Auth) | #25189 403 cloudaicompanionProject | Open | Open / active |
| Stage 3 (Config) | Config drift misattributed to auth | Dispersed | No dedicated tracking |
| Stage 4 (Runtime) | #24396 429 limit:0 billing | 22+ comments | Partially resolved |
| Stage 4 (Runtime) | #20433 ripgrep hang | 28 comments | Closed (symlink workaround) |
| Stage 5 (Advanced) | #10051 MCP fetch failed | Moderate | Closed stale |
| Stage 5 (Advanced) | #1812 MCP debug truncation | Low | Closed |

The concentration at Stage 2 (authentication) is striking: the two open 403 issues together hold more community engagement than all other stages combined. This skew reflects both the frequency of auth problems and the difficulty of diagnosing backend-side routing bugs without visibility into Google's infrastructure.

Stage 3 (configuration) generates fewer standalone issues not because it is less common, but because configuration drift symptoms are usually misattributed to authentication or rate limiting, inflating those categories' apparent volume. The env | grep -i gemini diagnostic in Stage 3 frequently reveals the true cause of what looked like a Stage 2 or Stage 4 problem.

When to Escalate

If diagnostic work through the five stages has not resolved the issue, three escalation paths exist. First, run gemini debug --all > debug.txt; the official troubleshooting guide explicitly states that reports with this file attached are triaged faster. Second, search the issue tracker before opening a new report; issue #24517's 141-comment thread demonstrates that edge cases often have buried workarounds. Third, for quota, billing, and account-tier issues, Google Cloud Support is the only resolution path, because the CLI team cannot override platform-level limits. For informal peer review alongside these paths, the community Discord has active channels segmented by category (installation, MCP, advanced).

Conclusion

Based on the five-stage failure pattern documented across community reports and the official exit-code taxonomy, the most robust approach to Gemini CLI troubleshooting is stage identification before fix application. Applying authentication fixes to a configuration problem, or configuration fixes to a runtime hang, is the primary source of wasted diagnostic time. The two-minute triage sequence — gemini --version, echo $GEMINI_API_KEY, a direct curl against the API, and env | grep -i gemini — eliminates the most common cross-stage misdiagnosis before deeper investigation begins.

The hub structure of this guide is deliberate. Each of the 15 linked subpages below covers one problem within one stage in depth; this page provides the framework for knowing which subpage to reach for. Bookmark this guide and use the stage headers as the entry point when something breaks.

All 15 Diagnostic Deep-Links

Stage 1 — Installation

Stage 2 — Authentication

Stage 3 — Configuration

Stage 4 — Runtime

Stage 5 — Advanced Integration
