Reasoning models, such as Google’s Gemma 4, return information about their reasoning steps alongside the final answer. This page explains how to identify reasoning-capable models on Serverless Inference, where to find reasoning output in a response, and how to turn reasoning on or off for models that support toggling it. To determine whether a model supports reasoning, check the following Supported models table or the Supported Features section of its catalog page in the UI. Reasoning information appears in the reasoning field of responses. The value of this field is null in the responses of non-reasoning models.
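As a sketch of where the field appears, assuming an OpenAI-compatible chat completions response shape (the field values below are illustrative, not real model output):

```python
# Illustrative response body from a reasoning model; the values are made up.
# A non-reasoning model returns the same shape with "reasoning" set to null.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning": "The user asks for 2 + 2; adding the two gives 4.",
                "content": "2 + 2 = 4.",
            }
        }
    ]
}

# Pull the reasoning text out of the first choice.
reasoning = response["choices"][0]["message"]["reasoning"]
print(reasoning)
```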
Supported models with reasoning
The following table lists the models on Serverless Inference that return reasoning output. For each supported model, reasoning is either always on, enabled by default, or disabled by default:

| Model ID (for API usage) | Reasoning support |
|---|---|
| google/gemma-4-31B-it | Enabled by default |
| MiniMaxAI/MiniMax-M2.5 | Always on |
| moonshotai/Kimi-K2.5 | Always on |
| nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8 | Enabled by default |
| openai/gpt-oss-120b | Always on |
| openai/gpt-oss-20b | Always on |
| Qwen/Qwen3.5-35B-A3B | Enabled by default |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | Always on |
| zai-org/GLM-5.1 | Enabled by default |
Models with Always on reasoning
If a model is listed as Always on in the preceding Supported models table, its reasoning output is always included and cannot be disabled.
Disable reasoning
If a model is listed as Enabled by default in the preceding Supported models table, you can disable reasoning to reduce token usage or simplify the response. To opt out of reasoning for a request, set the enable_thinking flag in chat_template_kwargs to False (Python) or false (Bash).
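A minimal sketch of such a request, assuming an OpenAI-compatible JSON request body (the message content is illustrative; the model ID comes from the table above):

```python
import json

# Request body that opts out of reasoning for this one request.
# Only the chat_template_kwargs entry is specific to toggling reasoning.
payload = {
    "model": "google/gemma-4-31B-it",
    "messages": [
        {"role": "user", "content": "Summarize the plot of Hamlet in one sentence."}
    ],
    "chat_template_kwargs": {"enable_thinking": False},
}

print(json.dumps(payload, indent=2))
```

With an OpenAI-compatible Python client, a non-standard field like chat_template_kwargs is typically passed through the extra_body parameter of chat.completions.create; with curl, it is simply part of the JSON body of the POST request.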
Enable reasoning
If a model is listed as Disabled by default in the preceding Supported models table, you can enable reasoning by setting the enable_thinking flag in chat_template_kwargs to True (Python) or true (Bash).
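Sketched the same way (the model ID below is a placeholder, since no model in the current table is listed as Disabled by default):

```python
# Request body that opts in to reasoning for this one request.
# "provider/example-model" is a placeholder model ID, not a real model.
payload = {
    "model": "provider/example-model",
    "messages": [
        {"role": "user", "content": "Explain step by step: what is 12 * 13?"}
    ],
    "chat_template_kwargs": {"enable_thinking": True},
}
```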