Gemma 4 31B IT

Type: Generation
Capabilities: vision, reasoning

Overview

Gemma 4 31B is Google DeepMind’s most capable open model, built for advanced reasoning, coding, and multimodal understanding. It sits in the same general tier as Claude Haiku 4.5 and NVIDIA Nemotron 3 Super. It offers native function calling and structured JSON output for agentic workflows, strong image and video understanding for tasks such as OCR and chart analysis, a 256K-token context window for long documents and repositories, and support for more than 140 languages.

Thinking Mode

To enable reasoning, include the following in your request body: "chat_template_kwargs": {"enable_thinking": true}
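
As a minimal sketch, the flag can be attached to an otherwise ordinary chat request body. This assumes an OpenAI-style chat completions schema, which the `messages` format used elsewhere on this page suggests but does not confirm. The model identifier `gemma-4-31b-it` and the helper name `build_thinking_request` are placeholders for illustration, not values taken from this page.

```python
import json


def build_thinking_request(prompt: str, model: str = "gemma-4-31b-it") -> dict:
    """Return a chat request body with thinking mode switched on.

    The default model id is a placeholder; substitute the identifier
    listed for your account.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # This field is the one documented above; the rest is assumed schema.
        "chat_template_kwargs": {"enable_thinking": True},
    }


body = build_thinking_request("How many primes are there below 50?")
print(json.dumps(body, indent=2))
```

Omitting `chat_template_kwargs` (or setting `enable_thinking` to `false`) would presumably leave reasoning off, though the default behavior is not stated here.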

———

Multimodal Input

Gemma 4 supports multimodal input, so you can send images or videos together with text in a single request.

Image Example

"messages": [
  {
    "role": "user",
    "content": [
      {
        "type": "image_url",
        "image_url": {
          "url": "https://example.com/image.jpg"
        }
      },
      {
        "type": "text",
        "text": "Describe this image."
      }
    ]
  }
]
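
The same structure can be generated programmatically when many images need the same prompt. The helper below is an illustrative sketch, and `image_message` is a hypothetical name. It only builds the `messages` list shown above and sends nothing.

```python
def image_message(image_url: str, prompt: str) -> list[dict]:
    """Build a single-turn `messages` list pairing one image with a text prompt."""
    return [
        {
            "role": "user",
            "content": [
                # The image part comes first, followed by the instruction text,
                # mirroring the order used in the example above.
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]


messages = image_message("https://example.com/image.jpg", "Describe this image.")
```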

Video Example

"messages": [
  {
    "role": "user",
    "content": [
      {
        "type": "video_url",
        "video_url": {
          "url": "https://example.com/sample_video.mp4"
        }
      },
      {
        "type": "text",
        "text": "Summarize what happens in this video."
      }
    ]
  }
]

Pricing

Priority	Input Tokens (per 1M)	Output Tokens (per 1M)
Realtime¹	$0.12	$0.35
Async	$0.09	$0.26
Batch (24h)	$0.06	$0.18
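
To make the table concrete, a job's cost is the input token count times the input price plus the output token count times the output price, each divided by one million. The sketch below hard-codes the prices from the table. The lowercase tier keys and the function name `estimate_cost` are illustrative choices, and actual billing may differ in ways this page does not describe (rounding, minimums).

```python
from decimal import Decimal

# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "realtime": (Decimal("0.12"), Decimal("0.35")),
    "async": (Decimal("0.09"), Decimal("0.26")),
    "batch": (Decimal("0.06"), Decimal("0.18")),
}
PER_MILLION = Decimal(1_000_000)


def estimate_cost(priority: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of a job at the given priority tier."""
    input_price, output_price = PRICES[priority]
    return (input_price * input_tokens + output_price * output_tokens) / PER_MILLION


cost = estimate_cost("batch", 2_000_000, 500_000)
print(f"Estimated batch cost: ${cost}")
```

For 2M input tokens and 500K output tokens at batch priority, this works out to 2 × $0.06 + 0.5 × $0.18 = $0.21. `Decimal` is used rather than floats so that small per-token prices add up without binary rounding noise.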

Playground

Open this model in the Playground.

¹ Realtime availability is limited. Doubleword is primarily a batch API.
