Model Name
google/gemma-4-31B-it
Gemma 4 31B IT
- Type: Generation
- Capabilities: vision, reasoning, enhanced_structured_generation
Overview
Gemma 4 31B is Google DeepMind’s most capable open model, built for advanced reasoning, coding, and multimodal understanding. It sits in the same general tier as Claude Haiku 4.5 and NVIDIA Nemotron 3 Super. Key features:
- Native function calling and structured JSON output for agentic workflows
- Strong image and video understanding for tasks such as OCR and chart analysis
- 256K context for long documents and repositories
- Support for 140+ languages
Thinking Mode
To enable reasoning, include the following in your request body: "chat_template_kwargs": {"enable_thinking": true}
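As a minimal sketch, the request body could be assembled in Python as shown here. The `build_request_body` helper is hypothetical, and the `model` and `messages` fields assume an OpenAI-style chat completions payload, which this page does not spell out; only the `chat_template_kwargs` field is taken from the text above. The flag is passed through unchanged, so the caller decides its value.

```python
import json

MODEL_ID = "google/gemma-4-31B-it"  # model name as listed on this page


def build_request_body(prompt: str, enable_thinking: bool) -> dict:
    """Hypothetical helper: build a chat request body with the thinking flag set.

    Assumes an OpenAI-style payload ("model" + "messages"); only
    "chat_template_kwargs" is documented on this page.
    """
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


body = build_request_body("Why is the sky blue?", enable_thinking=True)
print(json.dumps(body, indent=2))
```

Keeping the flag as an explicit argument makes it easy to toggle thinking per request without touching the rest of the payload.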
———
Multimodal Input
Gemma 4 supports multimodal input, so you can send images or videos together with text in a single request.
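For callers who build requests programmatically, a small helper can produce the same content-part structure used in the image example on this page. This is only a sketch: `image_message` is a hypothetical name, and it covers the image-plus-text case only.

```python
import json


def image_message(image_url: str, prompt: str) -> dict:
    """Hypothetical helper: one user message holding an image part and a text part."""
    return {
        "role": "user",
        "content": [
            # Image part first, then the text instruction, as in the image example.
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": prompt},
        ],
    }


messages = [image_message("https://example.com/image.jpg", "Describe this image.")]
print(json.dumps({"messages": messages}, indent=2))
```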
Image Example
"messages": [
  {
    "role": "user",
    "content": [
      {
        "type": "image_url",
        "image_url": {
          "url": "https://example.com/image.jpg"
        }
      },
      {
        "type": "text",
        "text": "Describe this image."
      }
    ]
  }
]
Video Example
"messages": [
  {
    "role": "user",
    "content": [
      {
        "type": "video_url",
        "video_url": {
          "url": "https://example.com/sample_video.mp4"
        }
      },
      {
        "type": "text",
        "text": "Summarize what happens in this video."
      }
    ]
  }
]
Pricing
| Priority | Input Tokens (per 1M) | Output Tokens (per 1M) |
|---|---|---|
| Realtime [1] | $0.14 | $0.40 |
| High (1h) | $0.11 | $0.30 |
| Standard (24h) | $0.07 | $0.20 |
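To make the per-million-token rates concrete, the sketch below estimates the cost of a job at each priority. The prices are copied from the table above; the `estimate_cost` helper, the priority keys, and the token counts are illustrative assumptions, not part of any official SDK.

```python
from decimal import Decimal

# (input, output) prices in USD per 1M tokens, copied from the pricing table.
PRICES = {
    "realtime": (Decimal("0.14"), Decimal("0.40")),
    "high": (Decimal("0.11"), Decimal("0.30")),
    "standard": (Decimal("0.07"), Decimal("0.20")),
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(priority: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimated USD cost for the given token counts."""
    input_price, output_price = PRICES[priority]
    return (
        input_price * Decimal(input_tokens) / TOKENS_PER_UNIT
        + output_price * Decimal(output_tokens) / TOKENS_PER_UNIT
    )


# Example workload (made-up numbers): 2M input tokens, 500K output tokens.
for name in PRICES:
    print(name, estimate_cost(name, 2_000_000, 500_000))
```

`Decimal` is used instead of floats so that small per-token amounts add up without binary rounding noise.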
Playground
Open this model in the Playground.
Footnotes
[1] Realtime availability is limited. Doubleword is primarily a batch API.