Doubleword

Model Name

nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4

Nemotron 3 Super 120B A12B

  • Type: Generation
  • Capabilities: reasoning

Overview

NVIDIA Nemotron 3 Super 120B A12B NVFP4 is an open hybrid Mamba-Transformer LatentMoE model with 120 billion total parameters and 12 billion active parameters, built for agentic reasoning workloads such as coding, planning, tool use, and long-context tasks. It sits in the same capability tier as Qwen3.5-122B (non-reasoning) and ahead of GPT-OSS-120B, while also delivering higher throughput.


In line with NVIDIA's guidance, we use temperature=1.0 and top_p=0.95 across all tasks and serving backends, including reasoning, tool calling, and general chat.


To enable reasoning, pass extra_body={"chat_template_kwargs": {"enable_thinking": True}}. For a more concise reasoning mode that uses significantly fewer reasoning tokens than full thinking mode, add "low_effort": True to the same chat_template_kwargs payload.
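As a sketch of how the recommended sampling settings and the reasoning flags fit together in an OpenAI-compatible request (the model name comes from this page; the helper function and the example message are illustrative, not part of any SDK):

```python
# Sketch: build kwargs for an OpenAI-compatible chat.completions.create call
# with NVIDIA's recommended sampling settings and the reasoning flags above.
def build_chat_request(messages, thinking=True, low_effort=False):
    """Assemble request kwargs; `build_chat_request` is a hypothetical helper."""
    kwargs = {
        "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
        "messages": messages,
        "temperature": 1.0,  # NVIDIA-recommended for all tasks
        "top_p": 0.95,
        # Reasoning is toggled through the chat template, not a top-level flag.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": thinking}},
    }
    if low_effort:
        # Concise reasoning mode: far fewer reasoning tokens than full thinking.
        kwargs["extra_body"]["chat_template_kwargs"]["low_effort"] = True
    return kwargs

req = build_chat_request(
    [{"role": "user", "content": "Plan a refactor of this module."}],
    thinking=True,
    low_effort=True,
)
```

The resulting dictionary can be passed straight to `client.chat.completions.create(**req)` on any backend that forwards `extra_body` to the serving engine.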

Pricing

Priority        Input Tokens (per 1M)   Output Tokens (per 1M)
Realtime [1]    $0.30                   $0.75
High (1h)       $0.23                   $0.56
Standard (24h)  $0.15                   $0.38
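The per-million-token rates above translate into a simple per-request cost estimate; a minimal sketch (the tier keys and token counts are illustrative):

```python
# Sketch: per-request cost estimate from the pricing table above.
# USD per 1M tokens, keyed by priority tier: (input rate, output rate).
PRICING = {
    "realtime": (0.30, 0.75),
    "high": (0.23, 0.56),
    "standard": (0.15, 0.38),
}

def estimate_cost(priority, input_tokens, output_tokens):
    """Return the estimated USD cost of one request at the given tier."""
    in_rate, out_rate = PRICING[priority]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# e.g. 100k input + 20k output tokens at the Standard (24h) tier
cost = estimate_cost("standard", 100_000, 20_000)
```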

Playground

Open this model in the Playground.

Footnotes

  1. Realtime availability is limited. Doubleword is primarily a batch API.