Model Name
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 (Nemotron 3 Super 120B A12B)
- Type: Generation
- Capabilities: reasoning
Overview
NVIDIA Nemotron 3 Super 120B A12B NVFP4 is an open hybrid Mamba-Transformer LatentMoE model with 120 billion total parameters and 12 billion active parameters, built for agentic reasoning workloads such as coding, planning, tool use, and long-context tasks. It sits in the same capability tier as Qwen3.5-122B (non-reasoning) and ahead of GPT-OSS-120B, while also delivering higher throughput.
In line with NVIDIA's guidance, we use temperature=1.0 and top_p=0.95 across all tasks and serving backends, including reasoning, tool calling, and general chat.
To enable reasoning, pass extra_body={"chat_template_kwargs": {"enable_thinking": True}}. For a more concise reasoning mode that uses significantly fewer reasoning tokens than full thinking mode, add "low_effort": True to the same chat_template_kwargs payload.
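The settings above can be sketched as a request builder. This is a minimal illustration, not official client code: the helper name build_request and the example prompt are ours, and with the OpenAI Python SDK the chat_template_kwargs entry would be passed via extra_body so it is merged into the request JSON as shown here.

```python
import json

# Recommended sampling settings per NVIDIA's guidance, plus the reasoning
# toggle. build_request is a hypothetical helper for illustration only.
def build_request(prompt: str, low_effort: bool = False) -> dict:
    kwargs = {"enable_thinking": True}
    if low_effort:
        # Concise reasoning mode: far fewer reasoning tokens than full thinking.
        kwargs["low_effort"] = True
    return {
        "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,  # recommended across all tasks and backends
        "top_p": 0.95,       # recommended across all tasks and backends
        # With the OpenAI SDK, pass this dict via extra_body={"chat_template_kwargs": ...}.
        "chat_template_kwargs": kwargs,
    }

payload = build_request("Plan a refactor of this module.", low_effort=True)
print(json.dumps(payload, indent=2))
```

Omit low_effort (or set it to False) to use full thinking mode.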
Pricing
| Priority | Input Tokens (per 1M) | Output Tokens (per 1M) |
|---|---|---|
| Realtime¹ | $0.30 | $0.75 |
| High (1h) | $0.23 | $0.56 |
| Standard (24h) | $0.15 | $0.38 |
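To make the per-1M-token pricing concrete, here is a small cost calculator over the table above. The function name and the example token counts are illustrative assumptions.

```python
# Per-1M-token prices from the pricing table: (input rate, output rate) in USD.
PRICES = {
    "realtime": (0.30, 0.75),
    "high": (0.23, 0.56),
    "standard": (0.15, 0.38),
}

def request_cost(priority: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the given priority tier."""
    in_rate, out_rate = PRICES[priority]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: 200k input tokens + 20k output tokens at Standard (24h) priority.
print(f"${request_cost('standard', 200_000, 20_000):.4f}")  # → $0.0376
```

At Standard priority this works out to $0.0376 per such request, versus $0.0750 at Realtime.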
Playground
Open this model in the Playground.
Footnotes
1. Realtime availability is limited. Doubleword is primarily a batch API.