Model Name
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 (Nemotron 3 Super 120B A12B)
- Type: Generation
- Capabilities: reasoning
Overview
NVIDIA Nemotron 3 Super 120B A12B NVFP4 is an open hybrid Mamba-Transformer LatentMoE model with 120 billion total parameters and 12 billion active parameters, built for agentic reasoning workloads such as coding, planning, tool use, and long-context tasks. It sits in the same capability tier as Qwen3.5-122B (non-reasoning) and ahead of GPT-OSS-120B, while also delivering higher throughput.
In line with NVIDIA's guidance, we use temperature=1.0 and top_p=0.95 across all tasks and serving backends, including reasoning, tool calling, and general chat.
To enable reasoning, pass extra_body={"chat_template_kwargs": {"enable_thinking": True}}. For a more concise reasoning mode that uses significantly fewer reasoning tokens than full thinking mode, add "low_effort": True to the same chat_template_kwargs payload.
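The settings above can be sketched as a request builder. This is a minimal illustration, not official client code: the helper name build_request and the example prompt are ours, and with the OpenAI Python SDK the chat_template_kwargs entry would be passed via extra_body so it is merged into the request JSON as shown here.

```python
import json

# Recommended sampling settings per NVIDIA's guidance, plus the reasoning
# toggle. build_request is a hypothetical helper for illustration only.
def build_request(prompt: str, low_effort: bool = False) -> dict:
    kwargs = {"enable_thinking": True}
    if low_effort:
        # Concise reasoning mode: far fewer reasoning tokens than full thinking.
        kwargs["low_effort"] = True
    return {
        "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,  # recommended across all tasks and backends
        "top_p": 0.95,       # recommended across all tasks and backends
        # With the OpenAI SDK, pass this dict via extra_body={"chat_template_kwargs": ...}.
        "chat_template_kwargs": kwargs,
    }

payload = build_request("Plan a refactor of this module.", low_effort=True)
print(json.dumps(payload, indent=2))
```

Omit low_effort (or set it to False) to use full thinking mode.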
Pricing
| Priority | Input Tokens (per 1M) | Output Tokens (per 1M) |
|---|---|---|
| Realtime¹ | $0.30 | $0.75 |
| High (1h) | $0.23 | $0.56 |
| Standard (24h) | $0.15 | $0.38 |
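To make the per-1M-token pricing concrete, here is a small cost calculator over the table above. The function name and the example token counts are illustrative assumptions.

```python
# Per-1M-token prices from the pricing table: (input rate, output rate) in USD.
PRICES = {
    "realtime": (0.30, 0.75),
    "high": (0.23, 0.56),
    "standard": (0.15, 0.38),
}

def request_cost(priority: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the given priority tier."""
    in_rate, out_rate = PRICES[priority]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: 200k input tokens + 20k output tokens at Standard (24h) priority.
print(f"${request_cost('standard', 200_000, 20_000):.4f}")  # → $0.0376
```

At Standard priority this works out to $0.0376 per such request, versus $0.0750 at Realtime.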
Playground
Open this model in the Playground.
Footnotes
1. Realtime availability is limited. Doubleword is primarily a batch API.