Qwen3 Parameter Overview: From 0.6B to 235B, The Ultimate Balance of Hybrid Inference and Multimodality

The latest release of the Qwen3 series by Alibaba Cloud's Tongyi Qianwen (Qwen) team has drawn industry attention with its range of model scales and its hybrid inference mode. The lineup covers eight models from 0.6B to 235B parameters; Qwen3 not only performs strongly on language, mathematics, and coding tasks, but also balances performance and efficiency by offering both MoE (Mixture of Experts) and dense architectures. The tables below detail the core parameters and features of the Qwen3 series and reveal its technical core.

Qwen3 Model Parameter Overview

| Model Name | Total Params | Active Params | Architecture | Context Length | Languages | License | Key Features |
|---|---|---|---|---|---|---|---|
| Qwen3-235B-A22B | 235B | 22B | MoE | 128K tokens | 119 | Qwen License | Flagship model; coding and math capabilities comparable to DeepSeek-R1 and Grok-3; efficient inference |
| Qwen3-30B-A3B | 30B | 3B | MoE | 128K tokens | 119 | Qwen License | Small MoE; outperforms Qwen2.5-32B; low inference cost; suitable for local deployment |
| Qwen3-32B | 32B | 32B | Dense | 128K tokens | 119 | Apache 2.0 | High-performance dense model for complex tasks; reasoning capability matches Qwen2.5-72B |
| Qwen3-14B | 14B | 14B | Dense | 128K tokens | 119 | Apache 2.0 | Medium scale; balances performance and resource usage; suitable for enterprise applications |
| Qwen3-8B | 8B | 8B | Dense | 128K tokens | 119 | Apache 2.0 | Lightweight and efficient; suitable for edge devices; performance comparable to Qwen2.5-14B |
| Qwen3-4B | 4B | 4B | Dense | 128K tokens | 119 | Apache 2.0 | Small model; fast inference; performance close to Qwen2.5-7B |
| Qwen3-1.7B | 1.7B | 1.7B | Dense | 128K tokens | 119 | Apache 2.0 | Ultra-lightweight; suitable for mobile devices; performance matches Qwen2.5-3B |
| Qwen3-0.6B | 0.6B | 0.6B | Dense | 128K tokens | 119 | Apache 2.0 | Smallest scale; minimal resource requirements; suitable for low-power scenarios |
Recommended sampling settings for the two inference modes:

| Mode | Temperature | TopP | TopK | MinP | Presence Penalty | Ollama Settings | Notes |
|---|---|---|---|---|---|---|---|
| Thinking Mode | 0.6 | 0.95 | 20 | 0 | 0 ~ 2 | num_ctx=40960, num_predict=32768, keep_alive=-1 | Do not use greedy decoding; it degrades quality and causes repetition |
| Non-thinking Mode | 0.7 | 0.8 | 20 | 0 | 0 ~ 2 | num_ctx=40960, num_predict=32768, keep_alive=-1 | A high presence_penalty may cause language mixing |

  • Temperature: controls randomness; lower values are more stable, while the slightly higher 0.7 in non-thinking mode adds creativity.
  • TopP: cumulative-probability (nucleus) sampling; higher values increase diversity, and the lower 0.8 keeps non-thinking output more focused.
  • TopK: samples from the top K candidate tokens, balancing diversity and coherence.
  • MinP: 0 means no probability floor, for maximum flexibility.
  • Presence Penalty: values from 0 to 2 reduce repetition; use high values with caution.
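As a quick way to apply these settings locally, here is a minimal sketch using the ollama Python client. The model tag (qwen3:30b) and the exact set of supported option keys are assumptions about your local Ollama install; verify them against `ollama list` and the Ollama API documentation before relying on this.

```python
# Sketch only: thinking-mode sampling settings via the ollama Python client.
import ollama

THINKING_OPTIONS = {
    "temperature": 0.6,       # lower randomness, more stable reasoning
    "top_p": 0.95,            # nucleus sampling threshold
    "top_k": 20,              # sample from the 20 most likely tokens
    "min_p": 0.0,             # no probability floor
    "presence_penalty": 0.0,  # raise toward 2 only if output starts repeating
    "num_ctx": 40960,         # context window handed to the runtime
    "num_predict": 32768,     # generation budget
}

response = ollama.chat(
    model="qwen3:30b",        # assumed local tag for Qwen3-30B-A3B
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    options=THINKING_OPTIONS,
    keep_alive=-1,            # keep the model loaded between calls
)
print(response["message"]["content"])
```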

Parameter and Feature Analysis

Model Scale and Architecture Type

The Qwen3 series offers two architecture types:

  • MoE (Mixture of Experts): Qwen3-235B-A22B and Qwen3-30B-A3B activate only a fraction of their parameters (22B and 3B respectively) per token, so despite large total parameter counts their computational cost is comparable to that of small dense models. The MoE models perform especially well on coding and mathematical tasks, with significantly improved inference speed (a toy routing sketch follows this list).
  • Dense: full-parameter models from 0.6B to 32B, suited to scenarios that need consistently high performance. The small models (such as Qwen3-0.6B) are optimized for edge devices, while the larger ones (such as Qwen3-32B) excel at complex reasoning tasks.
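To make "activates only a portion of parameters" concrete, here is a toy, framework-free sketch of top-k expert routing. It is purely illustrative: the expert count, dimensions, and single-matmul "experts" are made up and do not reflect Qwen3's actual router or FFN design.

```python
import numpy as np

def moe_layer(x, experts_w, router_w, k=2):
    """Toy MoE forward pass: route one token to its top-k experts only.

    x          : (d,) one token's hidden state
    experts_w  : (num_experts, d, d) one toy weight matrix per expert
    router_w   : (d, num_experts) router projection
    """
    logits = x @ router_w                  # score every expert
    topk = np.argsort(logits)[-k:]         # keep only the k best experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()               # softmax over the selected experts
    # Only k experts run; the remaining experts' parameters stay idle for this
    # token, which is how a 235B-total model can cost ~22B active params/token.
    return sum(w * (x @ experts_w[i]) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, num_experts = 64, 8                     # made-up sizes for illustration
out = moe_layer(rng.normal(size=d),
                rng.normal(size=(num_experts, d, d)) * 0.05,
                rng.normal(size=(d, num_experts)) * 0.05)
print(out.shape)  # (64,)
```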

Context Length

All Qwen3 models support a context length of 128K tokens and can generate up to 8K tokens per response, which lets them handle extra-long documents and multi-turn dialogue. This gives them a clear advantage in long-text generation and document comprehension tasks.
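Before sending a very long document, it can help to check that the prompt plus the generation budget fits the window. A small sketch, assuming the Hugging Face repo id Qwen/Qwen3-8B and a 131,072-token (128K) limit:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")  # assumed repo id

def fits_in_context(text: str, context_limit: int = 131_072, reserve: int = 8_192) -> bool:
    """Return True if `text` plus an 8K generation budget fits the context window."""
    n_tokens = len(tokenizer(text)["input_ids"])
    return n_tokens + reserve <= context_limit

print(fits_in_context("A very long report ... " * 1000))
```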

Multilingual Support

Qwen3 supports 119 languages and dialects, covering Chinese, English, European languages, and low-resource languages, making it suitable for global multilingual application scenarios.

Hybrid Thinking Mode

Qwen3 introduces switchable thinking and non-thinking modes within a single model:

  • Thinking Mode: Uses chain-of-thought (CoT) reasoning for step-by-step deduction, suitable for complex mathematical, coding, and logical reasoning tasks.
  • Non-thinking Mode: Provides quick responses to simple queries, optimizing latency and computational cost.

This design is the result of a four-stage post-training pipeline (long chain-of-thought cold start, reasoning-based RL, thinking-mode fusion, and general RL), which significantly improves task adaptability.
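A minimal sketch of switching between the two modes through the Hugging Face chat template. The enable_thinking flag and the /think and /no_think soft switches follow the published Qwen3 model card, but verify them against your transformers version before depending on them.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")  # any Qwen3 checkpoint
messages = [{"role": "user", "content": "Solve 23 * 47 step by step."}]

# Thinking mode (default): the chat template adds a <think>...</think> scratchpad
# where the model reasons before answering.
prompt_thinking = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-thinking mode: skips the reasoning block for faster, cheaper responses.
prompt_fast = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

# Per the model card, appending /think or /no_think to a user message can also
# toggle the behavior turn-by-turn when enable_thinking is left at its default.
```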

License and Open Source Strategy

  • Dense Models (0.6B-32B) use the Apache 2.0 license, suitable for commercial applications.
  • MoE Models (235B-A22B, 30B-A3B) use the Qwen License, more suitable for research scenarios.

Performance and Efficiency

Benchmark Performance

  • Qwen3-235B-A22B: Competes with top models like DeepSeek-R1 and Grok-3 in tests such as MMLU-Pro and LiveCodeBench, with particularly outstanding coding and mathematical capabilities.
  • Qwen3-30B-A3B: Despite activating only 3B parameters, it surpasses Qwen2.5-32B while delivering roughly a 10x improvement in inference efficiency, making it suitable for local deployment and real-time applications.
  • Small Models: Models like Qwen3-4B perform comparably to Qwen2.5-72B, suitable for resource-constrained scenarios.
