News
MiniMaxAI/MiniMax-M3 · 427B / 26B active · MOE · 1024K ctx
4+ hour, 14+ min ago (277+ words) MiniMax M3 vision-language MoE (427B total / 26B active) for frontier coding, agent toolchains, and 1M-token reasoning via MSA sparse attention; native multimodal (image + video + computer use); BF16 checkpoint with an MXFP8 variant from NVIDIA. Runs on NVIDIA (Hopper/Blackwell) and on…
Qwen/Qwen3.6-35B-A3B · 35B / 3B active · MOE · 256K ctx
1+ mon, 3+ week ago (219+ words) Smaller Qwen3.6 multimodal MoE model (35B total / 3B active) with BF16, FP8, and NVIDIA NVFP4 variants. Compact Qwen3.6 MoE with 3B active parameters; single-GPU FP8 or 2-4 GPU BF16 serving. Qwen3.6-35B-A3B is the smaller sibling of Qwen3.5, sharing the same gated-delta-networks MoE architecture but with 35B…
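The "single-GPU FP8 or 2-4 GPU BF16" note can be sanity-checked with a rough weight-only estimate. The sketch below is illustrative and not taken from the model card: the helper name `weight_gb` is hypothetical, the bytes-per-parameter figures are the nominal widths of each format (NVFP4 is treated as 4 bits and its block-scale overhead is ignored), and KV cache, activations, and runtime overhead are left out.

```python
# Rough weight-only memory estimate for a 35B-parameter checkpoint.
# Assumptions (not from the source): nominal bytes per parameter,
# no KV cache, activations, or framework overhead; NVFP4 scale
# factors are ignored, so its figure is a lower bound.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5}


def weight_gb(total_params_billion: float, fmt: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return total_params_billion * BYTES_PER_PARAM[fmt]


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: ~{weight_gb(35, fmt):.1f} GB")
```

All 35B parameters must be resident even though only 3B are active per token, so the estimate uses the total count. Under these assumptions the FP8 weights come to roughly 35 GB and the BF16 weights to roughly 70 GB, which is in line with the snippet's single-GPU versus multi-GPU serving note; the actual requirement depends on the GPU's memory and the context length served.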
poolside/Laguna-XS.2 · 33B / 3B active · MOE · 128K ctx
1+ mon, 1+ week ago (144+ words) recipes.vllm.ai: Laguna XS.2 is Poolside's 33B-total / 3B-activated Mixture-of-Experts model purpose-built for agentic coding and long-horizon work. It combines mixed sliding-window + global attention (3:1 across 40 layers) with sigmoid per-head gating and FP8 KV cache, so it stays compact enough to run locally…