MiniMax-M2.5 Is Now Fully Open Source

MiniMax-M2.5 is now fully open source. Run a frontier AI model for free on your Mac. A complete guide to the 229B-parameter model that matches Claude Opus-class coding performance.

MiniMax, the Shanghai-based AI company that just IPO'd on the Hong Kong Stock Exchange at a $13.7 billion valuation, dropped its latest model as fully open weights this week. MiniMax-M2.5 is a 229-billion-parameter mixture-of-experts model that matches Claude Sonnet and GPT-4-class performance on coding benchmarks — and you can download it right now and run it on a Mac.

The model weights are on Hugging Face. The code is on GitHub. The license is Modified MIT. No waitlists, no API keys required.

As Akshay Pachaar (@akshay_pachaar) broke down on X, this release is significant because M2.5 is the first open-weights model to genuinely match proprietary frontier models on real-world software engineering tasks.

---

What Makes MiniMax-M2.5 Different

This isn't another open-source model that looks good on paper but falls apart in practice. Here's what sets it apart:

Mixture-of-experts architecture. 229 billion total parameters, but only about 10 billion activate per token. That's the trick — you get frontier-level intelligence at a fraction of the compute cost. The model runs fast and cheap because most of its brain stays dormant for any given query.

Coding performance that matches the best. M2.5 scored 80.2% on SWE-Bench Verified, putting it neck-and-neck with Claude Opus 4.6 (80.8%) and GPT-5.2 (80.0%). It handles multi-file, repo-level code editing and full-stack development across Python, TypeScript, Rust, Go, and more.

Best-in-class function calling. It scored 76.8 on BFCL (Berkeley Function Calling Leaderboard), outperforming every Claude and GPT model tested. If you're building AI agents that need to use tools autonomously, this model was built for that.

Two variants available:

- M2.5 Standard — 50 tokens/sec, optimized for cost
- M2.5 Lightning — 100 tokens/sec, optimized for speed

| Benchmark | MiniMax-M2.5 | Claude Opus 4.6 | GPT-5.2 |
| --- | --- | --- | --- |
| SWE-Bench Verified | 80.2% | 80.8% | 80.0% |
| BFCL (function calling) | 76.8% | Lower | Lower |
| AIME25 (math) | 86.3 | 95.6 | 98.0 |
| BrowseComp (web search) | 76.3 | 84.0 | 65.8 |

The math scores trail the top proprietary models. But for coding and agentic tasks — the stuff most developers actually care about — M2.5 is right there.
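The mixture-of-experts arithmetic above is worth making concrete. A minimal sketch, using the approximate active-parameter figure quoted earlier:

```python
# MoE arithmetic: only a fraction of the 229B total parameters are
# active for any given token, so per-token compute is roughly that of
# a 10B dense model while total capacity stays at 229B.
TOTAL_PARAMS = 229e9
ACTIVE_PARAMS = 10e9  # "about 10 billion activate per token"

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"~{active_fraction:.1%} of weights active per token")  # ~4.4%
```

That ~4% activation ratio is why a model this large can still hit 50-100 tokens/sec.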

---

How to Run It for Free on a Mac

You have several options depending on your hardware. Here's the simplest path.

Option 1: Ollama (Easiest)

If you have Ollama installed, it's one command:

```bash
ollama run minimax-m2.5
```

Ollama handles downloading the quantized model, setting up the inference engine, and giving you a chat interface. Done.

Requirements: Mac with Apple Silicon (M1/M2/M3/M4). 64GB unified memory recommended for the full model. 32GB works with smaller quantizations but will be slower.

Option 2: llama.cpp with GGUF (Best for Consumer Hardware)

For more control, use llama.cpp with quantized GGUF files. Community-made quantizations are available on Hugging Face:

Step 1: Install llama.cpp

```bash
brew install llama.cpp
```

Step 2: Download a quantized model

Pick your quantization based on your RAM:

| Quantization | Size | RAM Needed | Speed |
| --- | --- | --- | --- |
| Q3_K (3-bit) | ~101 GB | 128GB | ~20-25 tok/s |
| Q6_K (6-bit) | ~150 GB | 192GB | ~15 tok/s |
| Q8_0 (8-bit) | ~243 GB | 256GB | ~10 tok/s |

Download from the Unsloth GGUF repo or community repos.
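A quick sanity check on those file sizes: divide the bytes by the parameter count to get the effective bits per weight. (K-quants store block scales alongside the weights, so the effective figure runs a little above the nominal bit-width.)

```python
# Effective bits per weight implied by a GGUF file size, for a 229B model.
PARAMS = 229e9

def effective_bits(size_gb):
    """Bits per weight implied by a file of size_gb gigabytes."""
    return size_gb * 1e9 * 8 / PARAMS

# Sizes from the quantization table above.
for name, size_gb in [("Q3_K", 101), ("Q6_K", 150), ("Q8_0", 243)]:
    print(f"{name}: ~{effective_bits(size_gb):.1f} bits/weight")
```

This also tells you roughly how much RAM a hypothetical in-between quantization would need: multiply 229B by the bits per weight and divide by 8.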

Step 3: Run it

```bash
llama-cli --model path/to/minimax-m2.5.gguf \
  --jinja --temp 1.0 --top-p 0.95 --top-k 40 \
  --ctx-size 16384
```

That's it. You now have a frontier-class AI running locally with zero ongoing costs.

Option 3: vLLM (Best for Production/Serving)

If you're setting up an API server for a team or product:

```bash
pip install vllm

vllm serve MiniMaxAI/MiniMax-M2.5 \
  --tensor-parallel-size 4 \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --enable-auto-tool-choice \
  --trust-remote-code
```

This gives you an OpenAI-compatible API endpoint running locally. Point any application that uses the OpenAI SDK at `localhost:8000` and swap in M2.5 without changing a line of code.
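Once the server is up, any HTTP client works. Here's a minimal stdlib sketch (no openai package required) that assumes vLLM's default port 8000 and the Hugging Face model id shown above; it obviously needs the server running to return anything:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # vLLM's OpenAI-compatible endpoint

def chat(prompt, model="MiniMaxAI/MiniMax-M2.5", temperature=1.0):
    """Send one chat-completion request to the local vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": 0.95,
    }
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Any OpenAI-SDK-based tool works the same way: set its base URL to `http://localhost:8000/v1` and pass any string as the API key.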

Option 4: Use the API (Cheapest Cloud Option)

If you don't want to run it locally, MiniMax's official API is absurdly cheap:

- Standard: $0.15 input / $1.20 output per million tokens
- Lightning: $0.30 input / $2.40 output per million tokens

For reference, Claude Opus 4.6 costs $15/$75 per million tokens. That makes M2.5 roughly 98% cheaper for comparable coding performance.
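You can verify that claim with the price sheet. A quick sketch, using a hypothetical agent-style token mix (3M input, 1M output); the exact saving depends on your input/output ratio but lands around 98-99% either way:

```python
# Prices per million tokens, from the text above.
M25_STANDARD = {"input": 0.15, "output": 1.20}
OPUS_46 = {"input": 15.00, "output": 75.00}

def job_cost(prices, m_in, m_out):
    """Dollar cost for m_in million input and m_out million output tokens."""
    return prices["input"] * m_in + prices["output"] * m_out

# Hypothetical workload: 3M input tokens, 1M output tokens.
savings = 1 - job_cost(M25_STANDARD, 3, 1) / job_cost(OPUS_46, 3, 1)
print(f"~{savings:.0%} cheaper")
```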

Sign up at platform.minimax.io or use it through OpenRouter.

---

The Mac Mini Sweet Spot

Here's why this matters for Mac users specifically. Apple Silicon's unified memory architecture means the CPU and GPU share the same RAM pool, so a Mac with enough unified memory for your chosen quantization can load the model entirely into memory and run inference on the GPU, no discrete graphics card needed. Per the quantization table above, the Q3_K build wants around 128GB; with less memory you'll need a more aggressive quantization.

A Mac Mini M4 Pro with 64GB RAM costs about $2,000. Paired with a quantization that fits in that memory, it's your entire AI infrastructure. No cloud bills. No API rate limits. No data leaving your machine. Run it 24/7 as a local AI server for your whole house or small team.

For developers who want a dedicated AI coding assistant: set up vLLM on the Mac Mini, point your IDE's AI plugin at `localhost:8000`, and you have a private Claude-tier coding assistant running in your closet.

---

What to Watch Out For

The license has a catch. MiniMax-M2.5 uses a Modified MIT License. It's permissive — you can fine-tune, self-host, and deploy commercially. But if you use it in a commercial product, you must prominently display "MiniMax M2.5" on the user interface. That's more restrictive than Apache 2.0 but more permissive than Meta's Llama license.

Benchmark skepticism is fair. MiniMax's earlier M2 and M2.1 releases were flagged by the community for potential benchmark optimization. Real-world testing should validate the numbers. That said, OpenHands (an independent open-source coding agent project) confirmed M2.5 matches Claude Sonnet-tier performance on their own evaluations.

Math and reasoning lag behind. M2.5 scored 86.3 on AIME25 versus 95.6 for Claude Opus. If you need heavy mathematical reasoning, the proprietary models still win. M2.5's strengths are concentrated in coding, tool use, and agentic workflows.

---

The Bottom Line

MiniMax-M2.5 is the first open-weights model that genuinely competes with frontier proprietary models on the tasks developers use most. You can run it on hardware you might already own, for zero ongoing cost, with complete data privacy.

The setup takes about 10 minutes. The model weights are free. The only cost is the hardware — and if you already have a Mac with 64GB+ RAM, even that's covered.

- Download: Hugging Face
- Code: GitHub

---

Related Reading

- Mistral Releases a Free, Open-Source GPT-4 Competitor. It's Actually Good.
- I Replaced My Digital Life with an Open-Source AI: A Hard Fork Experiment
- 25 Real OpenClaw Automations That Are Actually Working: From Inbox Zero to AI Chief of Staff