Compress Model
Select model, set quality target, and let Atlas handle everything.
Compress any LLM to fit consumer hardware with near-zero quality loss. No manual tuning. No sub-threshold models. Just works.
Four-stage pipeline. Fully automatic. Quality guaranteed.
Auto-detect hardware. Analyze each layer's sensitivity via gradient + activation magnitude.
Allocate 2-6 bits per layer based on sensitivity. Fit within your hardware budget.
LoRA distillation from FP16 teacher re-injects lost knowledge into weak layers.
Benchmark against FP16 baseline. If the result falls below threshold, loop back. Never ships a sub-threshold model.
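As a rough illustration of the allocation stage, here is a minimal Python sketch of sensitivity-driven bit allocation under a size budget. It is not Atlas's actual algorithm: the `Layer` type, the `allocate_bits` function, the greedy strategy, and the sample sensitivity scores are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    n_params: int       # number of weights in the layer
    sensitivity: float  # hypothetical score, e.g. derived from gradient and activation magnitude


def allocate_bits(layers, budget_bits, min_bits=2, max_bits=6):
    """Greedy per-layer bit allocation under a total size budget (illustrative only)."""
    # Start every layer at the minimum width and check that the model fits at all.
    bits = {layer.name: min_bits for layer in layers}
    used = sum(layer.n_params * min_bits for layer in layers)
    if used > budget_bits:
        raise ValueError("model does not fit the budget even at min_bits")

    # Visit the most sensitive layers first; raise each one bit at a time
    # until it reaches max_bits or the budget is exhausted.
    for layer in sorted(layers, key=lambda l: l.sensitivity, reverse=True):
        while bits[layer.name] < max_bits and used + layer.n_params <= budget_bits:
            bits[layer.name] += 1
            used += layer.n_params
    return bits


# Made-up layers and scores, purely for demonstration.
layers = [
    Layer("attn.0", n_params=1_000, sensitivity=0.9),
    Layer("mlp.0", n_params=4_000, sensitivity=0.2),
    Layer("attn.1", n_params=1_000, sensitivity=0.6),
]
plan = allocate_bits(layers, budget_bits=18_000)
print(plan)  # {'attn.0': 6, 'mlp.0': 2, 'attn.1': 4}
```

In this toy run the most sensitive layer reaches 6 bits, the next gets whatever budget remains, and the large, robust layer stays at 2 bits.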
Every other tool makes you choose. Atlas decides for you.
Auto-detects RAM, GPU, compute. Plans compression to fit exactly, with room for KV cache and activations.
Sensitive layers get 5-6 bits. Robust layers get 2-3. Three quantization methods working together per model.
LoRA distillation from FP16 teacher recovers degraded layers. Iterates until quality threshold met.
Benchmark suite runs automatically. PPL, MMLU, ARC, HellaSwag. Never delivers sub-threshold models.
No method selection. No bit-width guessing. No manual benchmarking. Point at model, set quality target, go.
Output in MLX for Apple Silicon or GGUF for cross-platform. Works with any inference runtime that loads these formats.
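The recover-and-verify behavior described above can be pictured as a simple loop. The Python sketch below is a hypothetical illustration, not Atlas's implementation: `compress_until_ok`, its parameters, the 1% default threshold, and the perplexity values are assumptions, and it checks perplexity only rather than the full benchmark suite.

```python
def relative_ppl_increase(ppl_fp16, ppl_quant):
    """Relative perplexity increase of the compressed model over the FP16 baseline."""
    return (ppl_quant - ppl_fp16) / ppl_fp16


def compress_until_ok(evaluate, recover, ppl_fp16, max_increase=0.01, max_rounds=5):
    """Run recovery rounds until the quality threshold is met, or refuse to ship.

    evaluate: callable returning the current compressed-model perplexity (hypothetical hook)
    recover:  callable performing one recovery round, e.g. distillation (hypothetical hook)
    """
    for round_idx in range(max_rounds + 1):
        ppl = evaluate()
        if relative_ppl_increase(ppl_fp16, ppl) <= max_increase:
            return round_idx, ppl
        if round_idx < max_rounds:
            recover()
    raise RuntimeError("quality threshold not met; refusing to ship the model")


# Stubbed demo with invented perplexities: each recovery round improves the model.
ppls = iter([5.30, 5.12, 5.04])
state = {"ppl": next(ppls)}


def evaluate():
    return state["ppl"]


def recover():
    state["ppl"] = next(ppls)


rounds, final_ppl = compress_until_ok(evaluate, recover, ppl_fp16=5.00)
print(rounds, final_ppl)  # 2 5.04
```

The key design point is the final `raise`: when the threshold cannot be reached within the round limit, the loop fails loudly instead of returning a degraded model.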
Side by side with existing quantization tools.
| Feature | Atlas | llama.cpp | AutoAWQ | AutoGPTQ |
|---|---|---|---|---|
| Auto bit allocation | Per-layer adaptive | Manual | Uniform | Uniform |
| Quality recovery | LoRA distillation | None | None | None |
| Quality guarantee | <1% loss vs FP16, verified | No | No | No |
| Hardware-aware | Auto-detect + fit | No | No | No |
| PPL increase (70B, 3.5-bit) | ~0.7% | ~2-3% | ~1-2% | ~1-2.5% |
| End-to-end | One command | Multi-step | Script | Script |
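
To put the 70B, 3.5-bit row in hardware terms, the short Python sketch below estimates weight storage alone. The helper name `weight_size_gb` is an assumption for illustration, and the estimate ignores KV cache, activations, and quantization metadata such as scales.

```python
def weight_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in gigabytes (10^9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9


fp16_gb = weight_size_gb(70e9, 16)    # FP16 baseline
quant_gb = weight_size_gb(70e9, 3.5)  # 3.5 bits per weight on average

print(fp16_gb)   # 140.0
print(quant_gb)  # 30.625
```

At an average of 3.5 bits per weight, a 70B model's weights need roughly 31 GB instead of 140 GB, which is the gap that decides whether the model fits on a single consumer machine.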
Let Atlas handle the compression. You handle the prompts.