Adaptive Recovery Quantization

Run 70B models
on your MacBook.

Compress any LLM to fit consumer hardware with near-zero quality loss. No manual tuning. No degradation. Just works.

Terminal
$ atlas compress meta-llama/Llama-3.3-70B --target macbook-air-16gb --quality 99
[profile] Detected: Apple M1, 16GB RAM, 8-core GPU
[plan] Mixed-bit allocation: 2-5 bit across 80 layers
[recover] LoRA recovery on 12 weak layers... done
[verify] PPL drop: 0.7% | MMLU drop: 0.3% | Size: 28.5GB → 9.2GB
3.4
Avg bits/weight
<1%
Quality loss
68%
Size reduction
8.4
tok/s on Air M1

How Atlas Works

Four-stage pipeline. Fully automatic. Quality guaranteed.

01

Profile

Auto-detect hardware. Analyze each layer's sensitivity via gradient + activation magnitude.

02

Plan

Allocate 2-6 bits per layer based on sensitivity. Fit within your hardware budget.

03

Recover

LoRA distillation from FP16 teacher re-injects lost knowledge into weak layers.

04

Verify

Benchmark against FP16 baseline. If below threshold, loop back. Never ships degraded.

Why Atlas

Every other tool makes you choose. Atlas decides for you.

Hardware-Aware

Fits Your Machine

Auto-detects RAM, GPU, compute. Plans compression to fit exactly, with room for KV cache and activations.

Mixed Precision

Smart Bit Allocation

Sensitive layers get 5-6 bits. Robust layers get 2-3. Three quantization methods working together per model.

Recovery Loop

Zero Quality Loss

LoRA distillation from FP16 teacher recovers degraded layers. Iterates until quality threshold met.

Verified Output

Guaranteed Quality

Benchmark suite runs automatically. PPL, MMLU, ARC, HellaSwag. Never delivers sub-threshold models.

One Command

End-to-End

No method selection. No bit-width guessing. No manual benchmarking. Point at model, set quality target, go.

Open Format

MLX + GGUF

Output in MLX for Apple Silicon or GGUF for cross-platform. Works with any inference runtime.

Atlas vs Everything Else

Side by side with existing quantization tools.

Feature Atlas llama.cpp AutoAWQ AutoGPTQ
Auto bit allocation Per-layer adaptive Manual Uniform Uniform
Quality recovery LoRA distillation None None None
Quality guarantee <1% verified No No No
Hardware-aware Auto-detect + fit No No No
PPL drop (70B, 3.5bit) ~0.7% ~2-3% ~1-2% ~1-2.5%
End-to-end One command Multi-step Script Script

Stop guessing bit widths.

Let Atlas handle the compression. You handle the prompts.

pip install atlas-compress

Compress Model

Select model, set quality target, and let Atlas handle everything.

Size: 140 GB (FP16) Layers: 80 Params: 70.6B
99%
Aggressive (smaller) Lossless (bigger)

Compression Progress

Idle
Overall 0%
1
Profile
2
Plan
3
Recover
4
Verify
Layer quantization plan
L0-7
5-bit
L8-31
4-bit
L32-59
3-bit
L60-79
2-bit
Estimated Size
9.2 GB
-67.7% from FP16
Avg Bitrate
3.4 bit
Mixed AWQ+HQQ+AQLM
Est. Throughput
8.4 tok/s
on MacBook Air M1
ETA
--:--
Not started