LLM Infrastructure · Open Source · ✦ Free Tier

llama.cpp

C++ LLM inference for local and edge deployment

68,000 stars · Health 80 · Active · Dev Productivity & App Infrastructure

About

Highly optimized C++ inference engine for running quantized LLMs on CPU and GPU. The foundation for Ollama and many local AI tools.
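
For a concrete sense of what that looks like in practice, here is a minimal sketch of local inference through the llama-cpp-python bindings (the same pip package the Genome scanner looks for below). The model path, context size, and thread count are placeholders, not recommendations.

```python
# Minimal sketch, assuming llama-cpp-python is installed
# (pip install llama-cpp-python) and a quantized GGUF model has
# already been downloaded; the model path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder
    n_ctx=4096,    # context window
    n_threads=8,   # CPU threads; tune for your hardware
)

out = llm(
    "Q: What is llama.cpp used for? A:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```

Because the model file is quantized GGUF, everything runs in a single process on CPU; passing `n_gpu_layers` offloads layers to a GPU when one is available.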

Choose llama.cpp when…

  • You want maximum efficiency for local LLM inference
  • You're running models on CPU or edge hardware
  • Quantized model performance is your optimization target

Builder Slot

Where do your models actually run? (Required for most stacks)

LLM providers and inference servers — where the actual model computation happens

  • Dev Tools: Not applicable
  • App Infra: Required
  • Hybrid: Required


Stack Genome Detection

AIchitect's Genome scanner detects llama.cpp in your project via these signals:

  • pip packages: llama-cpp-python
  • config files: Modelfile
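
The scanner's internals aren't documented here, but conceptually the check is simple. The sketch below is a hypothetical illustration that greps dependency and config files for the two signals above; the file names, matching rules, and function name are all assumptions for illustration.

```python
# Hypothetical detection sketch; AIchitect's real Genome scanner is not
# shown here, so everything below is an assumption for illustration.
from pathlib import Path

def detect_llama_cpp(project_root: str) -> bool:
    root = Path(project_root)

    # Signal 1: llama-cpp-python declared as a pip dependency.
    for dep_file in ("requirements.txt", "pyproject.toml"):
        path = root / dep_file
        if path.is_file() and "llama-cpp-python" in path.read_text(errors="ignore"):
            return True

    # Signal 2: a Modelfile in the project (also used by Ollama, which
    # builds on llama.cpp).
    return (root / "Modelfile").is_file()

if __name__ == "__main__":
    print(detect_llama_cpp("."))
```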

Integrates with (1)

RunPod (LLM Infrastructure)


Pricing

✦ Free tier available

Ruled out by 1 stack

Edge / On-Device AI Stack
Ollama already uses llama.cpp under the hood — listing both creates redundancy without adding value.

Badge

Add to your GitHub README

llama.cpp on AIchitect:
[![llama.cpp](https://aichitect.dev/badge/tool/llama-cpp)](https://aichitect.dev/tool/llama-cpp)

Explore the full AI landscape

See how llama.cpp fits into the bigger picture — browse all 207 tools and their relationships.
