LLM Infrastructure · Open Source · ✦ Free Tier

Langfuse

OSS LLM engineering platform

7,000 stars · Health 80 · Active · App Infrastructure

About

Open-source platform for tracing, evaluations, and prompt management. Self-hostable alternative to LangSmith with clean UX.

Choose Langfuse when…

  • You want open-source LLM observability
  • Self-hosting your tracing stack is important
  • You need cost tracking across models and users

Builder Slot

How do you see what's happening?
Recommended for most stacks

Traces every LLM call, eval, and cost so you know exactly what your stack is doing

  • Dev Tools: Not applicable
  • App Infra: Recommended
  • Hybrid: Recommended


Stack Genome Detection

AIchitect's Genome scanner detects Langfuse in your project via these signals:

  • npm packages: langfuse
  • pip packages: langfuse
  • env vars: LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_HOST
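
Those same variables are all the SDK needs at runtime. A minimal sketch of initializing the client from them, assuming the v2 Python SDK (the constructor reads the LANGFUSE_* variables when called with no arguments):

```python
import os

from langfuse import Langfuse

# Default to Langfuse Cloud if no host is configured.
os.environ.setdefault("LANGFUSE_HOST", "https://cloud.langfuse.com")

# Reads LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_HOST
# from the environment when no arguments are passed.
langfuse = Langfuse()

# Optional sanity check that the keys are valid before tracing anything.
assert langfuse.auth_check()
```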

Integrates with (16)

CrewAI (Agent Frameworks)

CrewAI exports OpenTelemetry traces that Langfuse ingests, capturing every agent step, tool call, and LLM invocation across the crew.

Full multi-agent observability — which agent did what, at what cost, in what order, with what output.
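
Since the traces arrive over OpenTelemetry, pointing the exporter at Langfuse is mostly environment config. A minimal sketch, assuming Langfuse Cloud and an OTel-instrumented crew; the OTLP path and Basic-auth scheme are Langfuse's documented OTel endpoint, and the same setup serves the AutoGen, Mastra, and Agno integrations below:

```python
import base64
import os

# Langfuse's OTLP endpoint authenticates with Basic auth:
# base64("<public key>:<secret key>").
auth = base64.b64encode(
    f"{os.environ['LANGFUSE_PUBLIC_KEY']}:{os.environ['LANGFUSE_SECRET_KEY']}".encode()
).decode()

# Any OpenTelemetry-instrumented framework picks these up automatically.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"
```

Set these before the framework initializes its exporter; self-hosted deployments swap in their own host.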

AutoGen (Agent Frameworks)

AutoGen emits OpenTelemetry traces that Langfuse captures, recording every agent message, tool call, and LLM interaction.

Conversation-level observability across multi-agent AutoGen runs — trace which agent said what and at what token cost.

LangGraph (Agent Frameworks)

LangGraph integrates with Langfuse via its callback system or OpenTelemetry, capturing every node execution as a nested trace span.

Full execution traces of complex agent graphs — cost per node, latency per step, and LLM call details in one view.

LangChain (Pipelines & RAG)

Langfuse provides a LangChain callback handler that captures every chain, LLM call, and tool invocation as a nested trace.

Full execution traces for any LangChain application — cost, latency, and prompt quality in one view.
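
A minimal sketch of wiring the handler into a chain, assuming the v2 Python SDK (newer releases move the import to `langfuse.langchain`):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langfuse.callback import CallbackHandler

handler = CallbackHandler()  # reads LANGFUSE_* env vars

prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

# Every chain step, LLM call, and tool invocation lands in one nested trace.
result = chain.invoke({"text": "..."}, config={"callbacks": [handler]})
```

Compiled LangGraph graphs accept the same `config={"callbacks": [handler]}`, which is the callback route the LangGraph entry above describes.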

LlamaIndex (Pipelines & RAG)

Langfuse provides a LlamaIndex callback handler that traces every query, retrieval call, and LLM generation within the pipeline.

Retrieval-level observability: see which chunks were fetched, at what similarity score, and what the LLM did with them.
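
A sketch of the global-handler wiring, assuming the v2 Python SDK (the integration entry points have moved between Langfuse releases, so check your version's docs); `documents` stands in for content loaded elsewhere:

```python
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager
from langfuse.llama_index import LlamaIndexCallbackHandler

# Register Langfuse globally so every index and query engine is traced.
Settings.callback_manager = CallbackManager([LlamaIndexCallbackHandler()])

index = VectorStoreIndex.from_documents(documents)  # hypothetical corpus
response = index.as_query_engine().query("What changed in Q3?")
```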

Dify (Pipelines & RAG)

Dify exports traces via its observability integration to Langfuse, capturing every LLM call and tool invocation in its workflows.

Observability on top of Dify's no-code AI apps — trace costs and latency even without writing pipeline code.

Mastra (Agent Frameworks)

Mastra integrates with Langfuse via OpenTelemetry, tracing every agent step and LLM call automatically.

Out-of-the-box observability for Mastra agents — cost, latency, and full trace quality without custom instrumentation.

Agno (Agent Frameworks)

Agno sends traces to Langfuse via its built-in OpenTelemetry integration.

Full observability on Agno agent runs — multi-step traces with per-step cost and latency breakdown.

LiteLLM (LLM Infrastructure)

LiteLLM sends callback events to Langfuse after every LLM call — one config line captures cost, model, tokens, and latency per request.

Per-request observability across every provider in your stack without changing application code.
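
The "one config line" is LiteLLM's success-callback list. A minimal sketch, assuming LANGFUSE_* env vars are set:

```python
import litellm

# The single line: LiteLLM fires a Langfuse event after every completed call.
litellm.success_callback = ["langfuse"]

# Works unchanged for any provider LiteLLM routes to.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
```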

Portkey (LLM Infrastructure)

Portkey's gateway logs metadata to Langfuse via webhook integration, enriching Langfuse traces with gateway-level cost and caching data.

Combined gateway analytics and LLM trace quality in one view — Portkey's proxy layer meets Langfuse's evaluation depth.

Ragas (Prompt & Eval)

Ragas uploads evaluation scores directly to Langfuse as trace scores, linking eval results to the specific traces they evaluated.

Eval results pinned to the exact traces that generated them — jump from a poor metric score directly to the failing trace.
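
The attach step looks roughly like this; the trace id and metric value are hypothetical stand-ins for a real production trace and a Ragas evaluation result, and the `score` method is the v2 Python SDK's (newer SDKs rename it `create_score`):

```python
from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_* env vars

# Hypothetical: the production trace being evaluated and a faithfulness
# score computed by ragas.evaluate() on that trace's question/answer/contexts.
trace_id = "trace-abc123"
faithfulness_score = 0.87

# The score shows up on the trace itself in the Langfuse UI.
langfuse.score(trace_id=trace_id, name="faithfulness", value=faithfulness_score)
```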

DeepEval (Prompt & Eval)

DeepEval sends evaluation results to Langfuse as trace scores via its Langfuse integration.

Quality metrics — faithfulness, hallucination rate, G-Eval scores — visible alongside the raw traces that produced them.
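
A sketch of the pattern using DeepEval's metric API with the same v2-style score attach as above; the test-case fields are DeepEval's standard RAG inputs, and the trace id is hypothetical:

```python
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase
from langfuse import Langfuse

test_case = LLMTestCase(
    input="What changed in Q3?",
    actual_output="Revenue grew 12%.",
    retrieval_context=["Q3 revenue was up 12% year over year."],
)

metric = FaithfulnessMetric()
metric.measure(test_case)  # runs the LLM-as-judge evaluation

# Attach the result to the trace that produced the output.
Langfuse().score(trace_id="trace-abc123", name="faithfulness", value=metric.score)
```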

OpenAI API (LLM Infrastructure)

Langfuse's SDK wraps OpenAI's client, capturing every API call with token counts, cost, and latency automatically.

Per-call observability on OpenAI usage — see exactly which prompts are expensive, slow, or producing poor outputs.
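
The wrapper is a drop-in import swap. A minimal sketch:

```python
# Swap `import openai` for the Langfuse wrapper and every call is traced
# with token counts, cost, and latency; the OpenAI API surface is unchanged.
from langfuse.openai import openai

completion = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain tracing in one sentence."}],
)
print(completion.choices[0].message.content)
```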

Promptfoo (Prompt & Eval)

Langfuse production traces can be exported as eval datasets that Promptfoo uses for regression testing in CI.

Close the eval loop: real failures captured in Langfuse become the regression test cases Promptfoo runs on every deploy.

Braintrust (LLM Infrastructure)

Langfuse traces are exported as datasets to Braintrust, where they become versioned experiment inputs for systematic eval tracking.

Production traces feed directly into structured experiments — Langfuse captures what happened, Braintrust measures whether it was good.

Vercel AI SDK (LLM Infrastructure)

Langfuse's SDK wraps the Vercel AI SDK's model calls, capturing every streaming generation with token counts and latency.

Per-request observability on all AI calls made through the Vercel AI SDK — cost and quality metrics without changing streaming code.



Pricing

✦ Free tier available
Cloud: $59/mo

In 18 stacks

Ruled out by 9 stacks

  • AI Design-to-Code Pipeline: No LLM calls to observe; the AI generation happens inside the design tools
  • MCP Power User Stack: Personal-use workflows don't yet need production observability overhead
  • Zero-Budget OSS Stack: The cloud-hosted tier defeats the zero-egress goal; self-host it if you need tracing
  • Agentic Coding Stack: Replaced by LangSmith here — tracing needs tighter integration with the LangGraph-based agent loop
  • Spec-Driven AI Development: Observability layer — this stack focuses on the development and generation phase, not runtime
  • OSS Self-Hosted AI Stack: Cloud-hosted Langfuse sends trace data externally — self-host it or replace with Logfire
  • Edge / On-Device AI Stack: Cloud observability — edge deployments trace locally or not at all
  • AI Red-Team / Security Stack: Production observability tool — the adversarial test harness handles its own trace capture
  • Fine-Tuning Pipeline: Replaced by Weights & Biases here — W&B is the standard for training experiment tracking

Badge

Add to your GitHub README

Langfuse on AIchitect:

[![Langfuse](https://aichitect.dev/badge/tool/langfuse)](https://aichitect.dev/tool/langfuse)

Explore the full AI landscape

See how Langfuse fits into the bigger picture — browse all 207 tools and their relationships.
