LLM Infrastructure · Open Source · ✦ Free Tier

Langfuse

OSS LLM engineering platform

7,000 stars · Health 80 · Active · App Infrastructure

About

Open-source platform for tracing, evaluations, and prompt management. Self-hostable alternative to LangSmith with clean UX.

Choose Langfuse when…

  • You want open-source LLM observability
  • Self-hosting your tracing stack is important
  • You need cost tracking across models and users

Builder Slot

How do you see what's happening?
Recommended for most stacks

Traces every LLM call, eval, and cost so you know exactly what your stack is doing

  • Dev Tools: Not applicable
  • App Infra: Recommended
  • Hybrid: Recommended


Stack Genome Detection

AIchitect's Genome scanner detects Langfuse in your project via these signals:

  • npm packages: langfuse
  • pip packages: langfuse
  • env vars: LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, LANGFUSE_HOST
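
Those same variables are all the SDK needs at runtime. A minimal sketch of initializing the client from them, assuming the v2 Python SDK (the constructor reads the LANGFUSE_* variables when called with no arguments):

```python
import os

from langfuse import Langfuse

# Default to Langfuse Cloud if no host is configured.
os.environ.setdefault("LANGFUSE_HOST", "https://cloud.langfuse.com")

# Reads LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_HOST
# from the environment when no arguments are passed.
langfuse = Langfuse()

# Optional sanity check that the keys are valid before tracing anything.
assert langfuse.auth_check()
```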

Integrates with (16)

CrewAI (Agent Frameworks)

CrewAI exports OpenTelemetry traces that Langfuse ingests, capturing every agent step, tool call, and LLM invocation across the crew.

Full multi-agent observability — which agent did what, at what cost, in what order, with what output.
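
Since the traces arrive over OpenTelemetry, pointing the exporter at Langfuse is mostly environment config. A minimal sketch, assuming Langfuse Cloud and an OTel-instrumented crew; the OTLP path and Basic-auth scheme are Langfuse's documented OTel endpoint, and the same setup serves the AutoGen, Mastra, and Agno integrations below:

```python
import base64
import os

# Langfuse's OTLP endpoint authenticates with Basic auth:
# base64("<public key>:<secret key>").
auth = base64.b64encode(
    f"{os.environ['LANGFUSE_PUBLIC_KEY']}:{os.environ['LANGFUSE_SECRET_KEY']}".encode()
).decode()

# Any OpenTelemetry-instrumented framework picks these up automatically.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"
```

Set these before the framework initializes its exporter; self-hosted deployments swap in their own host.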

AutoGen (Agent Frameworks)

AutoGen emits OpenTelemetry traces that Langfuse captures, recording every agent message, tool call, and LLM interaction.

Conversation-level observability across multi-agent AutoGen runs — trace which agent said what and at what token cost.

LangGraph (Agent Frameworks)

LangGraph integrates with Langfuse via its callback system or OpenTelemetry, capturing every node execution as a nested trace span.

Full execution traces of complex agent graphs — cost per node, latency per step, and LLM call details in one view.

LangChain (Pipelines & RAG)

Langfuse provides a LangChain callback handler that captures every chain, LLM call, and tool invocation as a nested trace.

Full execution traces for any LangChain application — cost, latency, and prompt quality in one view.
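
A minimal sketch of wiring the handler into a chain, assuming the v2 Python SDK (newer releases move the import to `langfuse.langchain`):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langfuse.callback import CallbackHandler

handler = CallbackHandler()  # reads LANGFUSE_* env vars

prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

# Every chain step, LLM call, and tool invocation lands in one nested trace.
result = chain.invoke({"text": "..."}, config={"callbacks": [handler]})
```

Compiled LangGraph graphs accept the same `config={"callbacks": [handler]}`, which is the callback route the LangGraph entry above describes.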

LlamaIndex (Pipelines & RAG)

Langfuse provides a LlamaIndex callback handler that traces every query, retrieval call, and LLM generation within the pipeline.

Retrieval-level observability: see which chunks were fetched, at what similarity score, and what the LLM did with them.
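
A sketch of the global-handler wiring, assuming the v2 Python SDK (the integration entry points have moved between Langfuse releases, so check your version's docs); `documents` stands in for content loaded elsewhere:

```python
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager
from langfuse.llama_index import LlamaIndexCallbackHandler

# Register Langfuse globally so every index and query engine is traced.
Settings.callback_manager = CallbackManager([LlamaIndexCallbackHandler()])

index = VectorStoreIndex.from_documents(documents)  # hypothetical corpus
response = index.as_query_engine().query("What changed in Q3?")
```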

Dify (Pipelines & RAG)

Dify exports traces via its observability integration to Langfuse, capturing every LLM call and tool invocation in its workflows.

Observability on top of Dify's no-code AI apps — trace costs and latency even without writing pipeline code.

Mastra (Agent Frameworks)

Mastra integrates with Langfuse via OpenTelemetry, tracing every agent step and LLM call automatically.

Out-of-the-box observability for Mastra agents — cost, latency, and full trace quality without custom instrumentation.

Agno (Agent Frameworks)

Agno sends traces to Langfuse via its built-in OpenTelemetry integration.

Full observability on Agno agent runs — multi-step traces with per-step cost and latency breakdown.

LiteLLM (LLM Infrastructure)

LiteLLM sends callback events to Langfuse after every LLM call — one config line captures cost, model, tokens, and latency per request.

Per-request observability across every provider in your stack without changing application code.
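
The "one config line" is LiteLLM's success-callback list. A minimal sketch, assuming LANGFUSE_* env vars are set:

```python
import litellm

# The single line: LiteLLM fires a Langfuse event after every completed call.
litellm.success_callback = ["langfuse"]

# Works unchanged for any provider LiteLLM routes to.
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
```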

Portkey (LLM Infrastructure)

Portkey's gateway logs metadata to Langfuse via webhook integration, enriching Langfuse traces with gateway-level cost and caching data.

Combined gateway analytics and LLM trace quality in one view — Portkey's proxy layer meets Langfuse's evaluation depth.

Ragas (Prompt & Eval)

Ragas uploads evaluation scores directly to Langfuse as trace scores, linking eval results to the specific traces they evaluated.

Eval results pinned to the exact traces that generated them — jump from a poor metric score directly to the failing trace.
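
The attach step looks roughly like this; the trace id and metric value are hypothetical stand-ins for a real production trace and a Ragas evaluation result, and the `score` method is the v2 Python SDK's (newer SDKs rename it `create_score`):

```python
from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_* env vars

# Hypothetical: the production trace being evaluated and a faithfulness
# score computed by ragas.evaluate() on that trace's question/answer/contexts.
trace_id = "trace-abc123"
faithfulness_score = 0.87

# The score shows up on the trace itself in the Langfuse UI.
langfuse.score(trace_id=trace_id, name="faithfulness", value=faithfulness_score)
```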

DeepEval (Prompt & Eval)

DeepEval sends evaluation results to Langfuse as trace scores via its Langfuse integration.

Quality metrics — faithfulness, hallucination rate, G-Eval scores — visible alongside the raw traces that produced them.
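
A sketch of the pattern using DeepEval's metric API with the same v2-style score attach as above; the test-case fields are DeepEval's standard RAG inputs, and the trace id is hypothetical:

```python
from deepeval.metrics import FaithfulnessMetric
from deepeval.test_case import LLMTestCase
from langfuse import Langfuse

test_case = LLMTestCase(
    input="What changed in Q3?",
    actual_output="Revenue grew 12%.",
    retrieval_context=["Q3 revenue was up 12% year over year."],
)

metric = FaithfulnessMetric()
metric.measure(test_case)  # runs the LLM-as-judge evaluation

# Attach the result to the trace that produced the output.
Langfuse().score(trace_id="trace-abc123", name="faithfulness", value=metric.score)
```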

OpenAI API (LLM Infrastructure)

Langfuse's SDK wraps OpenAI's client, capturing every API call with token counts, cost, and latency automatically.

Per-call observability on OpenAI usage — see exactly which prompts are expensive, slow, or producing poor outputs.
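
The wrapper is a drop-in import swap. A minimal sketch:

```python
# Swap `import openai` for the Langfuse wrapper and every call is traced
# with token counts, cost, and latency; the OpenAI API surface is unchanged.
from langfuse.openai import openai

completion = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain tracing in one sentence."}],
)
print(completion.choices[0].message.content)
```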

Promptfoo (Prompt & Eval)

Langfuse production traces can be exported as eval datasets that Promptfoo uses for regression testing in CI.

Close the eval loop: real failures captured in Langfuse become the regression test cases Promptfoo runs on every deploy.

Braintrust (LLM Infrastructure)

Langfuse traces are exported as datasets to Braintrust, where they become versioned experiment inputs for systematic eval tracking.

Production traces feed directly into structured experiments — Langfuse captures what happened, Braintrust measures whether it was good.

Vercel AI SDK (LLM Infrastructure)

Langfuse's SDK wraps the Vercel AI SDK's model calls, capturing every streaming generation with token counts and latency.

Per-request observability on all AI calls made through the Vercel AI SDK — cost and quality metrics without changing streaming code.



Pricing

✦ Free tier available
Cloud: $59/mo

In 18 stacks

Ruled out by 9 stacks

  • AI Design-to-Code Pipeline: No LLM calls to observe; the AI generation happens inside the design tools
  • MCP Power User Stack: Personal-use workflows don't yet need production observability overhead
  • Zero-Budget OSS Stack: The cloud-hosted tier defeats the zero-egress goal; self-host it if you need tracing
  • Agentic Coding Stack: Replaced by LangSmith here — tracing needs tighter integration with the LangGraph-based agent loop
  • Spec-Driven AI Development: Observability layer — this stack focuses on the development and generation phase, not runtime
  • OSS Self-Hosted AI Stack: Cloud-hosted Langfuse sends trace data externally — self-host it or replace with Logfire
  • Edge / On-Device AI Stack: Cloud observability — edge deployments trace locally or not at all
  • AI Red-Team / Security Stack: Production observability tool — the adversarial test harness handles its own trace capture
  • Fine-Tuning Pipeline: Replaced by Weights & Biases here — W&B is the standard for training experiment tracking

Badge

Add to your GitHub README

Langfuse on AIchitect:

[![Langfuse](https://aichitect.dev/badge/tool/langfuse)](https://aichitect.dev/tool/langfuse)

Explore the full AI landscape

See how Langfuse fits into the bigger picture — browse all 207 tools and their relationships.
