Promptfoo
Test and compare prompts across models. Includes built-in red-teaming, regression testing, and side-by-side comparison of model outputs.
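To make "regression testing" and "side-by-side comparison" concrete, here is a minimal sketch of the underlying idea in Python. It does not use this tool's actual API or configuration format: `model_a`, `model_b`, `run_grid`, `regressions`, and `contains_prompt` are hypothetical names, and the two "models" are toy functions standing in for real model clients.

```python
from typing import Callable, Dict, List, Tuple

Grid = Dict[str, Dict[str, bool]]

# Hypothetical stand-ins for real model clients (assumption, not a real API).
def model_a(prompt: str) -> str:
    return prompt.upper()

def model_b(prompt: str) -> str:
    return prompt[::-1]

def contains_prompt(prompt: str, output: str) -> bool:
    """Toy assertion: the output must contain the prompt text, ignoring case."""
    return prompt.lower() in output.lower()

def run_grid(
    prompts: List[str],
    models: Dict[str, Callable[[str], str]],
    check: Callable[[str, str], bool],
) -> Grid:
    """Run every prompt against every model and record pass/fail per cell."""
    return {
        p: {name: check(p, fn(p)) for name, fn in models.items()}
        for p in prompts
    }

def regressions(baseline: Grid, current: Grid) -> List[Tuple[str, str]]:
    """Return (prompt, model) cells that passed in the baseline but fail now."""
    return [
        (p, m)
        for p, row in baseline.items()
        for m, passed in row.items()
        if passed and not current.get(p, {}).get(m, False)
    ]

if __name__ == "__main__":
    grid = run_grid(
        ["hello", "level"],
        {"model_a": model_a, "model_b": model_b},
        contains_prompt,
    )
    # Side-by-side view: one row per prompt, one column per model.
    for prompt, row in grid.items():
        print(prompt, row)
```

Storing the grid from a known-good run and passing it as `baseline` on the next run is what turns a one-off comparison into a regression test.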
Galileo
LLM evaluation platform whose evaluator models run in under 200 ms, fast enough to serve as production guardrails rather than only for offline evaluation. Covers hallucination detection, RAG quality, and safety scoring. Not to be confused with Galileo AI (the UI design tool).
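Why the latency figure matters for guardrails can be sketched as follows. This is an illustration under stated assumptions, not Galileo's API: `toy_hallucination_score` is a hypothetical word-overlap scorer standing in for a real evaluation model, and `guard`, its threshold, and the `"timeout"` outcome are invented for the example. Only the 200 ms budget comes from the description above.

```python
import concurrent.futures
import time
from typing import Callable

LATENCY_BUDGET_S = 0.2  # 200 ms, the figure quoted in the description above

# Shared worker pool so each guardrail call does not pay thread start-up cost.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def toy_hallucination_score(answer: str, context: str) -> float:
    """Hypothetical scorer: fraction of answer words absent from the context."""
    words = answer.lower().split()
    if not words:
        return 0.0
    context_words = set(context.lower().split())
    return sum(w not in context_words for w in words) / len(words)

def guard(
    answer: str,
    context: str,
    scorer: Callable[[str, str], float] = toy_hallucination_score,
    threshold: float = 0.5,
    budget_s: float = LATENCY_BUDGET_S,
) -> str:
    """Return 'allow' or 'block' from the score, or 'timeout' if over budget."""
    future = _POOL.submit(scorer, answer, context)
    try:
        score = future.result(timeout=budget_s)
    except concurrent.futures.TimeoutError:
        # The caller decides whether a timeout fails open or fails closed.
        return "timeout"
    return "block" if score > threshold else "allow"

if __name__ == "__main__":
    context = "paris is the capital of france"
    print(guard("paris is the capital", context))
    print(guard("berlin rules mars", context))
```

An evaluator that cannot answer inside the budget forces a choice between delaying the user-facing response and skipping the check, which is why offline-only evaluators are a poor fit for inline guardrails.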