Galileo
LLM evaluation platform whose evaluation models run in under 200 ms, fast enough to serve as production guardrails rather than only for offline evaluation. Covers hallucination detection, RAG quality, and safety scoring. Distinct from Galileo AI (the UI design tool).
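To make the guardrail idea concrete, here is a minimal sketch of a latency-budgeted check placed in front of a model response. It does not use Galileo's SDK or API: `check_response`, `GuardrailResult`, and `toy_scorer` are hypothetical names, and the scorer is a stand-in for whatever evaluation model is actually called. Only the 200 ms budget comes from the description above.

```python
import time
from dataclasses import dataclass
from typing import Callable

LATENCY_BUDGET_S = 0.200  # the 200 ms figure quoted in the entry above


@dataclass
class GuardrailResult:
    allowed: bool        # True only if the score is acceptable AND the check was fast enough
    score: float         # risk score returned by the evaluator (higher = riskier)
    elapsed_s: float     # wall-clock time spent in the evaluator
    within_budget: bool  # whether the evaluator finished inside the latency budget


def check_response(
    text: str,
    scorer: Callable[[str], float],
    threshold: float = 0.5,
    budget_s: float = LATENCY_BUDGET_S,
) -> GuardrailResult:
    """Score a response and decide whether it may be returned to the user."""
    start = time.perf_counter()
    score = scorer(text)
    elapsed = time.perf_counter() - start
    within_budget = elapsed <= budget_s
    # Fail closed: block when the score is too high or the evaluator overran its budget.
    allowed = within_budget and score < threshold
    return GuardrailResult(allowed, score, elapsed, within_budget)


def toy_scorer(text: str) -> float:
    """Hypothetical stand-in evaluator: flags an overconfident claim."""
    return 0.9 if "guaranteed" in text.lower() else 0.1


if __name__ == "__main__":
    print(check_response("Returns are guaranteed.", toy_scorer).allowed)  # False
```

Failing closed on a budget overrun is one possible design choice; a system that values availability over strictness might instead let the response through and log the overrun.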
PromptFoo
Test and compare prompts across models. Built-in red-teaming, regression testing, and side-by-side model comparison.
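The prompt-by-model comparison and regression check described above can be sketched in plain Python. This is an illustration of the general technique, not PromptFoo's configuration format or CLI: `run_matrix`, `find_regressions`, and the two toy models are hypothetical stand-ins for real model calls.

```python
from typing import Callable, Dict, List, Tuple

Case = Tuple[Dict[str, str], str]  # (template variables, substring expected in the output)
Key = Tuple[str, str]              # (prompt name, model name)


def run_matrix(
    prompts: Dict[str, str],
    models: Dict[str, Callable[[str], str]],
    cases: List[Case],
) -> Dict[Key, float]:
    """Run every prompt against every model and return the pass rate per pair."""
    results: Dict[Key, float] = {}
    for prompt_name, template in prompts.items():
        for model_name, model in models.items():
            passed = sum(
                expected in model(template.format(**variables))
                for variables, expected in cases
            )
            results[(prompt_name, model_name)] = passed / len(cases)
    return results


def find_regressions(baseline: Dict[Key, float], current: Dict[Key, float]) -> List[Key]:
    """Return the pairs whose pass rate dropped relative to the baseline."""
    return sorted(k for k, rate in current.items() if rate < baseline.get(k, 0.0))


# Hypothetical stand-in models; real ones would call an LLM API.
CAPITALS = {"France": "Paris", "Japan": "Tokyo"}


def model_a(prompt: str) -> str:
    return next((cap for country, cap in CAPITALS.items() if country in prompt), "unknown")


def model_b(prompt: str) -> str:
    return "Paris" if "France" in prompt else "unknown"


PROMPTS = {"v1": "What is the capital of {country}?", "v2": "Capital city of {country}:"}
MODELS = {"model_a": model_a, "model_b": model_b}
CASES: List[Case] = [({"country": "France"}, "Paris"), ({"country": "Japan"}, "Tokyo")]

if __name__ == "__main__":
    for key, rate in sorted(run_matrix(PROMPTS, MODELS, CASES).items()):
        print(key, rate)
```

Storing the result of `run_matrix` from a known-good run and passing it as `baseline` on later runs turns the same comparison into a regression test.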