Inspect
Inspect is an open-source framework for building LLM evaluations, developed by the UK AI Safety Institute. It provides task composition, built-in datasets, scorers, and solvers for systematic benchmarking of LLM capabilities, safety, and alignment properties.
DeepEval
DeepEval is an open-source evaluation framework offering 14+ metrics, including faithfulness, answer relevancy, and hallucination detection. It integrates with CI/CD pipelines via its pytest-style test runner.