

Open-source toolkit to instrument, evaluate, and track LLM applications with feedback functions and dashboard-driven comparisons.

TruLens (TruLens Eval / trulens) is an open-source toolkit for instrumenting, evaluating, and monitoring large language model (LLM) applications. It provides fine-grained, stack-agnostic instrumentation that records model calls, retrievals, prompts, and knowledge sources, and it runs configurable feedback functions alongside each application run to surface failure modes such as hallucinations and factual errors. TruLens also includes utilities for virtual records, RAG-centric evaluation (the RAG Triad), a web dashboard with leaderboards for comparing app versions, and integrations with multiple model providers and observability systems (including planned OpenTelemetry work). Its value lies in turning ad-hoc “vibe checks” into systematic, repeatable evaluations, so teams can iterate on prompts, retrievers, and model choices against measurable feedback.
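To make the idea of feedback functions concrete, here is a minimal, library-free sketch of scoring one recorded RAG call along the three RAG Triad axes (context relevance, groundedness, answer relevance). It does not use the TruLens API: `Record`, `overlap`, and `evaluate` are hypothetical names invented for this illustration, and the token-overlap score is a deliberately crude stand-in for the model-based scoring a real feedback function would likely use.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    """One recorded app call (hypothetical structure, not a TruLens type)."""
    query: str
    context: str
    answer: str


def _tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(source, target):
    """Fraction of target tokens that also appear in source (0.0 to 1.0)."""
    target_tokens = _tokens(target)
    if not target_tokens:
        return 0.0
    return len(_tokens(source) & target_tokens) / len(target_tokens)


def evaluate(record):
    """Run three toy feedback functions over a single record."""
    return {
        # Does the retrieved context cover the terms of the query?
        "context_relevance": overlap(record.context, record.query),
        # Is the answer supported by the retrieved context?
        "groundedness": overlap(record.context, record.answer),
        # Does the answer address the terms of the query?
        "answer_relevance": overlap(record.answer, record.query),
    }


record = Record(
    query="capital of France",
    context="Paris is the capital of France",
    answer="Paris is the capital of Germany",
)
scores = evaluate(record)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

In this toy example the retrieved context fully covers the query, but the answer introduces a term ("Germany") that the context does not support, so groundedness falls below 1.0. Scoring every run this way and comparing the results across app versions is the kind of repeatable check the description above refers to.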



