Correctness Evaluator

Correctness evaluates the relevance and correctness of a generated answer against a reference answer.

This is useful for measuring whether a response answers the query correctly. The evaluator returns a score between 0 and 5, where 5 means the response is fully correct.
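To illustrate the scoring scale, here is a minimal sketch of how a 0–5 score can be mapped to a pass/fail result. The threshold value of 4.0 and the `isPassing` helper are assumptions for illustration only, not the library's documented behavior:

```typescript
// Hypothetical illustration: mapping a 0-5 correctness score to pass/fail.
// The 4.0 threshold is an assumption for this sketch, not a documented default.
const threshold = 4.0;

function isPassing(score: number): boolean {
  // Scores at or above the threshold count as "correct".
  return score >= threshold;
}

console.log(isPassing(5)); // a fully correct answer passes
console.log(isPassing(2.5)); // a partially correct answer fails
```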

First, install the package:

pnpm i llamaindex

Set the OpenAI API key:

export OPENAI_API_KEY=your-api-key

Import the required modules:

import { CorrectnessEvaluator, OpenAI, Settings, Response } from "llamaindex";

Let's set up gpt-4 for better results:

Settings.llm = new OpenAI({
  model: "gpt-4",
});

Now, define a query and a generated response to evaluate. Note that the response below deliberately misstates general relativity (attributing gravity to magnetism rather than the curvature of spacetime), so we expect the evaluator to flag it:

const query =
  "Can you explain the theory of relativity proposed by Albert Einstein in detail?";

const response = ` Certainly! Albert Einstein's theory of relativity consists of two main components: special relativity and general relativity. Special relativity, published in 1905, introduced the concept that the laws of physics are the same for all non-accelerating observers and that the speed of light in a vacuum is a constant, regardless of the motion of the source or observer. It also gave rise to the famous equation E=mc², which relates energy (E) and mass (m).

However, general relativity, published in 1915, extended these ideas to include the effects of magnetism. According to general relativity, gravity is not a force between masses but rather the result of the warping of space and time by magnetic fields generated by massive objects. Massive objects, such as planets and stars, create magnetic fields that cause a curvature in spacetime, and smaller objects follow curved paths in response to this magnetic curvature. This concept is often illustrated using the analogy of a heavy ball placed on a rubber sheet with magnets underneath, causing it to create a depression that other objects (representing smaller masses) naturally move towards due to magnetic attraction.`;

Next, create the evaluator and evaluate the response against the query:

const evaluator = new CorrectnessEvaluator();

const result = await evaluator.evaluateResponse({
  query,
  response: new Response(response),
});

console.log(
  `the response is ${result.passing ? "correct" : "not correct"} with a score of ${result.score}`,
);
Example output:

the response is not correct with a score of 2.5

API Reference