Merge’s Evaluation Suite enables you to test a set of prompts or hero queries against a given connector to evaluate the quality of the different tools within that connector. There are two supported evaluator types right now (both sketched in code below):
  1. Reference tool call match: Verifies that the tool calls produced by the model match the reference tool calls defined in the evals.
  2. Label model evaluator: Uses a model to label the output and marks the eval as pass or fail depending on whether that label is one of the passing labels.
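
To make the first evaluator concrete, here is a minimal sketch of a reference tool call match in Python. The tool call shape (a name plus an arguments dict) and the strict in-order comparison are assumptions for illustration, not Merge’s actual schema.

```python
# Hypothetical sketch of a reference tool call match; the tool call
# shape and strict in-order matching are illustrative assumptions.

def tool_calls_match(produced: list[dict], reference: list[dict]) -> bool:
    """Pass only if the model produced the same tool calls, in order,
    with the same arguments, as the reference defined in the eval."""
    if len(produced) != len(reference):
        return False
    return all(
        p["name"] == r["name"] and p["arguments"] == r["arguments"]
        for p, r in zip(produced, reference)
    )

# Example: right tool, wrong argument -> the eval fails.
reference = [{"name": "list_issues", "arguments": {"state": "open"}}]
produced = [{"name": "list_issues", "arguments": {"state": "closed"}}]
print(tool_calls_match(produced, reference))  # False
```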

How to run Evals

Step 1: Select a connector

Select the connector you’d like to test. Once you select a connector, you’ll be presented with the option to choose an evaluation suite containing pre-configured evals built for that connector by Merge. If you’d like to use your own set of evals, select “Your organization’s saved evaluations”.

[Screenshot: select-connector-evals.png]

Step 2: Select a Test User and Tool Pack

Select a Test User that has already been authenticated for the given connector, and a Tool Pack that contains the connector and specific tools you’d like to test.
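
Conceptually, these two selections (plus the connector from Step 1) pin down everything a run needs. A hypothetical sketch of the resulting run configuration follows; the field names are illustrative assumptions, not Merge’s actual API.

```python
# Hypothetical shape of an evaluation run configuration; every field
# name here is an illustrative assumption, not Merge's actual API.
run_config = {
    "connector": "example-connector",  # connector selected in Step 1
    "test_user": "qa-user-1",          # must already be authenticated for this connector
    "tool_pack": {                     # contains the connector and the tools under test
        "connector": "example-connector",
        "tools": ["list_issues", "create_issue"],
    },
}
```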

Step 3: Adding an evaluator

Merge supports two evaluators out of the box: a reference tool call match and a label model evaluator. Select “+ Add Evaluator” if you’d like to run the evals with one of them.

[Screenshot: add-evaluator.png]
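
As a complement to the reference tool call match sketched earlier, here is a rough sketch of the label model evaluator’s pass/fail logic. The judge_model helper and the label names are hypothetical stand-ins for whatever grading model and label set the eval defines.

```python
# Rough sketch of a label model evaluator. `judge_model` and the label
# names are hypothetical stand-ins, not part of Merge's product.

PASSING_LABELS = {"correct", "correct_with_extra_info"}

def judge_model(prompt: str, output: str) -> str:
    """Placeholder for a grading model that assigns a label to the output."""
    return "correct"  # a real judge model would classify the output

def label_eval(prompt: str, output: str) -> bool:
    """Pass if the judge's label is one of the passing labels."""
    return judge_model(prompt, output) in PASSING_LABELS

print(label_eval("List my open issues", "Here are your open issues..."))  # True
```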

Step 4: Running and analyzing results

Once you’ve loaded the set of evaluations (prompts), selected a Test User and Tool Pack, and added an evaluator, you’re ready to run the evaluations.
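
Analyzing the results mostly comes down to aggregating per-prompt pass/fail outcomes and drilling into the failures. A minimal sketch, assuming each result carries the prompt and a boolean outcome (an illustrative shape, not Merge’s actual output format):

```python
# Minimal sketch of aggregating eval results into a pass rate; the
# result shape is an illustrative assumption.
results = [
    {"prompt": "List my open issues", "passed": True},
    {"prompt": "Create a bug ticket", "passed": False},
]

passed = sum(r["passed"] for r in results)
print(f"{passed}/{len(results)} evals passed ({passed / len(results):.0%})")
for r in results:
    if not r["passed"]:
        print(f"FAILED: {r['prompt']}")  # inspect the failing prompt's tool calls
```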