Evaluation Suite

Merge’s Evaluation Suite lets you run a set of prompts, or hero queries, against a given connector to evaluate the quality of the different tools within that connector. Two evaluator types are currently supported:

Reference tool call match: Verifies that the tool calls produced by a model match the reference tool calls defined in the evals.
Label model evaluator: Compares the model output to a set of allowed labels and marks each eval as pass or fail depending on which labels are designated as passing.
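
To make the two evaluator types concrete, here is a minimal Python sketch of the checks they describe. It is an illustration only and not Merge's implementation: the `ToolCall` shape, the function names, and the example tool `list_tickets` are all hypothetical. It also assumes that a tool call "matches" when its name and arguments are identical and the calls appear in the same order, and that the label has already been assigned to the model output by a grader.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical shape for a tool call: a tool name plus its arguments."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def reference_tool_call_match(produced: list[ToolCall],
                              reference: list[ToolCall]) -> bool:
    """Pass only if the produced calls equal the reference calls.

    Assumption: order matters and arguments must match exactly.
    """
    return produced == reference


def label_evaluator(label: str,
                    allowed_labels: set[str],
                    passing_labels: set[str]) -> bool:
    """Pass if the assigned label is one of the passing labels.

    Assumption: a label outside the allowed set is treated as an error
    rather than as a silent failure.
    """
    if label not in allowed_labels:
        raise ValueError(f"label {label!r} is not an allowed label")
    return label in passing_labels


if __name__ == "__main__":
    reference = [ToolCall("list_tickets", {"status": "open"})]
    produced = [ToolCall("list_tickets", {"status": "open"})]
    print(reference_tool_call_match(produced, reference))

    allowed = {"correct", "partially_correct", "incorrect"}
    passing = {"correct"}
    print(label_evaluator("partially_correct", allowed, passing))
```

In this sketch the first check is strict by design, so a call with the right tool but different arguments fails. A real evaluator might relax this, for example by ignoring call order or optional arguments.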

How to run Evals?

Step 1: Select a connector

Select a connector you’d like to test. Once you select a connector, you’ll be presented with an option to select an evaluation suite containing pre-configured evals built for the connector by Merge. If you’d like to use your own set of evals, you can select “Your organization’s saved evaluations”.

Step 2: Select a Test User and Tool Pack

Select a Test User that has already been authenticated for the given connector, and a Tool Pack that contains the connector and specific tools you’d like to test.

Step 3: Adding an evaluator

Merge supports two evaluators out of the box: a reference tool call match and a label model evaluator. Select “+ Add Evaluator” if you’d like to run the evals with one of our evaluators.

Step 4: Running and analyzing results

Once you’ve loaded the set of evaluations (prompts), selected a Test User and Tool Pack, and added an evaluator, you are ready to run the evaluations.
