Merge’s Evaluation Suite enables you to test a set of prompts or hero queries against a given connector to evaluate the quality of the different tools within that connector.
There are two supported evaluator types right now:
- Reference tool call match: verifies that the tool calls produced by the model match the reference tool calls defined in the evals (a minimal sketch follows this list).
- Label model evaluator: compares the model’s output against a set of allowed labels and marks it pass or fail depending on whether the assigned label is in the passing set.
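To make the reference tool call match concrete, here is a minimal sketch of the underlying idea, assuming tool calls are represented as dicts with `name` and `arguments` fields. The function and field names are hypothetical; this is illustrative, not Merge’s implementation.

```python
# Illustrative sketch only; field names are hypothetical, not Merge's schema.
from typing import Any

def tool_calls_match(
    actual: list[dict[str, Any]],
    reference: list[dict[str, Any]],
) -> bool:
    """Pass if the model emitted the same tool calls (name and arguments)
    as the reference calls defined in the eval, in the same order."""
    if len(actual) != len(reference):
        return False
    return all(
        a["name"] == r["name"] and a["arguments"] == r["arguments"]
        for a, r in zip(actual, reference)
    )
```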
How to run evals
Step 1: Select a connector
Select a connector you’d like to test. Once you select a connector, you’ll be presented with an option to select an evaluation suite containing pre-configured evals built for the connector by Merge. If you’d like to use your own set of evals, you can select “Your organization’s saved evaluations”.
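For orientation, a saved eval generally pairs a prompt with the reference outcome it should produce. A hypothetical entry, with made-up field names, might look like:

```python
# Hypothetical eval entry; the field names are illustrative, not Merge's schema.
eval_case = {
    "prompt": "Create a ticket titled 'Onboard new hire' and assign it to Jane",
    "reference_tool_calls": [
        {
            "name": "create_ticket",
            "arguments": {"title": "Onboard new hire", "assignee": "Jane"},
        }
    ],
}
```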
Step 2: Select a Test User and Tool Pack
Select a Test User that has already been authenticated for the given connector, and a Tool Pack that contains the connector and the specific tools you’d like to test.
Step 3: Add an evaluator
There are two evaluators Merge supports out of the box: a reference tool call match and a label model evaluator.
Select “+ Add Evaluator” if you’d like to run the evals with one of our evaluators.
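The label model evaluator follows the pass/fail-by-label idea described earlier: a judge model assigns one of a fixed set of labels to the output, and the eval passes only if that label is in the passing set. In the rough sketch below, the judge callable and label sets are hypothetical stand-ins, not Merge’s API.

```python
# Rough sketch of a label model evaluator; the judge is injected as a plain
# callable so the example stays self-contained.
from typing import Callable

ALLOWED_LABELS = {"correct", "partially_correct", "incorrect"}
PASSING_LABELS = {"correct"}

def label_model_eval(
    prompt: str,
    model_output: str,
    judge: Callable[[str], str],  # hypothetical LLM judge: instruction -> label
) -> bool:
    """Pass only if the judge assigns a label from the passing set."""
    label = judge(
        f"Label the response as one of {sorted(ALLOWED_LABELS)}.\n"
        f"Prompt: {prompt}\nResponse: {model_output}"
    )
    return label in PASSING_LABELS
```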
Step 4: Run and analyze results
Once you’ve loaded the set of evaluations/prompts, selected a Test User and Tool Pack, and added an evaluator, you’re ready to run the evaluations.
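When the run finishes, each eval yields a pass or fail from its evaluator. As a rough illustration, a summary over exported results might be computed like this; the record shape is hypothetical and may differ from Merge’s actual output format.

```python
# Hypothetical result records; Merge's actual export format may differ.
def pass_rate(results: list[dict]) -> float:
    """Fraction of evals whose evaluator reported a pass."""
    if not results:
        return 0.0
    return sum(1 for r in results if r["passed"]) / len(results)

results = [
    {"eval": "create_ticket_basic", "passed": True},
    {"eval": "assign_ticket", "passed": False},
]
print(f"Pass rate: {pass_rate(results):.0%}")  # -> Pass rate: 50%
```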