- Reference tool call match: Verifies the tool calls produced by a model match the reference tool calls defined in the evals.
- Label model evaluator: Compares model output to a set of allowed labels, and marks the eval as pass or fail depending on whether the output matches one of the labels designated as passing.
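To make the two evaluator types concrete, here is a minimal Python sketch of the checks they describe. It is an illustration only, not Merge's implementation: the `ToolCall` class, both function names, and the strict "same calls in the same order" rule are assumptions made for this example, and the label check assumes a label has already been assigned to the model output.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical shape of a tool call: a tool name plus its arguments."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def reference_tool_call_match(actual: list[ToolCall], reference: list[ToolCall]) -> bool:
    """Pass only if the produced calls equal the reference calls.

    Assumption: names, arguments, and order must all match exactly.
    """
    return actual == reference


def label_evaluator(label: str, allowed_labels: set[str], passing_labels: set[str]) -> bool:
    """Pass if the label assigned to the model output is a passing label.

    Assumption: a label outside the allowed set is treated as an error.
    """
    if label not in allowed_labels:
        raise ValueError(f"Label {label!r} is not one of the allowed labels")
    return label in passing_labels


# Hypothetical example data; the tool name and labels are illustrative.
reference = [ToolCall("create_ticket", {"title": "Bug"})]
produced = [ToolCall("create_ticket", {"title": "Bug"})]
tool_call_passed = reference_tool_call_match(produced, reference)
label_passed = label_evaluator("correct", {"correct", "incorrect"}, {"correct"})
```

In this sketch a reference tool call match is a plain equality check, while the label evaluator separates the set of labels that are valid from the subset that counts as a pass.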
How to run Evals?
Step 1: Select a connector
Select a connector you’d like to test. Once you select a connector, you’ll be presented with an option to select an evaluation suite containing pre-configured evals built for the connector by Merge. If you’d like to use your own set of evals, you can select “Your organization’s saved evaluations”.
Step 2: Select a Test User and Tool Pack
Select a Test User that has already been authenticated for the given connector, and a Tool Pack that contains the connector and specific tools you’d like to test.
Step 3: Adding an evaluator
Merge supports two evaluators out of the box: a reference tool call match and a label model evaluator. Select “+ Add Evaluator” if you’d like to run the evals with one of our evaluators.