CogniSwitch Engineering Guide v2.1

Criteria Authoring Workbench

Transforming SOPs from subjective documents into executable logic. A practical manual for writing audit-ready criteria.

Part 01

Why Criteria Fail

Your agent works. Sort of. It handles the happy path. It sounds professional. Demo goes great. But when you deploy to production, automation rates stall at 15-20%. Customer complaints trickle in. Your engineering team asks: "What exactly should we fix?"

You don't know. Because your quality audits aren't telling you anything useful. The root cause isn't your agent. It's your criteria.

The Hidden Cost of Vague Criteria

Broken Criterion

"Clear explanation provided"

ERROR: What's 'clear' to one reviewer isn't to another.
Audit-Ready Refactor

"Agent confirms customer understanding by asking 'Do you have any questions?'"

Status: Subjective DETECTED → Fix Applied: TRUE
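The refactored criterion is mechanically checkable. A minimal sketch, assuming transcripts are plain strings; the phrase list and function name are illustrative, not part of any CogniSwitch API:

```python
import re

def confirms_understanding(transcript: str) -> bool:
    """Pass if the agent explicitly invites questions.
    The phrase list is illustrative, not exhaustive."""
    patterns = [
        r"do you have any questions",
        r"does that make sense",
        r"anything you'd like me to clarify",
    ]
    return any(re.search(p, transcript, re.IGNORECASE) for p in patterns)

# "Clear explanation provided" cannot be coded; this can.
print(confirms_understanding("Agent: ...and that's the plan. Do you have any questions?"))  # True
```

Two reviewers can disagree about "clear"; neither can disagree about whether the question was asked.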

The Courtroom Test

If your criterion would make sense in a courtroom deposition but feels strange in a normal conversation, you've written it wrong.

Wrong (Robotic)

"State your full legal name for the record."

Criteria should allow for natural flexibility. Outcome over action.

Right (Natural)

"And I have you as John Doe, correct?"

Part 02

Grouping Strategy

Before you write a single criterion, decide how you'll organize them. This isn't about neatness. It's about diagnosis. When your agent fails 23 interactions, knowing they all failed in "Clinical Data Collection" tells you exactly which LLM chain to debug.

Admin

The mechanical stuff. Identity verification, call setup, consent. Low complexity, high compliance.

Verify caller identity
State agent name

Core Process

The actual job. Clinical assessment, lending qualification. Where domain expertise lives.

Gather symptoms
Assess creditworthiness

Empathy / CX

The human layer. Active listening, tone adjustment. Hardest to quantify.

Acknowledge frustration
Allow patient to speak

Compliance

Mandatory disclosures. Verbatim requirements. Regulatory boxes that must be checked.

Read privacy disclosure
Obtain recorded consent
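The diagnostic payoff of grouping fits in a few lines. A sketch, assuming each failed interaction is tagged with one of the four groups above; the tags and counts are hypothetical:

```python
from collections import Counter

# Hypothetical audit output: each failure tagged with its criterion group.
failed = (["Core Process"] * 23) + (["Admin"] * 2) + (["Empathy / CX"] * 4)

by_group = Counter(failed)
worst_group, count = by_group.most_common(1)[0]
print(f"Debug first: {worst_group} ({count} failures)")  # Debug first: Core Process (23 failures)
```

Without the group tag, the same data is just "29 failures somewhere."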

Part 02B

The Granularity Trap

Each addition feels justified. "But what if the customer spells their name wrong?" "What if they give a nickname?"

Stop. The 80/20 Rule applies here. Does this edge case occur in more than 5% of interactions? If not, let it fail the primary criterion.

Primary Requirement: "Verify Customer Identity"
Audit Status: OPTIMAL

Primary Criterion
Pass: Agent asks for customer's full name.

The 3-Deep Rule

"For any single SOP requirement, you get one primary criterion and up to two conditionals. That's it."

  • 1 Primary Criterion (Core)
  • Max 2 Conditional Criteria
  • Any more is over-engineering
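The 3-Deep Rule can be enforced structurally rather than by discipline alone. A minimal sketch; the `Requirement` class is a hypothetical illustration, not a CogniSwitch type:

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    """One SOP requirement: one primary criterion, at most two conditionals."""
    primary: str
    conditionals: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.conditionals) > 2:
            raise ValueError("3-Deep Rule violated: max 2 conditional criteria")

# Valid: one primary, one conditional.
identity = Requirement(
    primary="Agent asks for customer's full name.",
    conditionals=["IF name is unclear, agent asks customer to spell it."],
)
```

A third conditional raises at construction time, so over-engineering fails fast instead of silently bloating the audit.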
Part 03

The Transformation Method

You have an SOP written for humans. You need criteria machines can evaluate. Here is the algorithm to get from paragraphs to atomic logic.

Transformation_Engine

SOP TO CRITERIA COMPILER v2.1


Decompose

Read the SOP aloud. Every time you hear 'and' or a comma, that's likely a split point.

Original SOP Requirement
"The agent should gather the patient's medical history, including current medications, known allergies, and any existing conditions, before discussing treatment options."
Atomic Requirements
1. Current medications gathered
2. Known allergies gathered
3. Existing conditions gathered
4. All gathered before treatment discussion (sequence)
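Requirement 4 is a sequence constraint, which becomes trivial to check once each atomic requirement is tied to a transcript position. A sketch, assuming evidence offsets (character positions) have already been located; all names here are illustrative:

```python
def history_before_treatment(evidence: dict) -> bool:
    """Requirement 4: medications, allergies, and conditions must all be
    gathered before treatment discussion begins. `evidence` maps each
    atomic requirement to the transcript offset where it was satisfied
    (None = never satisfied)."""
    history = [evidence.get(k) for k in ("medications", "allergies", "conditions")]
    treatment = evidence.get("treatment_discussion")
    if treatment is None or any(pos is None for pos in history):
        return False
    return max(history) < treatment

print(history_before_treatment({
    "medications": 120, "allergies": 340,
    "conditions": 510, "treatment_discussion": 900,
}))  # True: all three precede the treatment discussion
```

Note that the sequence check is its own atomic criterion; it fails independently of whether each item was eventually gathered.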

Observable Moments

The Transcript Test: If you can't Ctrl+F for evidence, the criterion isn't observable.

BAD: "Agent was friendly."
GOOD: "Agent used greeting with patient's name."

Strong Fail Definitions

The fail definition isn't just "the opposite of pass." It must cover partial completion.

FAIL: Interaction ends without [X] stated or explicitly confirmed as none.
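That fail definition translates directly into a check with explicit outcomes. A minimal sketch for an allergies criterion; the marker phrases are illustrative stand-ins for whatever evidence extraction you actually use:

```python
def allergy_criterion(transcript: str) -> str:
    """PASS only if allergies are stated OR explicitly confirmed as none.
    Asking the question without resolving it is partial completion,
    and partial completion FAILS."""
    t = transcript.lower()
    stated = "allergic to" in t
    confirmed_none = "no known allergies" in t
    return "PASS" if (stated or confirmed_none) else "FAIL"

print(allergy_criterion("Patient: I'm allergic to penicillin."))      # PASS
print(allergy_criterion("Agent: Any allergies? [call disconnects]"))  # FAIL
```

The second call is the case a lazy "opposite of pass" definition misses: the agent asked, but the interaction ended unresolved.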
Part 04

Quantification

Quantified criteria remove the last traces of subjectivity. There is no debate about whether 94% is less than 95%.

Use the Baseline Method: Don't guess thresholds. Run 20 calls, plot the data, and find the natural gap between "clearly bad" and "clearly good".

Ratio-Based

Measures one thing relative to another.

Active Listening
Provider speaks < 95% of total words.
Balanced Conversation
Provider talk time between 40-60%.
Confirmation Coverage
At least 80% of data points confirmed.

Count-Based

Measures occurrences of specific behaviors.

Use Customer Name
Used at least 2 times during interaction.
Offer Assistance
"Anything else?" asked at least once.
Excessive Re-asking
Same information requested no more than 2 times.

Time-Based

Measures when something happens or duration.

Prompt Greeting
Delivered within first 10 seconds.
Hold Time
Total hold time < 60 seconds.
Verification Timing
Confirmed before any account details discussed.
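All three families reduce to arithmetic over a structured transcript. A sketch, assuming turns arrive as (speaker, text, start_seconds) tuples; the thresholds come from the tables above, and the helper name is illustrative:

```python
def quantified_checks(turns, hold_seconds, customer_name):
    """Evaluate one ratio-based, one count-based, and two time-based criteria."""
    provider_words = sum(len(text.split()) for spk, text, _ in turns if spk == "provider")
    total_words = sum(len(text.split()) for _, text, _ in turns)
    talk_ratio = provider_words / total_words

    name_uses = sum(text.count(customer_name) for spk, text, _ in turns if spk == "provider")
    first_provider_turn = next(start for spk, _, start in turns if spk == "provider")

    return {
        "balanced_conversation": 0.40 <= talk_ratio <= 0.60,  # ratio-based
        "use_customer_name": name_uses >= 2,                  # count-based
        "prompt_greeting": first_provider_turn <= 10,         # time-based
        "hold_time": hold_seconds < 60,                       # time-based
    }

turns = [
    ("provider", "Hi John, thanks for calling in today.", 4),
    ("customer", "I need help understanding my latest bill.", 9),
    ("provider", "Of course, John. Let me pull that up.", 16),
]
print(quantified_checks(turns, hold_seconds=35, customer_name="John"))
```

There is nothing for two reviewers to debate here: every result is a comparison against a number fixed in advance by the Baseline Method.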
Part 04B: Inversion Principle

The Default State Problem

Checking for "Empathy" often yields 98% pass rates because professional behavior is the baseline. This is noise. To find the signal (the 2% of bad calls), flip the criterion to hunt for "Rudeness."

Signal_Monitor_v2.0
Criterion: "Was Empathy Present?"
RESULT: 97% PASS (Low Signal)

You run a criterion against 500 interactions. 487 Pass, 13 Fail. Your report says "97.4% Empathy Rate".

You spent tokens evaluating 487 interactions that were... fine. Normal. Default professional behavior.

RULE: The 13 failures are where the signal lives. Everything else is noise.
Old Way
"Was empathy present?"
PASS: 487 Calls
New Way
"Was rudeness present?"
FAIL: 13 Calls (Flagged for Review)
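Inverting the criterion also inverts the review workload: instead of scoring 500 "fine" calls, you surface only the flagged ones. A minimal sketch; the rudeness markers and function names are illustrative placeholders for a real detector:

```python
RUDENESS_MARKERS = ("that's not my problem", "calm down", "i already told you")

def is_rude(transcript: str) -> bool:
    """Hunt for the rare negative behavior instead of the default positive one."""
    t = transcript.lower()
    return any(marker in t for marker in RUDENESS_MARKERS)

def review_queue(calls):
    """Return the indices of the only calls a human needs to look at."""
    return [i for i, call in enumerate(calls) if is_rude(call)]

calls = ["Thanks for calling, happy to help.",
         "Calm down, sir, I already told you the policy."]
print(review_queue(calls))  # [1]
```

The output is the review queue itself: the 13 calls where the signal lives, with the 487 default-professional calls never touched.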
Part 05

Logic Gates

You write 25 criteria for "Debt Collection." But 8 only apply to employed borrowers, and 6 only apply to hardship cases.

If you evaluate a hardship call against employed criteria (e.g., "Set Payment Date"), your agent fails. These are False Failures. They destroy trust in your audit data.

The False Failure Simulator

The simulation below shows how Logic Gates filter out irrelevant criteria and fix audit scores.

Simulation Context
Call #8291
Customer: John Doe
Status: Unemployed
"I lost my job last month. I can't make any payments right now."
Evaluation Logic

System evaluates ALL criteria blindly. Agent is penalized for not asking a jobless person for money.

Audit Score: 71%
Status: INACCURATE (false failures present)
ID | Criterion                                      | Type      | Result
01 | Verify Customer Identity                       | UNIVERSAL | PASS
02 | Read Mini-Miranda Disclosure                   | UNIVERSAL | PASS
03 | Establish Employment Status (GATE: Unemployed) | GATE      | PASS
04 | Propose Payment Plan                           | PATH A    | FAIL
05 | Secure Payment Commitment                      | PATH A    | FAIL
06 | Explain Hardship Options                       | PATH B    | PASS
07 | Email Hardship Forms                           | PATH B    | PASS
Analysis

The agent followed the correct "Hardship" procedure, but is failing because "Employed" criteria (04, 05) are being blindly evaluated. This is noise.
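The gate is simply a filter applied before scoring. A minimal sketch reproducing the simulation above; the tags and helper names are illustrative, not CogniSwitch syntax:

```python
CRITERIA = {
    "Verify Customer Identity":     "universal",
    "Read Mini-Miranda Disclosure": "universal",
    "Establish Employment Status":  "gate",
    "Propose Payment Plan":         "employed",  # Path A
    "Secure Payment Commitment":    "employed",  # Path A
    "Explain Hardship Options":     "hardship",  # Path B
    "Email Hardship Forms":         "hardship",  # Path B
}

RESULTS = {  # from the simulated call above
    "Verify Customer Identity": True,
    "Read Mini-Miranda Disclosure": True,
    "Establish Employment Status": True,
    "Propose Payment Plan": False,
    "Secure Payment Commitment": False,
    "Explain Hardship Options": True,
    "Email Hardship Forms": True,
}

def score(results, path=None):
    """If `path` is set, drop the other path's criteria before scoring."""
    keep = [name for name, tag in CRITERIA.items()
            if path is None or tag in ("universal", "gate", path)]
    return sum(results[n] for n in keep) / len(keep)

print(f"Blind: {score(RESULTS):.0%}")              # Blind: 71%
print(f"Gated: {score(RESULTS, 'hardship'):.0%}")  # Gated: 100%
```

Same agent, same call: the gate turns a 71% "failure" into the 100% the hardship procedure actually earned.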

Part 06

Common Pitfalls

A quick reference checklist. If you see these patterns, refactor immediately.


The 'Appropriately' Trap

Symptom

"Your criterion uses words like 'appropriately', 'properly', 'correctly', 'adequately'."

Incorrect
Provider appropriately addresses patient concerns.
Corrected
Provider acknowledges stated concern AND offers a response (information, action, or follow-up).
The Fix

Replace the judgment word with the observable behavior that defines 'appropriate'.

What's Next?

Put Your Criteria to Work

You've written audit-ready criteria. Now enforce them in real-time and verify compliance at scale.

CogniSwitch

Fix the truth before the token.