17/17 cases passed.
Ran 2026-05-18T05:21:37Z · agent gpt-5.4 · judge gpt-5.5 · 462.1s.
Seventeen scripted conversations test the chatbot — six clean SMEs, seven regression guards, four edge cases (illegal businesses, foreign entities, advice questions, implausible numbers). A case passes only when every check passes: the strict code checks (right decision? right escalation reason?) and the subjective quality checks graded by a second AI model (did the reasoning explain the threshold?).
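
The pass rule above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: the `CaseResult` class, its field names, the check names, and the `HYPOTHETICAL_FAIL` case are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Outcome of one scripted conversation (hypothetical structure)."""
    case_id: str
    code_checks: dict[str, bool]   # strict, deterministic assertions
    judge_checks: dict[str, bool]  # quality verdicts from the judge model

    @property
    def passed(self) -> bool:
        # A case passes only when every check of both kinds passes.
        return all(self.code_checks.values()) and all(self.judge_checks.values())


def summarize(results: list[CaseResult]) -> str:
    """Build the headline line, e.g. '17/17 cases passed.'"""
    passed = sum(r.passed for r in results)
    return f"{passed}/{len(results)} cases passed."


results = [
    CaseResult(
        "EDGE005",
        code_checks={"decision": True, "escalation_reason": True},
        judge_checks={"explains_threshold": True},
    ),
    # Hypothetical failing case: code checks pass, one judge check fails.
    CaseResult(
        "HYPOTHETICAL_FAIL",
        code_checks={"decision": True},
        judge_checks={"explains_threshold": False},
    ),
]
print(summarize(results))  # 1/2 cases passed.
```

Under this reading, a single failed judge check is enough to fail a case even when every code check passes.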
| id | case | decision | assertions | result |
|---|---|---|---|---|
| SME001 | Clean F&B chain — Working Capital match | PRE_QUALIFIED | | PASS |
| SME002 | Early-stage SaaS — Business First (borderline) | PRE_QUALIFIED | | PASS |
| SME003 | Sole prop, 4 months — NOT_QUALIFIED | NOT_QUALIFIED | | PASS |
| SME004 | Manufacturing expansion — multi-product, agent should ask local-vs-overseas | PRE_QUALIFIED | | PASS |
| SME005 | High-growth SaaS — Venture Loan as best match (CONDITIONAL because best is conditional) | CONDITIONAL | | PASS |
| SME006 | B2B marketing agency — Invoice Financing match | PRE_QUALIFIED | | PASS |
| REGRESSION_LPO_001 | Legal Process Outsourcing — must NOT match 73100 advertising | CONDITIONAL | | PASS |
| REGRESSION_NO_INVOICE_001 | Pure expansion (no receivables mention) — Invoice Financing must NOT appear | PRE_QUALIFIED | | PASS |
| REGRESSION_INVOICE_001 | Cashflow stress from receivables — Invoice Financing SHOULD appear | PRE_QUALIFIED | | PASS |
| REGRESSION_VENTURE_CONDITIONAL_001 | Venture Loan candidate — decision MUST be CONDITIONAL, not PRE_QUALIFIED | CONDITIONAL | | PASS |
| REGRESSION_NO_OVER_CAP_001 | S$5M ask — final matched_products must NOT contain any over-cap product | CONDITIONAL | | PASS |
| REGRESSION_NO_AMOUNT_001 | User refuses to give a loan amount — escalate gently, not 'rejected' feel | ESCALATE_TO_RM (amount_required) | | PASS |
| REGRESSION_ANNUAL_REVENUE_001 | Annual revenue given — must be divided by 12, not snapped to band midpoint | PRE_QUALIFIED | | PASS |
| EDGE001 | Implausible revenue (S$10B/month) — clarify then escalate above SME ceiling | ESCALATE_TO_RM (outside_sme_scope) | | PASS |
| EDGE005 | Illegal category — forced refusal + escalation, no product/eligibility tools called | ESCALATE_TO_RM (illegal_or_excluded_category) | | PASS |
| EDGE007 | Foreign-incorporated entity — escalate to regional banking, no product/eligibility tools called | ESCALATE_TO_RM (foreign_entity) | | PASS |
| EDGE013 | Advice question — escalate without recommending | ESCALATE_TO_RM (rm_advice_required) | | PASS |
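
The arithmetic guarded by REGRESSION_ANNUAL_REVENUE_001 can be illustrated with a small sketch. The band boundaries, function names, and the S$1.2M figure below are hypothetical, chosen only to show how snapping to a band midpoint differs from dividing the stated annual revenue by 12.

```python
def monthly_revenue_from_annual(annual_sgd: float) -> float:
    """Convert a stated annual revenue to a monthly figure by exact division."""
    return annual_sgd / 12


# Hypothetical monthly revenue bands in S$, used only for contrast.
HYPOTHETICAL_BANDS = [(0, 50_000), (50_000, 200_000), (200_000, 1_000_000)]


def band_midpoint(monthly_sgd: float) -> float:
    """The behaviour the regression case guards against: snap to a band midpoint."""
    for low, high in HYPOTHETICAL_BANDS:
        if low <= monthly_sgd < high:
            return (low + high) / 2
    raise ValueError("amount falls outside the hypothetical bands")


annual = 1_200_000                            # hypothetical stated annual revenue
exact = monthly_revenue_from_annual(annual)   # 100000.0
snapped = band_midpoint(exact)                # 125000.0, overstating revenue by 25%
print(exact, snapped)
```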