A 5,389-prompt adversarial benchmark across seven attack categories and three deployment registers, with quantum-augmented generation. Open-source, reproducible, and built to prove production readiness.
FinProof v1 spans seven attack categories (investment advice, KYC bypass, regulatory misrepresentation, document hallucination, data rights, transaction integrity, and account bypass) across three deployment registers: professional compliance, retail customer mobile, and relationship manager (RM) internal.
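Medium-difficulty attacks are generated with a Quantum Circuit Born Machine (QCBM) implemented in PennyLane. Because the QCBM samples new attack variants instead of replaying a fixed list, it aims for more diverse and realistic adversarial coverage than a static dataset provides.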
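To make the sampling idea concrete, here is a minimal sketch of the Born rule that a QCBM relies on: a parameterized circuit prepares a state, and each bitstring is drawn with probability equal to its squared amplitude. This is a plain-Python toy simulation on two qubits, not the FinProof generator and not PennyLane code. The circuit layout, the fixed angles, and the `TEMPLATES` mapping from bitstrings to attack framings are all assumptions made for illustration; in a real QCBM the angles would be trained to match a target distribution.

```python
import math
import random

# Hypothetical mapping from 2-bit samples to attack framings (illustration only).
TEMPLATES = {
    "00": "direct request",
    "01": "role-play framing",
    "10": "authority claim",
    "11": "multi-step setup",
}

def apply_ry(state, qubit, theta, n_qubits):
    """Apply an RY(theta) rotation to one qubit of a real-amplitude statevector."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << (n_qubits - 1 - qubit)  # qubit 0 is the leftmost bit
    new = list(state)
    for i in range(len(state)):
        if i & bit == 0:
            a0, a1 = state[i], state[i | bit]
            new[i] = c * a0 - s * a1
            new[i | bit] = s * a0 + c * a1
    return new

def apply_cnot(state, control, target, n_qubits):
    """Flip the target bit of every basis state whose control bit is 1."""
    cbit = 1 << (n_qubits - 1 - control)
    tbit = 1 << (n_qubits - 1 - target)
    new = list(state)
    for i in range(len(state)):
        if i & cbit:
            new[i] = state[i ^ tbit]
    return new

def born_probabilities(thetas):
    """Run a small RY-CNOT-RY circuit from |00> and return p(x) = amplitude(x)**2."""
    n = 2
    state = [1.0, 0.0, 0.0, 0.0]
    state = apply_ry(state, 0, thetas[0], n)
    state = apply_ry(state, 1, thetas[1], n)
    state = apply_cnot(state, 0, 1, n)
    state = apply_ry(state, 0, thetas[2], n)
    state = apply_ry(state, 1, thetas[3], n)
    return {format(i, "02b"): amp * amp for i, amp in enumerate(state)}

def sample_framings(thetas, k, seed=0):
    """Draw k bitstrings by the Born rule and map them to framings."""
    rng = random.Random(seed)
    probs = born_probabilities(thetas)
    keys = list(probs)
    draws = rng.choices(keys, weights=[probs[b] for b in keys], k=k)
    return [TEMPLATES[b] for b in draws]

# Arbitrary, untrained angles chosen only to give a non-trivial distribution.
probs = born_probabilities([0.8, 1.9, 0.4, 2.3])
samples = sample_framings([0.8, 1.9, 0.4, 2.3], k=5)
print(probs)
print(samples)
```

Changing the angles reshapes the distribution over framings, which is the lever a trained Born machine uses to control what it generates.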
5,389 prompts across all tiers and registers.
7 attack categories in a BFSI-specific threat taxonomy.
3 deployment registers: compliance, retail, internal.
QCBM generation for augmented attack diversity.
Eval harness + 782 benign FPR-calibration examples.
1,606 direct-difficulty adversarial prompts.
2,036 medium-difficulty quantum-generated attacks.
1,747 hard attacks — evaluated by Zytra only.
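The three adversarial counts above add up to the 5,389-prompt headline figure, which is easy to confirm. The sketch below is only arithmetic on the published numbers; the dictionary keys are informal labels chosen here, not official tier names.

```python
# Adversarial prompt counts as published above; keys are informal labels.
ADVERSARIAL_COUNTS = {
    "direct": 1606,
    "medium_quantum": 2036,
    "hard_withheld": 1747,
}

total_adversarial = sum(ADVERSARIAL_COUNTS.values())
shares = {
    name: count / total_adversarial
    for name, count in ADVERSARIAL_COUNTS.items()
}

print(total_adversarial)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Because the three counts alone reach 5,389, the 782 benign calibration examples sit outside the headline figure.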
Investment advice: eliciting unlicensed or non-compliant financial recommendations.
KYC bypass: attempts to circumvent identity verification controls.
Regulatory misrepresentation: inducing false claims about products, terms or compliance.
Document hallucination: fabricating statements, figures or official documentation.
Data rights: privacy violations and unauthorized data disclosure.
Transaction integrity: manipulating payments, transfers or transaction logic.
Account bypass: unauthorized access to accounts or privileged actions.
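Crossing the seven categories with the three deployment registers gives 21 category-register cells. The sketch below shows one way a team might check its own prompt set for empty cells. The `PromptRecord` layout, the snake_case identifiers, and the demo records are hypothetical; the published dataset schema may differ.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product

# Identifiers derived from the category and register names on this page.
CATEGORIES = (
    "investment_advice",
    "kyc_bypass",
    "regulatory_misrepresentation",
    "document_hallucination",
    "data_rights",
    "transaction_integrity",
    "account_bypass",
)
REGISTERS = ("professional_compliance", "retail_customer_mobile", "rm_internal")

@dataclass(frozen=True)
class PromptRecord:
    """Hypothetical record layout; the real dataset schema may differ."""
    text: str
    category: str
    register: str

def coverage_gaps(records):
    """Return the (category, register) cells that contain no prompts."""
    seen = Counter((r.category, r.register) for r in records)
    return [cell for cell in product(CATEGORIES, REGISTERS) if seen[cell] == 0]

# Two made-up records, purely to exercise the function.
demo = [
    PromptRecord("placeholder prompt A", "kyc_bypass", "rm_internal"),
    PromptRecord("placeholder prompt B", "data_rights", "retail_customer_mobile"),
]
gaps = coverage_gaps(demo)
print(f"{len(gaps)} of {len(CATEGORIES) * len(REGISTERS)} cells are empty")
```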
Full category definitions are published in the open attack taxonomy on Hugging Face.
How leading safety models perform. A lower false-positive rate (FPR) means fewer legitimate customers are wrongly blocked.
| Rank | Model (vendor) | HackaPrompt recall | AgentHarm FPR | WildGuard F1 | Latency (ms) |
|---|---|---|---|---|---|
| 1 | Lynx v1.5 (Zytra) | 0.994 | 0.5% | 0.303 | 11.6 |
| 2 | PromptGuard-86M (Meta) | 1.000 | 96.9% | 0.095 | 8 |
| 3 | LlamaGuard-3-1B (Meta) | 0.000 | 0% | 0.0 | ~60 |
| 4 | Granite Guardian (IBM) | 0.000 | 45% | 0.0 | ~100 |
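For readers less familiar with these columns, the sketch below shows how recall, false-positive rate, and F1 follow from a detector's confusion counts. These are the standard definitions; whether the leaderboard computes them exactly this way is not stated here, and the counts in the example are made up.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute recall, false-positive rate, and F1 from confusion counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"recall": recall, "fpr": fpr, "f1": f1}

# Made-up counts: 100 attacks and 100 benign prompts per detector.
strict = detection_metrics(tp=100, fp=90, tn=10, fn=0)   # flags almost everything
balanced = detection_metrics(tp=95, fp=2, tn=98, fn=5)   # misses a few attacks

print(strict)
print(balanced)
```

The `strict` detector reaches perfect recall only by blocking 90% of benign prompts, which is why recall and FPR need to be read together.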
Official evaluation on the withheld Tier 4 set is conducted by Zytra. Public self-evaluation (Tier 1 + 2) is available now.
Run your model against the FinProof withheld test set and see where you rank.