
Bedrock Claude and Copilot GPT are useful for analysts. Production banking exposes four constraints a fine-tuned domain SLM is built to meet.
SR 11-7 / OCC 2011-12 expect documented model risk management. A fine-tuned SLM with versioned datasets, recipes, and held-out evals is straightforward to validate; an opaque hosted LLM behind a moving API is not.
Statements, KYC documents, and contact-center transcripts contain PII and sometimes PCI data. Sending them to a public API triggers repeated privacy reviews and quarterly DPIAs. An in-VPC SLM removes that friction.
Triaging millions of inbound interactions or scoring transaction streams at $5–$30 per 1K calls is a non-starter. A 7B fine-tuned SLM serves the same workload at 50–100x lower cost with capacity-based pricing.
Next-best-action for branch staff and agent-assist for contact centers need consistent sub-second response. A small quantized SLM on dedicated GPUs delivers that without the variable tail of a public API.
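The cost constraint above can be sanity-checked with simple arithmetic. The volumes and per-call prices below are illustrative assumptions chosen from the mid-range of the figures quoted, not contracted rates:

```python
# Illustrative cost comparison for a high-volume triage workload.
# Volume and prices are assumptions for this sketch, not quoted rates.
MONTHLY_CALLS = 10_000_000

frontier_per_1k = 10.00   # mid-range of the ~$5-$30 per 1K calls figure
slm_per_1k = 0.10         # mid-range of the ~$0.02-$0.20 target

frontier_monthly = MONTHLY_CALLS / 1_000 * frontier_per_1k
slm_monthly = MONTHLY_CALLS / 1_000 * slm_per_1k

print(f"Frontier LLM:   ${frontier_monthly:,.0f}/month")  # $100,000/month
print(f"Fine-tuned SLM: ${slm_monthly:,.0f}/month")       # $1,000/month
print(f"Ratio: {frontier_monthly / slm_monthly:.0f}x")    # 100x
```

At these assumed mid-range prices the ratio lands at 100x, consistent with the 50–100x range claimed above.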
From your existing data sources to a deployed, audit-ready banking SLM — in your own environment.
KYC docs, statements, transactions, disclosures, contact-center transcripts, complaints
Layout-aware parsing, OCR, PII / PCI scrubbing, dedup, jurisdiction tagging
Q&A pairs, instruction sets, hard negatives, AML / SAR templates from your corpus
Qwen / Llama / Mistral base, SFT + LoRA / QLoRA, DPO for refusals & grounding
Banking task suite, LLM-as-judge with rubrics, regression gating, MRM-ready scorecard
vLLM / SGLang on VPC GPUs, guardrail SLM, monitoring & drift alerts, immutable audit log
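As a concrete illustration of the PII / PCI scrubbing step in the pipeline above, a minimal regex pass might look like the sketch below. The pattern names and placeholder tokens are assumptions for the sketch; a production pipeline layers NER models, checksum validation (e.g. Luhn for card numbers), and jurisdiction-specific rules on top of patterns like these:

```python
import re

# Minimal PII/PCI scrubbing sketch: regex patterns for a few common
# identifier shapes, each replaced with a typed placeholder token.
PATTERNS = {
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),   # PAN-like digit runs
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # US SSN format
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scrub(text: str) -> str:
    """Replace each matched span with a typed placeholder token."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "Card 4111 1111 1111 1111, SSN 123-45-6789, contact amy@example.com"
print(scrub(sample))
```

The typed placeholders (rather than blank redaction) let the downstream curation steps keep the sentence structure that fine-tuning data needs.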
Every artifact is reproducible: each model is linked to its dataset hash, training recipe, and code commit — the lineage your model risk management team needs to validate and re-validate releases.
Each card shows the task, input, output and a target quality / cost bar.
Extract identity, address, ownership structure, source-of-funds, beneficial owners and red-flag signals from passport / ID images, utility bills, articles of incorporation and corporate registry exports.
Turn messy raw merchant strings ("SQ *AMY'S COFFEE 4XJ12") into canonical merchant + category + sub-category — for personal-finance UX, dispute handling and AML upstream signals.
Read the structured signals plus surrounding context for an AML alert and draft an investigator-ready narrative; recommend close, escalate or SAR — with cited evidence per claim.
Ground every "what's the wire fee for this product type?" or "is overdraft protection included?" question in the actual disclosure document with a cited section — no hallucinated rates.
Compress loan files, financial statements, and analyst notes into a structured credit memo: borrower profile, deal summary, key risks, mitigants, recommended terms — refreshed each time the file changes.
Classify inbound complaints by product, issue, and CFPB taxonomy; flag regulatory-reportable cases; draft the response and the regulator-facing narrative for the complaint officer to review.
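The input/output contract of the merchant-categorization card above can be shown with a toy sketch. The prefix patterns and the category table are invented for illustration; the fine-tuned SLM replaces the hard-coded lookup with learned generalization over millions of merchant strings:

```python
import re

# Toy merchant-string normalization: strip processor prefixes and a
# trailing reference code, then look up a category. Illustrative only.
PROCESSOR_PREFIXES = re.compile(r"^(SQ \*|TST\*|PAYPAL \*|AMZN MKTP )", re.I)
TRAILING_CODE = re.compile(r"\s+\d[A-Z0-9]{3,}$")  # codes starting with a digit

# Hypothetical category table for the sketch.
CATEGORIES = {"AMY'S COFFEE": ("Food & Drink", "Coffee Shop")}

def categorize(raw: str) -> dict:
    name = PROCESSOR_PREFIXES.sub("", raw)
    name = TRAILING_CODE.sub("", name).strip()
    category, subcategory = CATEGORIES.get(name, ("Unknown", "Unknown"))
    return {"merchant": name, "category": category, "subcategory": subcategory}

print(categorize("SQ *AMY'S COFFEE 4XJ12"))
```

The same canonical merchant + category output feeds personal-finance UX, dispute handling, and the AML signals mentioned above.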
The bar an InsightLM banking SLM is designed and evaluated against. Customer-specific scorecards are produced from held-out evaluation sets during a pilot.
| Banking Task | Metric | Generalist Frontier LLM (Bedrock Claude / Copilot GPT, zero-shot) | InsightLM Fine-Tuned 7B |
|---|---|---|---|
| KYC document extraction | Field-level F1 | ~88% | ≥ 95% (target) |
| Merchant categorization | Top-1 accuracy | ~88% | ≥ 96% (target) |
| AML alert triage | Recall @ 5% FPR | ~0.72 | ≥ 0.85 (target) |
| Disclosure QA grounding | Citation precision | ~80% | ≥ 95% (target) |
| Complaint classification | Top-1 accuracy | ~84% | ≥ 92% (target) |
| Latency (agent-assist) | p50 / p95 | ~900ms / ~3.5s | ~150ms / ~600ms (target) |
| Cost per 1K calls (typical task) | USD | ~$5–$30 | ~$0.02–$0.20 (target) |
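The regression gating mentioned in the pipeline can be sketched as a threshold check against the target bars in the table above. The metric keys and gate function are illustrative; the thresholds mirror the table's targets:

```python
# Release gate sketch: a candidate model ships only if every tracked
# metric meets its target bar. Targets mirror the scorecard table.
TARGETS = {
    "kyc_field_f1": 0.95,
    "merchant_top1": 0.96,
    "aml_recall_at_5pct_fpr": 0.85,
    "citation_precision": 0.95,
    "complaint_top1": 0.92,
}

def gate(scores: dict) -> tuple:
    """Return (passed, list of metrics below their bar)."""
    failures = [m for m, bar in TARGETS.items() if scores.get(m, 0.0) < bar]
    return (not failures, failures)

candidate = {
    "kyc_field_f1": 0.962,
    "merchant_top1": 0.971,
    "aml_recall_at_5pct_fpr": 0.88,
    "citation_precision": 0.94,   # below the 0.95 bar -> blocks release
    "complaint_top1": 0.93,
}
print(gate(candidate))
```

A failed gate blocks promotion and the failing metrics land in the MRM-ready scorecard for the release.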
Bedrock Claude and Copilot GPT for analysts; existing fraud / AML rules engines for transaction scoring. InsightLM slots into the high-volume, regulated, in-VPC layer.
KYC extraction, AML alert triage, transaction categorization, complaint handling, agent-assist. Tasks where MRM, audit, and PII control matter as much as accuracy.
The SLM handles the bulk of in-VPC traffic; Bedrock Claude or Copilot GPT picks up complex multi-document reasoning or rare-domain cases for analysts. Single observability and cost dashboard.
Strategy memos, market research, exploratory analytics. Frontier LLMs are the right tool here — an SLM would be over-engineering. InsightLM does not try to win these workloads.
Pick the pattern that matches your data classification, second-line posture, and regulator expectations.
vLLM / SGLang on managed GPU instances inside your AWS, Azure or GCP VPC. Bedrock and Copilot remain available for tasks where they're the better fit. Standard cloud bank pattern.
Fully-private InsightLM with on-prem GPU clusters, no egress to public APIs. Standard pattern for tier-1 banks with strict data-residency requirements or sovereign-cloud mandates.
SLM serves the high-volume in-VPC workload; the orchestrator routes low-confidence cases or rare-domain queries to Bedrock Claude. One audit log, one cost dashboard.
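One way to implement that routing is a simple confidence threshold in the orchestrator. The threshold value and backend labels are placeholders for the sketch; real routers also weigh task type and document count:

```python
# Confidence-based router sketch: keep high-confidence calls on the
# in-VPC SLM, escalate the rest to the frontier model. The threshold
# and backend labels are placeholders.
ESCALATION_THRESHOLD = 0.80

def route(slm_confidence: float, rare_domain: bool = False) -> str:
    """Return which backend should answer this request."""
    if rare_domain or slm_confidence < ESCALATION_THRESHOLD:
        return "frontier"   # e.g. Bedrock Claude, same audit log
    return "slm"            # in-VPC fine-tuned 7B

print(route(0.93))                    # slm
print(route(0.55))                    # frontier
print(route(0.93, rare_domain=True))  # frontier
```

Both branches write to the one audit log and cost dashboard described above, so escalations stay visible to second-line review.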
Quantized GGUF / AWQ models on branch-local hardware for next-best-action and document capture in branches with intermittent connectivity. Same model and prompts as the cloud deployment.
InsightLM curation pipelines turn each source into model-ready training data — with PII / PCI scrubbing and lineage tracked end-to-end.
Identity documents, utility bills, articles of incorporation, corporate registry exports, beneficial-owner declarations, source-of-funds packets.
Account statements, transaction streams, merchant feeds, dispute / chargeback files, AML alert tables.
Contact-center transcripts, chat logs, complaint records, disclosures, fee schedules, regulatory filings.
A typical pilot picks one or two of the use cases above, runs end-to-end on a sample of your data inside your VPC, and produces an MRM-ready scorecard against your current Bedrock / Copilot baseline in 4–8 weeks.
In your VPC or on-prem • Your data never leaves your network • MRM-ready lineage and scorecards