  Property & Casualty

A P&C Insurance SLM Trained On Your Own Policies, Forms and Claim Notes

Use InsightLM to build a fine-tuned small language model purpose-built for underwriting, claims and customer service — running inside your own VPC, complementing the Bedrock Claude and Copilot GPT investments you already have.

Where Generalist LLMs Fall Short in P&C

Bedrock Claude and Copilot GPT are excellent generalists. P&C work exposes four specific gaps that a fine-tuned domain SLM is built to close.

Coverage Hallucinations

Generalist models confidently misread endorsements, exclusions and state-form language because they were never trained on your policy wordings. The cost of one wrong coverage answer is real money and real complaints.

PII / PHI & Residency

Claim notes and FNOL transcripts are full of PII and sometimes PHI. Sending raw text to a hosted generalist API triggers a privacy-review cycle each quarter, plus state-DOI questions on data residency and AI use disclosures.

Cost At Claims Volume

A 7B fine-tuned SLM serving FNOL classification and adjuster summarization at scale typically runs at 10–100x lower per-call cost than a hosted frontier LLM — with predictable, capacity-based pricing instead of per-token meters.
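
The cost claim above is easy to sanity-check with the per-1K figures quoted later on this page (~$0.02 per 1K calls for a fine-tuned 7B SLM vs. ~$5–$30 per 1K calls for a hosted frontier LLM). A minimal sketch, using those illustrative target figures rather than measured prices:

```python
# Illustrative monthly-cost arithmetic at claims volume, using the target
# per-1K-call figures from this page. These are design targets, not quotes.

def monthly_cost(calls_per_month: int, usd_per_1k_calls: float) -> float:
    """Cost in USD for a month of inference at a flat per-1K-call rate."""
    return calls_per_month / 1_000 * usd_per_1k_calls

CALLS = 5_000_000  # e.g., FNOL classifications per month at carrier scale

slm = monthly_cost(CALLS, 0.02)            # fine-tuned 7B SLM target
frontier_low = monthly_cost(CALLS, 5)      # low end of hosted frontier range
frontier_high = monthly_cost(CALLS, 30)    # high end of hosted frontier range

print(f"SLM: ${slm:,.0f}/mo  frontier: ${frontier_low:,.0f}–${frontier_high:,.0f}/mo")
```

At five million calls a month, the per-token meter dominates the frontier bill; the SLM's capacity-based cost stays flat.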

Latency For Agent-Assist

Inside an adjuster's workflow, sub-second response matters. A small, quantized SLM running on your own GPUs delivers consistent low latency without the variable tail of a public API.

InsightLM P&C Reference Architecture

From your existing data sources to a deployed, monitored P&C SLM — in your own environment.

1. P&C Data Sources: Policy wordings, ACORD forms, claim notes, FNOL transcripts, underwriting guidelines
2. Curate & Scrub: Layout-aware parsing, OCR, PII / PHI scrubbing, dedup, glossary alignment
3. Synthesize: Q&A pairs, instruction sets, reasoning traces, hard negatives from your corpus
4. Fine-Tune: Qwen / Llama / Mistral base, SFT + LoRA / QLoRA, DPO for refusals & grounding
5. Evaluate: P&C task suite, LLM-as-judge with rubrics, regression gating, red-team probes
6. Serve: vLLM / SGLang on your VPC GPUs, guardrail SLM, monitoring & drift alerts

Every step is reproducible: each model artifact is linked back to its dataset hash, training recipe and code commit. Your audit team gets a clean lineage; your engineering team gets predictable retrains.
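
The lineage idea above can be sketched in a few lines: hash the training data, record the recipe and code commit, and ship the manifest with the model artifact. The file contents and field names here are illustrative, not InsightLM's actual manifest format:

```python
# Minimal artifact-lineage sketch: tie a trained model back to its dataset
# hash, training recipe and code commit. Fields are hypothetical examples.
import hashlib
import json

def sha256_bytes(data: bytes) -> str:
    """Content hash of the exact training corpus bytes."""
    return hashlib.sha256(data).hexdigest()

def build_manifest(dataset: bytes, recipe: dict, commit: str) -> dict:
    return {
        "dataset_sha256": sha256_bytes(dataset),  # reproducible data pointer
        "recipe": recipe,                         # base model, method, epochs
        "code_commit": commit,                    # git SHA of training code
    }

manifest = build_manifest(
    dataset=b"...training jsonl bytes...",
    recipe={"base": "qwen2.5-7b", "method": "qlora", "epochs": 3},
    commit="a1b2c3d",
)
print(json.dumps(manifest, indent=2))
```

Given the same dataset bytes, recipe and commit, a retrain is byte-for-byte attributable, which is what an audit trail needs.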

Six P&C Use Cases You Can Ship

Each card shows the task, the input, the output and a target quality / cost bar — the actual conversation you'll have with your AI program office.

 FNOL Triage

FNOL Classification & Severity Scoring

Classify a first notice of loss the moment it arrives — line of business, peril, severity tier, special-handling flags — so it routes to the right adjuster queue without manual review.

Input: FNOL transcript, notes, structured intake fields
Output: JSON: line of business, peril code, severity (1–5), SIU flag, urgency
Target quality: ≥ 92% top-1 accuracy on peril; ≥ 88% on severity tier
Target cost: ~$0.02 per 1K classifications on a 7B SLM

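
The JSON contract in the card above can be expressed as a validated dataclass. Field names follow the card; the exact schema in any deployment is customer-specific:

```python
# Sketch of the FNOL triage output contract as a validated dataclass.
# Field names mirror the card above; the real schema is deployment-specific.
from dataclasses import dataclass, asdict
import json

@dataclass
class FnolTriage:
    line_of_business: str   # e.g. "personal_auto"
    peril_code: str         # e.g. "hail"
    severity: int           # 1 (minor) .. 5 (catastrophic)
    siu_flag: bool          # route to Special Investigation Unit?
    urgency: str            # "routine" | "expedite" | "immediate"

    def __post_init__(self):
        # Reject out-of-contract model output before it reaches the queue.
        if not 1 <= self.severity <= 5:
            raise ValueError(f"severity must be 1-5, got {self.severity}")
        if self.urgency not in {"routine", "expedite", "immediate"}:
            raise ValueError(f"unknown urgency: {self.urgency}")

# Parse and validate a hypothetical model response in one step:
raw = ('{"line_of_business": "personal_auto", "peril_code": "hail", '
       '"severity": 2, "siu_flag": false, "urgency": "routine"}')
triage = FnolTriage(**json.loads(raw))
print(asdict(triage))
```

Validating at the boundary means a malformed model response fails loudly instead of mis-routing a claim.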
 Policy Q&A

Coverage Q&A Grounded in Policy Documents

Answer "is this covered?" questions for agents, adjusters and customers — always grounded in the actual policy form, endorsements and state amendments, with citations back to the page.

Input: Customer or agent question + policy document set
Output: Plain-language answer + cited section, page and form number
Target quality: ≥ 95% citation precision; < 2% unsupported claim rate
Target cost: ~$0.10 per 1K answered queries (SLM + retrieval)

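
One way the citation-precision bar above might be scored in an eval suite: of the sections a model cites, what fraction appear in the labeled supporting set? This mirrors standard precision; the section labels below are hypothetical:

```python
# Citation precision: fraction of cited form sections that are genuinely
# supporting, per a labeled gold set. Eval-suite details are illustrative.

def citation_precision(cited: list[str], supporting: set[str]) -> float:
    """Share of cited sections that the gold annotation marks as supporting."""
    if not cited:
        return 0.0
    return sum(1 for c in cited if c in supporting) / len(cited)

# Hypothetical example: the model cites three sections, two are correct.
score = citation_precision(
    cited=["HO-3 p.4 Section I", "HO-3 p.7 Exclusion 2", "HO-3 p.12"],
    supporting={"HO-3 p.4 Section I", "HO-3 p.7 Exclusion 2"},
)
print(f"citation precision: {score:.2f}")  # 0.67
```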
 ACORD Extraction

Structured Extraction From ACORD & Carrier Forms

Extract insured name, policy number, peril, loss date, coverage limits, deductibles and endorsements from ACORD-25 / 27 / 125 / 140 and carrier-specific PDFs — into a clean schema for your claim system.

Input: Scanned or digital ACORD / carrier PDF (mixed quality)
Output: JSON conforming to your claim-system schema, with confidence per field
Target quality: ≥ 96% field-level F1 on key fields; clean OCR fallback
Target cost: ~$0.04 per form, including OCR and SLM extraction

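
Field-level F1, the quality bar in this card, can be computed per form like this. Exact-match on field values is one common convention; metric definitions vary by engagement:

```python
# Field-level F1 for form extraction: a field is a true positive only when
# the extracted value exactly matches gold. One common convention, not the
# only one (fuzzy matching on names/addresses is another option).

def field_f1(extracted: dict, gold: dict) -> float:
    tp = sum(1 for k, v in extracted.items() if gold.get(k) == v)
    fp = len(extracted) - tp                                   # wrong or spurious
    fn = sum(1 for k in gold if gold[k] != extracted.get(k))   # missed or wrong
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = {"insured": "Jane Roe", "policy_no": "HO-123", "loss_date": "2024-05-01"}
pred = {"insured": "Jane Roe", "policy_no": "HO-123", "loss_date": "2024-05-02"}
print(f"F1: {field_f1(pred, gold):.2f}")  # 0.67 (one of three fields wrong)
```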
 Adjuster Summaries

Claim Note & Call Summarization

Compress months of adjuster notes and call transcripts into a structured claim summary — status, key events, open actions, next steps — refreshed every time the file changes.

Input: Full claim file: notes, emails, call transcripts, attachments
Output: Structured summary: status, timeline, parties, open tasks, recommended next action
Target quality: ≥ 4.3 / 5 adjuster usefulness rating; < 1% factual error
Target cost: ~$0.03 per summary (long-context SLM)

 SIU & Subrogation

Fraud-Risk & Subrogation Flagging

Surface claims that warrant SIU review or hold subrogation potential, by reading notes and structured signals together — so investigators see the right files and recovery dollars are not left on the table.

Input: Claim summary + structured signals (loss type, parties, prior claims)
Output: SIU score 0–1, subro score 0–1, top contributing reasons
Target quality: SIU recall ≥ 0.80 at a 0.05 false-positive rate; subro lift ≥ 2x
Target cost: ~$0.01 per claim scored

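
The "0.05 false-positive" operating point above implies picking an alert threshold from scores on known-legitimate claims. A minimal sketch with synthetic scores; production threshold selection would use held-out labeled data:

```python
# Pick the SIU alert threshold that keeps the false-positive rate on
# legitimate claims at or below a budget. Scores here are synthetic.

def threshold_at_fpr(legit_scores: list[float], max_fpr: float) -> float:
    """Threshold such that flagging scores strictly above it yields
    at most max_fpr false positives on the legitimate-claim scores."""
    s = sorted(legit_scores)
    n = len(s)
    allowed = int(n * max_fpr)  # how many false positives we tolerate
    # Thresholding at the (allowed+1)-th highest legit score flags only
    # the `allowed` scores above it.
    return s[n - allowed - 1] if allowed < n else s[0]

legit = [i / 100 for i in range(100)]   # synthetic scores, 0.00 .. 0.99
t = threshold_at_fpr(legit, max_fpr=0.05)
flagged = sum(1 for x in legit if x > t)
print(f"threshold={t}, false positives={flagged}/100")
```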
 Customer Comms

Plain-Language Denial & Status Letters

Draft denial, partial-coverage and status letters in plain, compliant language — tuned to your tone, your state-by-state disclosure rules, and your prior-letter style guide.

Input: Coverage decision facts + customer + jurisdiction
Output: Letter draft with required disclosures, ready for adjuster review
Target quality: ≥ 80% adjuster acceptance with minor edits; 100% disclosure compliance
Target cost: ~$0.05 per letter draft

Reference Scorecard (Design Targets)

The bar we design and evaluate a P&C SLM against. Customer-specific scorecards are produced from the held-out evaluation suite during a pilot.

| P&C Task | Metric | Generalist Frontier LLM (Bedrock Claude / Copilot GPT, zero-shot) | InsightLM Fine-Tuned 7B (Qwen / Llama / Mistral base) |
| --- | --- | --- | --- |
| FNOL peril classification | Top-1 accuracy | ~85% | ≥ 92% (target) |
| FNOL severity tier | Top-1 accuracy | ~78% | ≥ 88% (target) |
| Coverage Q&A grounding | Citation precision | ~80% | ≥ 95% (target) |
| ACORD extraction | Field-level F1 | ~88% | ≥ 96% (target) |
| Claim summarization | Adjuster rating (1–5) | ~3.7 | ≥ 4.3 (target) |
| Latency (agent-assist) | p50 / p95 | ~900 ms / ~3.5 s | ~150 ms / ~600 ms (target) |
| Cost per 1K calls (typical task) | USD | ~$5–$30 | ~$0.02–$0.20 (target) |
Targets above represent design goals InsightLM engagements aim for, based on published benchmarks for similarly sized fine-tuned open-weight models on domain tasks. They are not guarantees and not measurements from a specific customer deployment. Customer-specific results are produced during a pilot using held-out data.

How InsightLM Fits Your Existing Stack

InsightLM is not a Bedrock or Copilot replacement — it slots in where a fine-tuned, owned, in-VPC SLM is the better answer, and yields gracefully where a frontier LLM is the better fit.

 Use InsightLM SLM
Where the workload is high-volume and well-bounded

FNOL triage, ACORD extraction, claim summarization, policy Q&A, denial-letter drafts. Tasks that run thousands or millions of times, on PII-heavy data, with predictable shape. The combination of cost, latency and data-control makes a fine-tuned SLM the right tool.

 Use Both Together
Where you want a domain SLM with a frontier safety net

Agent-assist where the SLM answers the common 80% in-VPC, and Bedrock Claude is invoked for the long-tail. The orchestration layer is yours; InsightLM emits clean confidence signals so routing to the frontier model is a one-line policy.
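
The "one-line policy" above amounts to a confidence gate. A minimal sketch, where `slm_answer` and `frontier_answer` are stand-ins for your actual model clients (not a real InsightLM or Bedrock API):

```python
# Confidence-gated routing: the in-VPC SLM answers when its confidence
# clears a threshold; otherwise the request escalates to the frontier model.
# Both answer functions below are hypothetical stand-ins for real clients.

def slm_answer(prompt: str) -> tuple[str, float]:
    """Stand-in for the in-VPC SLM: returns (answer, confidence)."""
    return "Covered under Section I, subject to the wind/hail deductible.", 0.91

def frontier_answer(prompt: str) -> str:
    """Stand-in for the hosted frontier model used on the long tail."""
    return "(escalated to Bedrock Claude)"

def route(prompt: str, threshold: float = 0.85) -> str:
    answer, confidence = slm_answer(prompt)
    # The one-line policy: keep the in-VPC answer only when confident enough.
    return answer if confidence >= threshold else frontier_answer(prompt)

print(route("Is hail damage to the roof covered?"))
```

The threshold is the whole orchestration policy: tune it per task from the pilot scorecard, and PII-heavy common cases never leave the VPC.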

 Stay With Frontier LLM
Where flexibility & reasoning trump cost and control

One-off underwriter research, exploratory analytics, a copilot drafting an internal memo. Low volume, broad reasoning, no PII. Bedrock Claude or Copilot GPT is the right tool here — an SLM would be over-engineering.

Deployment Patterns for P&C Carriers

Pick the pattern that matches your data classification, GPU strategy and compliance posture.

 Pattern A — In Your Cloud VPC (most common)

vLLM / SGLang on managed GPU instances inside your AWS, Azure or GCP VPC. Training on the same accounts, datasets stored in your S3 / ADLS / GCS. Bedrock and Copilot remain available for tasks where they're the better fit.

 Pattern B — Hybrid With Bedrock Fallback

InsightLM SLM serves the high-volume in-VPC workload; an orchestrator (your existing one or a thin layer we ship) routes low-confidence cases or rare-domain queries to Bedrock Claude. Single audit log, single cost dashboard.

 Pattern C — On-Prem / Air-Gapped

For carriers with strict data-residency requirements or on-prem mandates from group security: fully private InsightLM on on-prem GPU clusters, no egress to public APIs, optional fully air-gapped operation.

 Pattern D — Edge For Field Operations

Quantized GGUF / AWQ models run on adjuster laptops or handhelds for catastrophe-response or claim-inspection scenarios where connectivity is unreliable. The same model, same prompts, same eval suite as the cloud deployment.

P&C Data You Already Have

InsightLM curation pipelines turn each source into model-ready training data — with PII / PHI scrubbing and lineage tracked end-to-end.

Policy & Underwriting

Policy wordings, endorsements, state amendments, underwriting guidelines, manuals and bulletins.

ISO Forms • Carrier Wordings • Endorsements • UW Manuals

Claims & Forms

ACORD forms, FNOL records, adjuster notes, estimates, photos & reports, settlement letters.

ACORD-25 • ACORD-125 • FNOL Notes • Estimates

Conversations & Service

Call transcripts, chat logs, agent emails, Salesforce / Guidewire ContactManager notes, complaint records.

Call Audio → Transcripts • Chat Logs • Emails • Complaints

Want To Scope a P&C SLM Pilot?

A typical pilot picks one or two of the use cases above, runs end-to-end on a sample of your data inside your VPC, and produces a real scorecard against your current Bedrock / Copilot baseline in 4–8 weeks.

In your VPC • Your data never leaves your network • Works alongside Bedrock and Copilot