- contact@verticalserve.com

Frontier LLMs power great copilots for analysts, but production clinical and payer workflows hit four walls that a fine-tuned domain SLM is built to clear.
Sending PHI to a hosted public LLM requires a BAA and an ongoing privacy program. An on-prem or HIPAA-aligned VPC SLM sidesteps that recurring BAA and vendor-review cycle and lets you treat clinical text the way you treat any other PHI store.
Generalist models lag behind ICD-10, CPT and SNOMED revisions and frequently invent codes. A fine-tuned SLM trained on the current code sets and your specialty mix achieves coder-level accuracy without the hallucinated CPTs.
Summarizing every encounter or processing every prior auth at $5–$30 per 1K calls makes the economics untenable at scale. A 7B SLM serves the same workload at a fraction of the cost, with predictable, capacity-based pricing.
Ambient documentation, in-EHR coding suggestions and clinician copilots need consistent low latency. A small quantized SLM on dedicated GPUs avoids the variable tail of a public API and keeps the clinician in flow.
From clinical documentation, payer policies and literature to a deployed healthcare SLM — in your own environment, with PHI controls end-to-end.
- Sources: clinical notes, orders, discharge summaries, payer policies, drug labels, literature, claims
- Curation: layout-aware parsing, OCR, PHI scrubbing, dedup, terminology mapping (ICD / CPT / SNOMED)
- Instruction data: SOAP / discharge summarization pairs, coding instruction sets, prior-auth templates
- Fine-tuning: Qwen / Llama / Mistral base, SFT + LoRA / QLoRA, DPO for refusals & grounding
- Evaluation: clinical task suite, coder-rated benchmarks, citation-grounding probes, safety red-team
- Serving: vLLM / SGLang on on-prem or VPC GPUs, guardrail SLM, drift alerts, audit trail
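The PHI-scrubbing step in the curation stage can be illustrated with a minimal sketch. The regex patterns below are placeholder assumptions covering a few identifier types; production de-identification combines trained NER models with a full HIPAA Safe Harbor rule set.

```python
import re

# Illustrative PHI patterns only -- not a complete Safe Harbor list.
PHI_PATTERNS = {
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def scrub_phi(text: str) -> str:
    """Replace matched identifiers with typed placeholder tokens."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Pt seen 03/14/2024, MRN: 00123456, callback 555-867-5309."
print(scrub_phi(note))  # → Pt seen [DATE], [MRN], callback [PHONE].
```

Typed placeholders (rather than a single redaction token) preserve enough structure for downstream summarization and coding training pairs.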
Every artifact is reproducible: each model is linked to its dataset hash, training recipe, and code commit — the lineage your privacy and quality teams need to audit and re-validate releases.
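The lineage claim above can be made concrete with a small sketch: hash the training records and bind the digest to the recipe and code commit. The `dataset_fingerprint` helper and its field names are illustrative, not a fixed InsightLM schema.

```python
import hashlib
import json

def dataset_fingerprint(records, recipe: dict, code_commit: str) -> dict:
    """Compute a content hash over training records and bind it
    to the training recipe and code commit for audit lineage."""
    h = hashlib.sha256()
    for rec in records:
        # sort_keys makes the digest stable across dict orderings
        h.update(json.dumps(rec, sort_keys=True).encode())
    return {
        "dataset_sha256": h.hexdigest(),
        "recipe": recipe,
        "code_commit": code_commit,
    }

card = dataset_fingerprint(
    [{"input": "encounter note text", "output": "SOAP summary"}],
    recipe={"base": "mistral-7b", "method": "qlora", "epochs": 3},
    code_commit="abc1234",
)
print(card["dataset_sha256"][:12])
```

Because the digest is deterministic, a privacy or quality team can re-hash the archived dataset at audit time and confirm it matches the card attached to the released model.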
Each card shows the task, input, output and a target quality / cost bar.
Summarize an encounter, an inpatient stay, or a multi-visit episode into a structured note (SOAP, discharge summary, problem-list update) — refreshed every time the chart changes.
Suggest the right diagnosis and procedure codes from the encounter note, with rationale and supporting span — for coder review or autonomous coding on lower-risk encounters.
Draft a prior-authorization letter that cites the payer's medical-necessity criteria and the supporting clinical evidence from the chart — ready for clinician sign-off.
Answer clinician and pharmacy questions grounded in your protocols, drug labels, formularies and indexed literature — with cited sources and a confidence flag for unsupported claims.
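One cheap way to raise a confidence flag for unsupported claims is a lexical grounding check that runs before any heavier entailment model. The threshold and tokenization below are assumptions for the sketch, not the production grounding probe.

```python
def citation_supported(claim: str, source: str, min_overlap: float = 0.6) -> bool:
    """Crude lexical check: flag a cited claim as unsupported when
    too few of its content words appear in the cited source passage."""
    claim_words = {w.lower().strip(".,") for w in claim.split() if len(w) > 3}
    if not claim_words:
        return True  # nothing substantive to verify
    source_words = {w.lower().strip(".,") for w in source.split()}
    overlap = len(claim_words & source_words) / len(claim_words)
    return overlap >= min_overlap

src = "Metformin is contraindicated in patients with severe renal impairment."
print(citation_supported("Metformin contraindicated with severe renal impairment", src))  # → True
```

A claim that fails this filter would surface with the "unsupported" confidence flag rather than being silently presented as grounded.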
Generate plain-language, reading-level-aware explanations of a patient's condition, medications, and after-visit instructions — in their preferred language, ready for clinician review.
Extract adverse events, suspected drug, severity, outcome, and concomitant therapies from spontaneous reports, literature, and case narratives into MedDRA-coded structured records.
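The structured output of such an extraction step might look like the record below. The field names and the MedDRA Preferred Term shown are illustrative examples, not outputs of a real extractor or a fixed schema.

```python
from dataclasses import dataclass, asdict, field
import json

@dataclass
class AdverseEventRecord:
    """Illustrative target schema for one extracted adverse event."""
    suspected_drug: str
    event_meddra_pt: str          # MedDRA Preferred Term
    severity: str                 # e.g. mild / moderate / severe
    outcome: str                  # e.g. recovered / ongoing / fatal
    concomitant_therapies: list = field(default_factory=list)

record = AdverseEventRecord(
    suspected_drug="amoxicillin",
    event_meddra_pt="Rash maculo-papular",
    severity="moderate",
    outcome="recovered",
    concomitant_therapies=["ibuprofen"],
)
print(json.dumps(asdict(record), indent=2))
```

Emitting a fixed schema like this (rather than free text) is what makes the extracted events directly loadable into a safety database.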
The bar an InsightLM healthcare SLM is designed and evaluated against. Customer-specific scorecards are produced from held-out evaluation sets during a pilot.
| Healthcare Task | Metric | Generalist Frontier LLM (Bedrock Claude / Copilot GPT, zero-shot) | InsightLM Fine-Tuned 7B |
|---|---|---|---|
| SOAP / discharge summarization | Clinician rating (1–5) | ~3.7 | ≥ 4.4 (target) |
| ICD-10 coding suggestion | Top-1 coder agreement | ~78% | ≥ 92% (target) |
| Prior-auth letter draft | Acceptance with minor edits | ~50% | ≥ 75% (target) |
| Literature QA grounding | Citation precision | ~80% | ≥ 95% (target) |
| Adverse-event extraction | Event recall | ~80% | ≥ 92% (target) |
| Latency (in-EHR) | p50 / p95 | ~900 ms / ~3.5 s | ~150 ms / ~600 ms (target) |
| Cost per 1K calls (typical task) | USD | ~$5–$30 | ~$0.02–$0.20 (target) |
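The per-1K-call target in the last row can be sanity-checked with simple capacity math. The GPU price and throughput below are assumptions for the sketch, not measured figures.

```python
# Back-of-envelope check on the per-1K-call target, under assumed numbers:
# one GPU at ~$2.50/hr serving a quantized 7B at ~25 requests/sec sustained.
gpu_cost_per_hour = 2.50     # assumed on-demand GPU price, USD
requests_per_sec = 25        # assumed sustained 7B throughput

calls_per_hour = requests_per_sec * 3600
cost_per_1k_calls = gpu_cost_per_hour / calls_per_hour * 1000
print(f"${cost_per_1k_calls:.3f} per 1K calls")  # → $0.028 per 1K calls
```

Under these assumptions the cost lands inside the $0.02–$0.20 band; lower throughput or pricier GPUs push it toward the top of the range.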
Bedrock Claude and Copilot GPT for analysts and admin work; native EHR AI for some workflows; InsightLM for the high-volume PHI-bearing layer that needs to live inside your network.
Encounter summarization, coding, prior-auth, literature QA, patient comms, pharmacovigilance. Tasks where PHI control, citation grounding and per-encounter cost are decisive.
The SLM handles the bulk in-VPC; Bedrock Claude or Copilot GPT picks up complex multi-document reasoning where the SLM signals low confidence. Single observability stack.
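The low-confidence escalation described above can be sketched as a simple routing rule. The field names and the 0.75 confidence floor are illustrative assumptions, not the production routing policy.

```python
def route(query: dict, slm_answer: dict, confidence_floor: float = 0.75) -> str:
    """Route to the in-VPC SLM by default; escalate to the frontier LLM
    when the task spans many documents or the SLM self-reports low confidence."""
    if query.get("doc_count", 1) > 5:          # complex multi-document reasoning
        return "frontier_llm"
    if slm_answer.get("confidence", 0.0) < confidence_floor:
        return "frontier_llm"
    return "slm"

print(route({"doc_count": 1}, {"confidence": 0.9}))   # → slm
print(route({"doc_count": 12}, {"confidence": 0.9}))  # → frontier_llm
```

Keeping the rule in one router function is also what makes the single observability stack possible: every escalation decision passes through one instrumented code path.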
Strategy memos, RFP drafts, internal training content. Frontier LLMs are the right tool here — an SLM would be over-engineering. InsightLM does not try to win these.
Pick the pattern that matches your data classification, GPU strategy and HIPAA / privacy posture.
Fully-private InsightLM with on-prem GPU clusters, no egress to public APIs. The standard pattern for health systems with strict PHI handling and existing data-center investments.
vLLM / SGLang on managed GPU instances inside your AWS, Azure or GCP HIPAA-aligned VPC. PHI stays in your accounts; BAAs cover the underlying cloud provider only.
SLM serves the high-volume PHI-bearing workload in-network; the orchestrator routes admin, low-PHI, or rare-domain queries to Bedrock Claude. One observability stack, one cost dashboard.
Quantized GGUF / AWQ models on clinician laptops or ambulatory-clinic hardware for ambient documentation in low-connectivity sites. Same model and prompts as the central deployment.
InsightLM curation pipelines turn each source into model-ready training data — with PHI scrubbing and lineage tracked end-to-end.
- Clinical: encounter notes, orders, results, discharge summaries, problem lists, medication lists, social history.
- Payer: medical-necessity criteria, prior-auth policies, formularies, claims, denial / appeal letters, EOBs.
- Knowledge & safety: drug labels, treatment protocols, indexed literature, clinical-trial protocols, safety reports, MedDRA dictionaries.
A typical pilot picks one or two of the use cases above, runs end-to-end on a sample of your data inside your environment, and produces a clinician-rated scorecard against your current Bedrock / Copilot baseline in 4–8 weeks.
On-prem or HIPAA-aligned VPC • PHI never leaves your network • Complements Bedrock and Copilot