Architecture, pipeline design, model specification, and performance validation across eight AI detection engines for complete blood count intelligence.
Sentinel Hema is built on a thesis that the complete blood count — the most frequently ordered and most underread laboratory test in medicine — contains layers of diagnostic intelligence that standard reference-range flagging systematically fails to extract. Each CBC generates 37 or more discrete parameters. Evaluated in isolation, these values yield binary normal/abnormal flags. Evaluated as an interconnected constellation, they reveal disease signatures invisible to conventional interpretation.
This document specifies the technical architecture, processing pipeline, model design, and validation performance for each of the platform's eight AI detection engines. Together, these engines transform the CBC from a diagnostic checklist into a continuous surveillance and classification system spanning hematological malignancy, anemia etiology, coagulation risk, infection typing, bone marrow function, and predictive trajectory analysis.
The engine suite employs a shared data ingestion layer for HL7/FHIR interoperability, with each engine operating as an independent analytical module that can trigger cross-engine cascades when its findings implicate a related domain. This architecture enables both real-time clinical decision support and longitudinal monitoring across the full hematological spectrum.
Model architectures range from gradient-boosted ensembles for tabular CBC data (Engine 01) to diffusion-based generative classifiers for morphological image analysis (Engine 02), hybrid CNN–Vision Transformer networks for malignancy screening (Engine 03), and LSTM-based temporal networks for longitudinal pattern detection (Engine 08). All engines undergo multi-center validation with independent external test cohorts and are designed for SMART on FHIR integration with existing EHR infrastructure.
Thirty-seven parameters as a constellation — not a checklist. Every CBC tells a story most systems never read.
The core of Engine 01 is a gradient-boosted ensemble that processes all 37+ CBC parameters simultaneously rather than evaluating each against isolated reference ranges. Feature importance analysis consistently identifies PDW, immature platelet fraction, neutrophil percentage, and RDW as the most discriminative predictors.
Eight CBC-derived inflammatory ratios — NLR, dNLR, LMR, PLR, SII, SIRI, AISI, and HPR — transform nonspecific individual markers into precise composite signatures. A directed acyclic graph method selects optimal feature combinations, enabling a reduced model with as few as four features to retain AUC above 94%.
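The eight ratios can be sketched as below. This is a minimal illustration using the commonly published definitions of each index, not the platform's implementation; the `CbcCounts` container, its field names, and the units are assumptions, and HPR is taken here to mean the hemoglobin-to-platelet ratio.

```python
from dataclasses import dataclass


@dataclass
class CbcCounts:
    """Hypothetical input record. Cell counts in 10^9/L, hemoglobin in g/L (assumed units)."""
    wbc: float
    neutrophils: float
    lymphocytes: float
    monocytes: float
    platelets: float
    hemoglobin: float


def inflammatory_ratios(c: CbcCounts) -> dict:
    """Compute the eight CBC-derived ratios from absolute counts."""
    neu, lym, mono, plt = c.neutrophils, c.lymphocytes, c.monocytes, c.platelets
    if lym <= 0 or mono <= 0 or plt <= 0 or c.wbc <= neu:
        raise ValueError("counts must be positive and WBC must exceed neutrophils")
    return {
        "NLR": neu / lym,                # neutrophil-to-lymphocyte ratio
        "dNLR": neu / (c.wbc - neu),     # derived NLR
        "LMR": lym / mono,               # lymphocyte-to-monocyte ratio
        "PLR": plt / lym,                # platelet-to-lymphocyte ratio
        "SII": plt * neu / lym,          # systemic immune-inflammation index
        "SIRI": neu * mono / lym,        # systemic inflammation response index
        "AISI": neu * mono * plt / lym,  # aggregate index of systemic inflammation
        "HPR": c.hemoglobin / plt,       # hemoglobin-to-platelet ratio (assumed meaning)
    }


example = CbcCounts(wbc=8.0, neutrophils=5.0, lymphocytes=2.0,
                    monocytes=0.5, platelets=250.0, hemoglobin=140.0)
ratios = inflammatory_ratios(example)
```

Because every ratio derives from values already on the report, the composite features add no extra assay cost; they only re-express the same counts.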
The training corpus comprises 2.1 million anonymized CBC records from academic medical centers, community hospitals, and ambulatory clinics. Disease representation is balanced through hybrid synthetic data generation based on statistical feature distributions.
Validation follows a discovery-validation cohort design with independent external testing. Analytical precision, expressed as coefficient of variation (CV), is under 3% for WBC, under 2.5% for hemoglobin, and under 6% for RBC, meeting European Federation of Clinical Chemistry guidelines.
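For readers less familiar with the precision metric, the sketch below shows how a CV is computed from replicate measurements and compared with the limits quoted above. The function names, the `CV_LIMITS` table, and the replicate values are illustrative assumptions.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates: list) -> float:
    """Return CV as a percentage: sample standard deviation / mean * 100."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicate measurements")
    avg = mean(replicates)
    if avg == 0:
        raise ValueError("mean of replicates must be non-zero")
    return stdev(replicates) / avg * 100.0


# Limits copied from the figures quoted in the text (percent CV).
CV_LIMITS = {"WBC": 3.0, "HGB": 2.5, "RBC": 6.0}


def meets_precision(analyte: str, replicates: list) -> bool:
    """True when the replicate CV is below the quoted limit for the analyte."""
    return coefficient_of_variation(replicates) < CV_LIMITS[analyte]


# Hypothetical repeated hemoglobin readings (g/dL) on one sample.
hgb_runs = [13.9, 14.0, 14.1, 14.0, 14.0]
hgb_cv = coefficient_of_variation(hgb_runs)
```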
Engine 01 triggers downstream engines: morphological anomalies activate Engine 02 (Peripheral Smear AI), inflammatory abnormalities cascade to Engine 06 (Infection Typing), and lineage-specific cytopenias trigger Engine 07 (Bone Marrow Stress).
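One way to picture the cascade is as a routing table from finding type to downstream engine. The sketch below assumes a hypothetical `Finding` enum and function name; only the three routes themselves come from the description above.

```python
from enum import Enum


class Finding(Enum):
    """Hypothetical finding categories emitted by Engine 01."""
    MORPHOLOGICAL_ANOMALY = "morphological_anomaly"
    INFLAMMATORY_ABNORMALITY = "inflammatory_abnormality"
    LINEAGE_CYTOPENIA = "lineage_cytopenia"


# Routes as described in the text; engine labels are kept verbatim.
CASCADE_ROUTES = {
    Finding.MORPHOLOGICAL_ANOMALY: "Engine 02 (Peripheral Smear AI)",
    Finding.INFLAMMATORY_ABNORMALITY: "Engine 06 (Infection Typing)",
    Finding.LINEAGE_CYTOPENIA: "Engine 07 (Bone Marrow Stress)",
}


def downstream_engines(findings: set) -> list:
    """Return the engines to trigger for a set of findings, sorted by label."""
    return sorted(CASCADE_ROUTES[f] for f in findings)


triggered = downstream_engines(
    {Finding.LINEAGE_CYTOPENIA, Finding.MORPHOLOGICAL_ANOMALY}
)
```

Keeping the routes in a single table keeps each engine independent: an engine only declares what it found, and the table decides what runs next.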
All outputs are structured as FHIR DiagnosticReport resources, with CDS Hooks for real-time EHR integration. SMART on FHIR launch supports in-context clinical display alongside native analyzer results.
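As a rough illustration of the output shape, the sketch below assembles a minimal FHIR R4 DiagnosticReport as a plain dictionary. The builder function, patient identifier, code text, and conclusion wording are hypothetical; a production resource would normally carry coded terminology and references to the underlying Observation resources.

```python
import json


def build_diagnostic_report(patient_id: str, conclusion: str, issued: str) -> dict:
    """Assemble a minimal DiagnosticReport (status and code are required by FHIR)."""
    return {
        "resourceType": "DiagnosticReport",
        "status": "final",
        # Free-text code for illustration; real reports would use a coded concept.
        "code": {"text": "Complete blood count interpretation"},
        "subject": {"reference": f"Patient/{patient_id}"},
        "issued": issued,
        "conclusion": conclusion,
    }


report = build_diagnostic_report(
    patient_id="example-123",  # hypothetical identifier
    conclusion="Pattern consistent with iron deficiency; ferritin suggested.",
    issued="2024-01-01T00:00:00Z",
)
payload = json.dumps(report, indent=2)
```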
| Metric | Score |
|---|---|
| Overall AUROC | 96.2% |
| Anemia Detection | 97.8% |
| Leukemia Flagging | 94.1% |
| Infection Typing | 92.6% |
| Reduced Model (4 features) | 94.9% |
By analyzing 37+ parameters as an interconnected constellation, Engine 01 identifies patterns that reference-range checks miss — including early malignancy signatures in inflammatory ratios and pre-anemic iron depletion visible only through RDW–MCV coupling dynamics.
Humans cannot examine every cell in a smear. This engine can — and knows when it is uncertain.
A diffusion-based generative classifier models the full distribution of blood cell morphology rather than learning only the decision boundaries between classes. This yields accurate classification combined with anomaly detection, resistance to domain shift, and uncertainty quantification that surpasses that of clinical experts.
Each cell is processed through a denoising diffusion probabilistic framework generating per-class likelihood scores — inherently data-efficient and adaptable to staining and imaging variation across institutions.
The image corpus comprises over 500,000 peripheral blood smear images, the largest curated collection of its kind. It includes common cell types, rare variants, and features that confuse both automated systems and humans: reactive lymphocytes mimicking blasts, fragments near platelet size, and staining artifacts resembling inclusions.
Inter-observer studies show 15–20% discordance between experienced microscopists on identical smears. Engine 02 eliminates this variability with consistent, reproducible classification and calibrated uncertainty.
Per-cell confidence distributions route uncertain cases to human review with annotated differential possibilities. Hematologists focus expertise on genuinely ambiguous cells rather than routine classification.
Dual-mode operation (auto-verify routine / flag uncertain) reduces hematologist workload 60–70% while maintaining precision for rare pathologies: circulating blasts, microangiopathic changes, parasitic inclusions.
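The auto-verify versus flag-for-review split can be sketched with an entropy threshold on the per-cell class probabilities. This is an illustrative stand-in, not the engine's actual routing logic; the function names, the `max_entropy` threshold, and the example probabilities are assumptions.

```python
import math


def normalized_entropy(probs: dict) -> float:
    """Shannon entropy of the class distribution, scaled to [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return h / math.log(k)


def triage_cell(probs: dict, max_entropy: float = 0.35, top_k: int = 3) -> dict:
    """Auto-verify confident cells; send uncertain ones to review with a differential."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if normalized_entropy(probs) <= max_entropy:
        return {"route": "auto_verify", "label": ranked[0][0]}
    return {"route": "human_review", "differential": ranked[:top_k]}


# Hypothetical per-class scores for two cells.
routine = triage_cell({"neutrophil": 0.97, "lymphocyte": 0.02, "blast": 0.01})
ambiguous = triage_cell({"reactive_lymphocyte": 0.50, "blast": 0.45, "lymphocyte": 0.05})
```

The threshold sets the trade-off: lowering `max_entropy` sends more cells to the hematologist, raising it auto-verifies more of them.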
| Metric | Score |
|---|---|
| Cell Classification | 97.4% |
| WBC Differential | 95.8% |
| RBC Morphology | 93.4% |
| Anomaly Detection | 96.1% |
| Cross-Lab Generalization | 94.2% |
A standard blood smear contains thousands of cells — far more than any human can examine one by one. Engine 02 automates exhaustive analysis, triages routine cases, and highlights unusual findings, transforming the peripheral smear from bottleneck to rapid diagnostic asset.
Every hour of delay costs therapeutic options. This engine buys them back.
A hybrid CNN–Vision Transformer captures both local cellular features (nuclear morphology, granulation) and global slide-level patterns (blast percentage, distribution). Transfer learning from Engine 02's 500K+ corpus provides the backbone; task-specific fine-tuning enables precise subtype discrimination.
| Metric | Score |
|---|---|
| Blast Detection | 94.8% |
| ALL vs. AML | 91.2% |
| CLL Screening | 96.3% |
| MDS Flagging | 89.6% |
| False Positive Rate | 2.9% |
~62,000 new leukemia cases annually in the US. The CBC is often the first signal — yet subtle blasts and early dysplasia are routinely missed by automated differentials. Engine 03 transforms the CBC into active malignancy surveillance.
Beyond hemoglobin — determining why the patient is anemic, not merely that they are.
Microcytic + elevated RDW → iron deficiency. Microcytic + normal RDW + targets → thalassemia trait. Schistocytes → microangiopathic hemolysis workup. Teardrops + nucleated RBCs → marrow infiltration flag.
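The pattern rules above translate directly into a small rule set. In the sketch below the `RedCellFindings` record and the numeric cutoffs for microcytosis and elevated RDW are assumptions chosen for illustration; the source does not state its thresholds.

```python
from dataclasses import dataclass


@dataclass
class RedCellFindings:
    """Hypothetical summary of red cell indices and smear features."""
    mcv_fl: float
    rdw_pct: float
    target_cells: bool = False
    schistocytes: bool = False
    teardrop_cells: bool = False
    nucleated_rbcs: bool = False


MCV_MICROCYTIC_FL = 80.0  # assumed cutoff for microcytosis
RDW_ELEVATED_PCT = 14.5   # assumed cutoff for elevated RDW


def etiology_flags(f: RedCellFindings) -> list:
    """Apply the four pattern rules and return every flag that fires."""
    flags = []
    microcytic = f.mcv_fl < MCV_MICROCYTIC_FL
    rdw_high = f.rdw_pct > RDW_ELEVATED_PCT
    if microcytic and rdw_high:
        flags.append("iron deficiency")
    if microcytic and not rdw_high and f.target_cells:
        flags.append("thalassemia trait")
    if f.schistocytes:
        flags.append("microangiopathic hemolysis workup")
    if f.teardrop_cells and f.nucleated_rbcs:
        flags.append("marrow infiltration flag")
    return flags


iron_case = etiology_flags(RedCellFindings(mcv_fl=72.0, rdw_pct=17.0))
thal_case = etiology_flags(RedCellFindings(mcv_fl=68.0, rdw_pct=13.0, target_cells=True))
```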
Semi-supervised approach achieves κ = 0.89 expert agreement while reducing diagnostic turnaround — especially valuable for sickle cell and microcytic populations.
| Metric | Score |
|---|---|
| Overall F1 | 93.4% |
| Iron Deficiency | 96.7% |
| Sickle Cell | 95.2% |
| Thalassemia Trait | 91.8% |
| Hemolytic Subtypes | 89.3% |
Anemia affects one-third of the global population, yet etiology is frequently misclassified. Engine 04 transforms the CBC from a hemoglobin threshold into an etiological classification system — guiding targeted workup rather than empiric iron supplementation.
Platelet count alone is a number. This engine reveals the mechanism — and predicts the trajectory.
DIC mortality exceeds 40% when treatment is delayed. Engine 05 builds a modified ISTH score from platelet trajectory (not just absolute count), schistocyte percentage, and IPF kinetics as a surrogate for fibrinogen consumption.
A Bayesian network enables probabilistic DIC staging (non-overt vs. overt) with a 6-hour early warning, identifying consumption before coagulation panels alarm.
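To illustrate why trajectory carries information that an absolute count does not, the sketch below fits a least-squares slope to serial platelet counts. It does not reproduce the modified ISTH score or the Bayesian network, neither of which is specified here; the function names, the alert threshold, and the sample values are assumptions.

```python
def platelet_slope_per_hour(times_h: list, counts: list) -> float:
    """Ordinary least-squares slope of platelet count (10^9/L) against time (hours)."""
    n = len(times_h)
    if n < 2 or n != len(counts):
        raise ValueError("need at least two paired time/count observations")
    t_mean = sum(times_h) / n
    c_mean = sum(counts) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("observations must span more than one time point")
    sxy = sum((t - t_mean) * (c - c_mean) for t, c in zip(times_h, counts))
    return sxy / sxx


def falling_fast(slope: float, threshold: float = -3.0) -> bool:
    """Hypothetical alert rule: flag a decline steeper than the threshold per hour."""
    return slope < threshold


# Every count below sits inside a typical reference range, yet the trend is steep.
slope = platelet_slope_per_hour([0.0, 6.0, 12.0, 18.0], [280.0, 250.0, 220.0, 190.0])
alert = falling_fast(slope)
```

In this example no single value would raise a low-platelet flag, while the slope of about 5 units per hour would.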
| Metric | Score |
|---|---|
| DIC Prediction | 91.7% |
| TTP / HUS Flagging | 88.4% |
| HIT Detection | 85.9% |
| Pseudo-TCP ID | 97.3% |
EDTA-dependent pseudothrombocytopenia accounts for ~17% of low platelet flags. Engine 05 eliminates this artifact while providing coagulopathy risk stratification hours before traditional panels alarm.
Before the culture returns — the CBC already holds the answer.
Bacterial: neutrophilia + left shift + toxic changes (heavy granulation, Döhle bodies, vacuolization). Viral: lymphocytosis + reactive morphology + relative neutropenia. Engine quantifies a continuous bacterial–viral probability spectrum for mixed and atypical presentations.
Pre-culture bacterial probability score guides empiric therapy — reducing unnecessary antibiotic exposure for viral infections while ensuring rapid coverage for bacterial processes. Integrates with Sentinel Sepsis when severity scores exceed threshold for seamless escalation.
| Metric | Score |
|---|---|
| Bacterial vs. Viral | 92.3% |
| Severity Prediction | 94.6% |
| Left-Shift Detection | 96.8% |
| Parasitic Pattern | 87.2% |
Cultures take 24–72 hours. Engine 06 provides probabilistic pathogen typing from the CBC alone, guiding antibiotic stewardship at the point of maximum clinical uncertainty.
A non-invasive window into marrow function — reading production stress without aspiration.
RPI reflects erythroid output, IG fraction indicates myeloid activity, IPF mirrors megakaryopoietic stress. Combined with Engine 02 dysplasia scoring, the system generates a multi-lineage marrow health report — confirming biopsy need or providing confidence to defer.
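For reference, the reticulocyte production index (RPI) is conventionally computed by correcting the reticulocyte percentage for hematocrit and then dividing by a maturation factor. The sketch below uses a commonly cited banded maturation table and a reference hematocrit of 45%; both are assumptions here, since the source does not state which variant the engine uses.

```python
NORMAL_HCT_PCT = 45.0  # assumed reference hematocrit


def maturation_factor(hct_pct: float) -> float:
    """Commonly cited maturation-time correction, banded by hematocrit."""
    if hct_pct >= 36.0:
        return 1.0
    if hct_pct >= 26.0:
        return 1.5
    if hct_pct >= 16.0:
        return 2.0
    return 2.5


def reticulocyte_production_index(retic_pct: float, hct_pct: float) -> float:
    """RPI = reticulocyte % * (Hct / normal Hct) / maturation factor."""
    if retic_pct < 0 or hct_pct <= 0:
        raise ValueError("reticulocyte % must be non-negative and hematocrit positive")
    corrected = retic_pct * hct_pct / NORMAL_HCT_PCT
    return corrected / maturation_factor(hct_pct)


# Hypothetical patient: 6% reticulocytes at a hematocrit of 25%.
rpi = reticulocyte_production_index(retic_pct=6.0, hct_pct=25.0)
```

A raw reticulocyte percentage of 6% looks brisk, but after correction the index is well below that, which is why the corrected value is the one used to judge erythroid output.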
| Metric | Score |
|---|---|
| MDS Flagging | 89.6% |
| Marrow Failure | 91.2% |
| Engraftment Prediction | 87.8% |
| Biopsy Recommendation | 92.4% |
Engine 07 identifies patients who genuinely require biopsy while sparing those with sufficient peripheral blood clarity. Real-time engraftment monitoring enables precision-timed growth factor and transfusion support.
A single CBC is a photograph. A series is a motion picture. This engine reads the film.
LSTM captures long-range dependencies (gradual hemoglobin decline over months → occult GI loss). Temporal CNN detects acute changes (sudden platelet drops → consumption). Time-aware attention handles variable intervals while preserving historical value.
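The idea of handling variable sampling intervals while still using older results can be illustrated with time-decayed softmax weights. This is a deliberately simplified stand-in for time-aware attention, not the engine's mechanism; the function names, the decay constant `tau_days`, and the sample series are assumptions.

```python
import math


def time_decay_weights(days_ago: list, tau_days: float = 30.0) -> list:
    """Softmax over -elapsed/tau: recent samples weigh more, old ones never reach zero."""
    if not days_ago:
        raise ValueError("need at least one observation")
    scores = [-d / tau_days for d in days_ago]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_summary(values: list, days_ago: list, tau_days: float = 30.0) -> float:
    """Recency-weighted average of a parameter sampled at irregular intervals."""
    weights = time_decay_weights(days_ago, tau_days)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical hemoglobin series (g/dL) drawn 120, 45, 10, and 1 day(s) ago.
gaps = [120.0, 45.0, 10.0, 1.0]
weights = time_decay_weights(gaps)
recent_weighted_hgb = weighted_summary([13.8, 13.1, 12.2, 11.9], gaps)
```

Because the weights depend on elapsed time rather than position in the sequence, a patient sampled daily and one sampled quarterly are treated on the same footing.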
| Metric | Score |
|---|---|
| Trend Detection | 93.1% |
| Change-Point Accuracy | 90.7% |
| 7-Day Forecast | 88.3% |
| Relapse Prediction | 86.9% |
| Crisis Prediction | 84.5% |
Engine 08 transforms hematological monitoring from snapshots to predictive trajectories — detecting drift before crisis and forecasting where counts are heading, not only where they are.