Sentinel Hema — Technical Design Document

Executive Summary

Sentinel Hema is built on a thesis that the complete blood count — the most frequently ordered and most underread laboratory test in medicine — contains layers of diagnostic intelligence that standard reference-range flagging systematically fails to extract. Each CBC generates 37 or more discrete parameters. Evaluated in isolation, these values yield binary normal/abnormal flags. Evaluated as an interconnected constellation, they reveal disease signatures invisible to conventional interpretation.

This document specifies the technical architecture, processing pipeline, model design, and validation performance for each of the platform's eight AI detection engines. Together, these engines transform the CBC from a diagnostic checklist into a continuous surveillance and classification system spanning hematological malignancy, anemia etiology, coagulation risk, infection typing, bone marrow function, and predictive trajectory analysis.

The engine suite employs a shared data ingestion layer for HL7/FHIR interoperability, with each engine operating as an independent analytical module that can trigger cross-engine cascades when its findings implicate a related domain. This architecture enables both real-time clinical decision support and longitudinal monitoring across the full hematological spectrum.

Model architectures range from gradient-boosted ensembles for tabular CBC data (Engine 01) to diffusion-based generative classifiers for morphological image analysis (Engine 02), hybrid CNN–Vision Transformer networks for malignancy screening (Engine 03), and LSTM-based temporal networks for longitudinal pattern detection (Engine 08). All engines undergo multi-center validation with independent external test cohorts and are designed for SMART on FHIR integration with existing EHR infrastructure.

8

Analysis Engines

37+

CBC Parameters

500K+

Training Images

<2s

End-to-End Inference

Engine 01 · Core Analytic Layer

CBC Pattern Intelligence

Thirty-seven parameters as a constellation — not a checklist. Every CBC tells a story most systems never read.

37+

Parameters

96.2%

AUROC

0.4s

Latency

Processing Pipeline

Data Ingestion

HL7/FHIR intake from Sysmex XN, Beckman DxH, Abbott Alinity. Unit harmonization and delta-check flagging across vendor formats.

HL7v2 · FHIR R4 · LOINC

→

Feature Engineering

22 derived ratio computations, including eight inflammatory ratios: NLR, dNLR, PLR, LMR, SII, SIRI, AISI, HPR. RDW-to-MCV coupling. Reticulocyte production index.

22 Ratios · DAG Selection

→

Pattern Recognition

Gradient-boosted ensemble (CatBoost + XGBoost) across multi-dimensional parameter space. 2.1M anonymized CBC training records.

CatBoost · XGBoost · Ensemble

→

Constellation Mapping

SHAP-based feature attribution maps parameter clusters to 84 diagnostic phenotypes. Interpretable constellation diagrams.

SHAP · UMAP · t-SNE

→

Clinical Output

Risk-stratified suggestions with confidence intervals. Downstream engine triggers. EHR alert dispatch via CDS Hooks.

SMART on FHIR · CDS Hooks
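The unit harmonization and delta-check steps of the Data Ingestion stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the function names, the conversion table, and the delta limits are hypothetical placeholders, and real delta limits are laboratory-specific.

```python
from typing import Optional

# Hypothetical conversion factors to one canonical unit per analyte.
# Hemoglobin in g/L becomes g/dL (factor 0.1); cell counts reported as
# 10^9/L and 10^3/uL are numerically identical.
TO_CANONICAL = {
    ("HGB", "g/dL"): 1.0,
    ("HGB", "g/L"): 0.1,
    ("WBC", "10^9/L"): 1.0,
    ("WBC", "10^3/uL"): 1.0,
    ("PLT", "10^9/L"): 1.0,
    ("PLT", "10^3/uL"): 1.0,
}

# Illustrative absolute-change limits versus the prior result (assumed values).
DELTA_LIMITS = {"HGB": 2.0, "WBC": 10.0, "PLT": 100.0}


def harmonize(analyte: str, value: float, unit: str) -> float:
    """Convert a result to the canonical unit, or raise on an unknown unit."""
    try:
        return value * TO_CANONICAL[(analyte, unit)]
    except KeyError:
        raise ValueError(f"no conversion for {analyte} in {unit}") from None


def delta_check(analyte: str, current: float, previous: Optional[float]) -> bool:
    """Return True when the change from the prior result exceeds the limit."""
    if previous is None:
        return False  # no history, nothing to compare against
    return abs(current - previous) > DELTA_LIMITS[analyte]


hgb = harmonize("HGB", 131.0, "g/L")             # about 13.1 g/dL
flagged = delta_check("HGB", hgb, previous=9.8)  # change of about 3.3 g/dL
```

Raising on an unknown unit, rather than passing the value through, keeps a vendor format change from silently corrupting downstream ratios.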

Model Architecture

The core of Engine 01 is a gradient-boosted ensemble that processes all 37+ CBC parameters simultaneously rather than evaluating each against isolated reference ranges. Feature importance analysis consistently identifies PDW, immature platelet fraction, neutrophil percentage, and RDW as the most discriminative predictors.

Eight CBC-derived inflammatory ratios (NLR, dNLR, LMR, PLR, SII, SIRI, AISI, and HPR) combine individually nonspecific markers into composite signatures. A directed acyclic graph method selects optimal feature combinations, enabling a reduced model with as few as four features to retain an AUROC above 94%.
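For reference, the eight ratios can be computed from absolute counts as sketched below. The formulas follow commonly published definitions; the function name and the input units are assumptions, and HPR is taken here to mean the hemoglobin-to-platelet ratio, which the document does not spell out.

```python
def inflammatory_ratios(neut, lymph, mono, plt, wbc, hgb):
    """Compute CBC-derived inflammatory ratios.

    neut, lymph, mono, plt and wbc are absolute counts in 10^9/L;
    hgb is hemoglobin in g/L (assumed unit for HPR).
    """
    if lymph <= 0 or mono <= 0 or plt <= 0 or wbc <= neut:
        raise ValueError("counts must be positive and wbc must exceed neut")
    return {
        "NLR": neut / lymph,                 # neutrophil-to-lymphocyte
        "dNLR": neut / (wbc - neut),         # derived NLR
        "PLR": plt / lymph,                  # platelet-to-lymphocyte
        "LMR": lymph / mono,                 # lymphocyte-to-monocyte
        "SII": plt * neut / lymph,           # systemic immune-inflammation index
        "SIRI": neut * mono / lymph,         # systemic inflammation response index
        "AISI": neut * mono * plt / lymph,   # aggregate index of systemic inflammation
        "HPR": hgb / plt,                    # hemoglobin-to-platelet (assumed meaning)
    }


ratios = inflammatory_ratios(
    neut=6.0, lymph=2.0, mono=0.5, plt=300.0, wbc=9.0, hgb=140.0
)
```

All eight values depend on only six inputs, which is one reason a feature-selection step can shrink the model without losing much signal.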

Training & Validation

The training corpus comprises 2.1 million anonymized CBC records from academic medical centers, community hospitals, and ambulatory clinics. Disease representation is balanced through hybrid synthetic data generation based on statistical feature distributions.

Validation follows a discovery-validation cohort design with independent external testing. Analytical precision, expressed as coefficient of variation (CV), is under 3% for WBC, under 2.5% for hemoglobin, and under 6% for RBC, meeting European Federation of Clinical Chemistry and Laboratory Medicine guidelines.

Diagnostic Phenotype Coverage

Iron deficiency (microcytic + elevated RDW) before frank anemia
Occult malignancy screening via NLR/PLR inflammatory signatures
Sepsis risk stratification through immature granulocyte fraction
MDS flagging via multi-lineage dysplasia patterns
Hemolysis detection through reticulocyte-haptoglobin coupling
Nutritional deficiency profiling (B12, folate, iron trilogy)
Chronic inflammatory quantification for autoimmune monitoring
Bone marrow production stress from output indices

Integration Architecture

Engine 01 triggers downstream engines: morphological anomalies activate Engine 02 (Peripheral Smear AI), inflammatory abnormalities cascade to Engine 06 (Infection Typing), and lineage-specific cytopenias trigger Engine 07 (Bone Marrow Stress).

All outputs are structured as FHIR DiagnosticReport resources, with CDS Hooks for real-time EHR integration. SMART on FHIR launch supports in-context clinical display alongside native analyzer results.
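As an illustration of the CDS Hooks side of this integration, the sketch below builds a response containing a single card. The card fields (`summary`, `indicator`, `source`, `detail`) and the three indicator values come from the CDS Hooks specification; the function name, the probability cut-offs, and the wording of the card are hypothetical.

```python
import json


def build_cds_card(phenotype: str, probability: float, ci: tuple) -> dict:
    """Build a CDS Hooks response holding one card for a flagged phenotype."""
    # Hypothetical cut-offs mapping model probability to card urgency.
    if probability >= 0.9:
        indicator = "critical"
    elif probability >= 0.6:
        indicator = "warning"
    else:
        indicator = "info"
    summary = f"CBC pattern suggests {phenotype} (p={probability:.2f})"
    card = {
        "summary": summary[:140],  # the specification caps summary at 140 characters
        "indicator": indicator,
        "source": {"label": "Sentinel Hema Engine 01"},
        "detail": f"Confidence interval {ci[0]:.2f} to {ci[1]:.2f}.",
    }
    return {"cards": [card]}


response = build_cds_card("iron deficiency", 0.72, (0.64, 0.80))
print(json.dumps(response, indent=2))
```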

Performance Validation

Metric		Score
Overall AUROC		96.2%
Anemia Detection		97.8%
Leukemia Flagging		94.1%
Infection Typing		92.6%
Reduced Model (4 features)		94.9%

Clinical Impact Assessment

By analyzing 37+ parameters as an interconnected constellation, Engine 01 identifies patterns that reference-range checks miss — including early malignancy signatures in inflammatory ratios and pre-anemic iron depletion visible only through RDW–MCV coupling dynamics.

23%

More early iron deficiency detections vs. standard flagging

3.2×

Increase in confirmed subclinical malignancy referrals

41%

Reduction in unnecessary repeat CBC orders

Engine 02 · Visual Morphology Layer

Peripheral Smear AI

Humans cannot examine every cell in a smear. This engine can — and knows when it is uncertain.

500K+

Training Images

97.4%

Accuracy

Cell Subtypes

Processing Pipeline

Digital Capture

100× oil-immersion digitization. Wright-Giemsa stain quality validation. Z-stack 3D imaging beyond diffraction limits.

100× Immersion · Z-Stack 3D

→

Segmentation

U-Net isolates individual cells from complex backgrounds. 98.1% extraction accuracy handling overlaps, artifacts, debris.

U-Net · Instance Seg.

→

Generative Classification

Diffusion-based generative classifier models full morphology distribution — accurate classification with anomaly detection and domain-shift resistance.

Diffusion Model · Generative

→

Anomaly Detection

OOD scoring identifies rare morphologies. Uncertainty quantification calibrated to surpass clinical expert benchmarks.

OOD Scoring · UQ

→

Clinical Triage

Routine smears auto-cleared with audit trail. Abnormal cells flagged with annotated morphology gallery for hematologist review.

Auto-Verify · Human-in-Loop

Model Architecture

A diffusion-based generative classifier models the full distribution of blood cell morphology rather than learning only the decision boundaries between classes. This yields accurate classification combined with anomaly detection, domain-shift resistance, and uncertainty quantification surpassing clinical experts.

Each cell is processed through a denoising diffusion probabilistic framework generating per-class likelihood scores — inherently data-efficient and adaptable to staining and imaging variation across institutions.
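In a generative classifier, per-class likelihood scores become class probabilities through Bayes' rule. The sketch below shows only that final step, assuming each class already has a log-likelihood estimate from the diffusion model; the function name and the example scores are illustrative, not taken from the platform.

```python
import math
from typing import Dict, Optional


def posterior_from_log_likelihoods(
    log_lik: Dict[str, float],
    log_prior: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Turn per-class log-likelihoods into normalized posterior probabilities."""
    classes = list(log_lik)
    if log_prior is None:
        log_prior = {c: 0.0 for c in classes}  # uniform prior
    scores = [log_lik[c] + log_prior[c] for c in classes]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(classes, exps)}


# Made-up log-likelihoods for one cell under three candidate classes.
probs = posterior_from_log_likelihoods(
    {"Lymphocyte": -1210.4, "Reactive Lymphocyte": -1211.1, "Lymphoblast": -1215.0}
)
best = max(probs, key=probs.get)
```

Because every class is scored by how well it explains the image, a cell that is poorly explained by all classes has uniformly low likelihoods, which is what makes out-of-distribution scoring possible in this design.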

Cell Classification Taxonomy

WBC (10): Neutrophil, Band, Hypersegmented, Lymphocyte, Reactive Lymphocyte, Monocyte, Eosinophil, Basophil, Myeloblast, Lymphoblast
RBC (16): Normocyte, Microcyte, Macrocyte, Spherocyte, Schistocyte, Target, Teardrop, Sickle, Elliptocyte, Echinocyte, Stomatocyte, Bite, Pencil, Knizocyte, Hypochromic, Normoblast
Platelets: Normal, Giant, Clumped, Satellitism
Artifacts: Debris, Staining artifact, Bubble, Fiber

Training Dataset

Over 500,000 peripheral blood smear images — the largest curated collection of its kind. Includes common types, rare variants, and features that confuse both automated systems and humans: reactive lymphocytes mimicking blasts, fragments near platelet size, staining artifacts resembling inclusions.

Inter-observer studies show 15–20% discordance between experienced microscopists on identical smears. Engine 02 eliminates this variability with consistent, reproducible classification and calibrated uncertainty.

Uncertainty Quantification

Per-cell confidence distributions route uncertain cases to human review with annotated differential possibilities. Hematologists focus expertise on genuinely ambiguous cells rather than routine classification.

Dual-mode operation (auto-verify routine / flag uncertain) reduces hematologist workload by 60–70% while maintaining precision for rare pathologies: circulating blasts, microangiopathic changes, parasitic inclusions.
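The routing decision described above can be sketched as a threshold on the per-cell class distribution. This is a simplified stand-in for the engine's calibrated uncertainty: the function names and both thresholds are assumptions chosen for illustration.

```python
import math


def normalized_entropy(probs: dict) -> float:
    """Shannon entropy of a class distribution, scaled to the range 0 to 1."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return h / math.log(len(probs))


def route_cell(probs: dict, max_entropy: float = 0.5, min_top: float = 0.9):
    """Return ('auto', label) or ('review', ranked differential of up to 3)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    label, top_p = ranked[0]
    if top_p >= min_top and normalized_entropy(probs) <= max_entropy:
        return "auto", label
    return "review", ranked[:3]


confident = route_cell({"Neutrophil": 0.97, "Band": 0.02, "Monocyte": 0.01})
ambiguous = route_cell(
    {"Reactive Lymphocyte": 0.55, "Lymphoblast": 0.40, "Monocyte": 0.05}
)
```

Returning the ranked differential for reviewed cells mirrors the "annotated differential possibilities" the hematologist sees.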

Performance Validation

Metric		Score
Cell Classification		97.4%
WBC Differential		95.8%
RBC Morphology		93.4%
Anomaly Detection		96.1%
Cross-Lab Generalization		94.2%

Clinical Impact Assessment

A standard blood smear contains thousands of cells — far more than any human can examine one by one. Engine 02 automates exhaustive analysis, triages routine cases, and highlights unusual findings, transforming the peripheral smear from bottleneck to rapid diagnostic asset.

65%

Reduction in manual smear review time

<2 min

Full smear analysis vs. 15–20 min manual

15–20%

Inter-observer discordance eliminated

Engine 03 · Malignancy Screening Layer

Leukemia Detection

Every hour of delay costs therapeutic options. This engine buys them back.

94.8%

Sensitivity

97.1%

Specificity

Leukemia Types

Processing Pipeline

Multi-Signal Intake

Fuses CBC constellation (Engine 01), morphology (Engine 02), and immature cell fractions from automated analyzers.

Multi-Engine Fusion

→

Blast Identification

CNN-based blast detector differentiates true blasts from reactive lymphocytes and monocyte precursors.

ResNet-152 · Attention

→

Subtype Classification

Hierarchical classifier: ALL, AML, CLL, CML via N:C ratio, chromatin texture, granulation profiling.

Vision Transformer · Hybrid CNN

→

MDS Screening

Multi-lineage dysplasia analysis across WBC, RBC, and platelet morphology for early MDS detection.

Multi-Lineage · Ensemble

→

Urgent Escalation

Critical alerts with immunophenotyping panels. Flow cytometry pre-order. Direct hematologist page for blast >5%.

Critical Alert · Auto-Reflex
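The blast-percentage escalation rule in the final pipeline stage can be sketched from per-cell labels. The class names come from the Engine 02 taxonomy and the 5% threshold from the escalation rule; the function names and the choice to count blasts against classified white cells only are assumptions.

```python
from collections import Counter

WBC_CLASSES = {
    "Neutrophil", "Band", "Hypersegmented", "Lymphocyte", "Reactive Lymphocyte",
    "Monocyte", "Eosinophil", "Basophil", "Myeloblast", "Lymphoblast",
}
BLAST_CLASSES = {"Myeloblast", "Lymphoblast"}


def blast_percentage(cell_labels) -> float:
    """Percentage of blasts among classified white cells (others are ignored)."""
    counts = Counter(label for label in cell_labels if label in WBC_CLASSES)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    blasts = sum(counts[c] for c in BLAST_CLASSES)
    return 100.0 * blasts / total


def needs_hematologist_page(cell_labels, threshold_pct: float = 5.0) -> bool:
    """True when the blast percentage exceeds the escalation threshold."""
    return blast_percentage(cell_labels) > threshold_pct


# 100 white cells, 8 of them blasts, plus 5 artifacts that are excluded.
labels = (
    ["Neutrophil"] * 60 + ["Lymphocyte"] * 32 + ["Myeloblast"] * 8 + ["Debris"] * 5
)
page = needs_hematologist_page(labels)
```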

Detection Methodology

Hybrid CNN–Vision Transformer captures local cellular features (nuclear morphology, granulation) and global slide-level patterns (blast %, distribution). Transfer learning from Engine 02's 500K+ corpus provides the backbone; task-specific fine-tuning enables precise subtype discrimination.

Subtype Signatures

ALL: Lymphoblasts, high N:C ratio, fine chromatin, PAS-positive cytoplasm
AML: Myeloblasts with Auer rods, irregular nuclei, azurophilic granulation
CLL: Mature small lymphocytes, smudge cells, monotonous population
CML: Full myeloid spectrum, basophilia, dwarf megakaryocytes
MDS: Hyposegmented neutrophils, ring sideroblasts, micromegakaryocytes

Performance Validation

Metric		Score
Blast Detection		94.8%
ALL vs. AML		91.2%
CLL Screening		96.3%
MDS Flagging		89.6%
False Positive Rate		2.9%

Clinical Impact Assessment

Roughly 62,000 new leukemia cases are diagnosed annually in the US. The CBC is often the first signal, yet subtle blasts and early dysplasia are routinely missed by automated differentials. Engine 03 transforms the CBC into active malignancy surveillance.

8.4 h

Time saved to hematology consult

31%

More MDS detected before transfusion dependence

Engine 04 · Red Cell Intelligence Layer

Anemia Classification

Beyond hemoglobin — determining why the patient is anemic, not merely that they are.

93.4%

F1 Score

κ 0.89

Expert Agree

Subtypes

Processing Pipeline

Index Analysis

MCV/MCH/MCHC clustering with RDW. Mentzer index for thalassemia. Reticulocyte production index for marrow response.

MCV Gating · RPI

→

Morphology Fusion

Engine 02 RBC data: microcytes, targets, sickle cells, schistocytes, teardrops, spherocytes mapped to etiology clusters.

16 RBC Types · Cross-Engine

→

Etiology Modeling

Semi-supervised classifier (FixMatch, 25% annotation). 12 subtypes. 93.4% F1, κ = 0.89 with expert diagnoses.

FixMatch · Semi-Supervised

→

Iron Studies Prediction

Surrogate model predicts ferritin/TIBC/transferrin saturation from CBC morphology for provisional classification.

Surrogate Model

→

Treatment Guidance

Etiology-specific workup recommendations. Reticulocyte response prediction at 7 and 14 days post-intervention.

Decision Support

Classification Taxonomy

Microcytic: Iron deficiency, thalassemia trait, chronic disease, sideroblastic
Normocytic: Acute blood loss, chronic disease, renal insufficiency, mixed deficiency
Macrocytic: B12 deficiency, folate deficiency, MDS, hepatic disease
Hemolytic: Autoimmune, microangiopathic (TTP/HUS), spherocytosis, sickle cell

Morphological Decision Logic

Microcytic + elevated RDW → iron deficiency. Microcytic + normal RDW + targets → thalassemia trait. Schistocytes → microangiopathic hemolysis workup. Teardrops + nucleated RBCs → marrow infiltration flag.
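The index calculations and the decision rules above can be written down directly. In this sketch the Mentzer index (MCV divided by RBC count) and the reticulocyte production index follow their standard textbook forms; the MCV and RDW cut-offs, the maturation correction steps, and all function names are assumptions for illustration and not the engine's learned classifier.

```python
def mentzer_index(mcv_fl: float, rbc_e12_per_l: float) -> float:
    """MCV (fL) divided by RBC count (10^12/L); below 13 favors thalassemia trait."""
    return mcv_fl / rbc_e12_per_l


def reticulocyte_production_index(retic_pct, hct_pct, normal_hct=45.0):
    """Reticulocyte % corrected for hematocrit and reticulocyte maturation time."""
    # Step values commonly quoted in textbooks; treated here as assumptions.
    if hct_pct >= 40:
        maturation = 1.0
    elif hct_pct >= 30:
        maturation = 1.5
    elif hct_pct >= 20:
        maturation = 2.0
    else:
        maturation = 2.5
    return retic_pct * (hct_pct / normal_hct) / maturation


def morphology_flags(mcv_fl, rdw_pct, targets=False, schistocytes=False,
                     teardrops=False, nucleated_rbcs=False):
    """Apply the four rules from the text; the thresholds are illustrative."""
    microcytic = mcv_fl < 80.0   # assumed cut-off
    high_rdw = rdw_pct > 14.5    # assumed cut-off
    flags = []
    if microcytic and high_rdw:
        flags.append("iron deficiency")
    if microcytic and not high_rdw and targets:
        flags.append("thalassemia trait")
    if schistocytes:
        flags.append("microangiopathic hemolysis workup")
    if teardrops and nucleated_rbcs:
        flags.append("marrow infiltration flag")
    return flags


rpi = reticulocyte_production_index(retic_pct=3.0, hct_pct=27.0)  # about 0.9
flags = morphology_flags(mcv_fl=68.0, rdw_pct=13.0, targets=True)
```

Rules like these are easy to audit, which is why they are useful as a cross-check on a learned classifier even when the classifier makes the final call.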

The semi-supervised approach achieves κ = 0.89 expert agreement while reducing diagnostic turnaround, which is especially valuable for sickle cell and microcytic populations.

Performance Validation

Metric		Score
Overall F1		93.4%
Iron Deficiency		96.7%
Sickle Cell		95.2%
Thalassemia Trait		91.8%
Hemolytic Subtypes		89.3%

Clinical Impact Assessment

Anemia affects one-third of the global population, yet etiology is frequently misclassified. Engine 04 transforms the CBC from a hemoglobin threshold into an etiological classification system — guiding targeted workup rather than empiric iron supplementation.

47%

Reduction in empiric iron for non-iron-deficient anemias

2.1 d

Faster correct etiology determination

Engine 05 · Hemostasis Layer

Coagulation Intelligence

Platelet count alone is a number. This engine reveals the mechanism — and predicts the trajectory.

91.7%

DIC Predict

88.4%

TTP Flag

6 h

Early Warning

Processing Pipeline

Platelet Profiling

Count, MPV, PDW, IPF, P-LCR. Giant platelet and clump detection from Engine 02.

IPF · P-LCR · MPV

→

Consumption Analysis

Platelet trajectory slope. Schistocyte quantification. Fibrinogen consumption surrogate from CBC parameters.

Trajectory · Schistocyte %

→

DIC Scoring

Modified ISTH DIC score from CBC. Bayesian net: platelet trend + schistocytes + IPF kinetics + clinical context.

ISTH Modified · Bayesian Net

→

TMA Detection

TTP, HUS, HELLP screening via schistocyte-platelet-LDH surrogate coupling and PLASMIC approximation.

TMA Screen · PLASMIC

→

Intervention Triggers

Transfusion alerts. HIT 4T scoring. Platelet refractoriness via corrected count increment monitoring.

4T Score · CCI
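The corrected count increment (CCI) used for refractoriness monitoring has a standard form: the platelet increment per microliter, multiplied by body surface area, divided by the number of platelets transfused in units of 10^11. The sketch below applies it; the function names, the 7,500 threshold, and the "two poor responses" rule are assumptions, since cut-offs vary between institutions.

```python
def corrected_count_increment(pre_e9_per_l, post_e9_per_l, bsa_m2, dose_e11):
    """CCI from pre/post counts (10^9/L), body surface area, and platelet dose."""
    if dose_e11 <= 0:
        raise ValueError("platelet dose must be positive")
    increment_per_ul = (post_e9_per_l - pre_e9_per_l) * 1000.0  # 10^9/L to per uL
    return increment_per_ul * bsa_m2 / dose_e11


def is_refractory(ccis, threshold=7500.0, required=2):
    """Flag refractoriness when enough transfusions fall below the threshold."""
    return sum(1 for c in ccis if c < threshold) >= required


cci = corrected_count_increment(
    pre_e9_per_l=10.0, post_e9_per_l=40.0, bsa_m2=1.8, dose_e11=3.0
)
```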

DIC Detection Architecture

DIC mortality exceeds 40% when treatment is delayed. Engine 05 builds a modified ISTH score from platelet trajectory (not just absolute count), schistocyte percentage, and IPF kinetics as a fibrinogen consumption surrogate.

A Bayesian network enables probabilistic DIC staging (non-overt vs. overt) with 6-hour early warning, identifying consumption before coagulation panels alarm.
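Scoring the platelet trajectory rather than the absolute count amounts to estimating a slope over recent results. A minimal least-squares sketch follows; the function name is hypothetical and nothing here reproduces the engine's Bayesian network.

```python
def trajectory_slope(times_h, counts):
    """Least-squares slope of platelet count (10^9/L) per hour."""
    n = len(times_h)
    if n < 2 or n != len(counts):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times_h) / n
    mean_c = sum(counts) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("all observations share the same timestamp")
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times_h, counts))
    return sxy / sxx


# Counts still above 100 x 10^9/L, yet falling by about 5 per hour.
slope = trajectory_slope([0, 6, 12, 18], [210, 180, 150, 120])
```

In this example every individual count would pass an absolute-threshold check, while the slope already signals consumption.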

Thrombocytopenia Differential

Decreased Production: Marrow failure, MDS, chemo; low IPF, normal MPV
Increased Destruction: ITP, DIC, TTP/HUS; elevated IPF, large MPV
Sequestration: Hypersplenism; moderate thrombocytopenia (TCP) with pancytopenia
Pseudothrombocytopenia: EDTA clumping; Engine 02 morphology detection
HIT: Onset day 5–10 after heparin exposure, >50% platelet drop, integrated 4T scoring

Performance Validation

Metric		Score
DIC Prediction		91.7%
TTP / HUS Flagging		88.4%
HIT Detection		85.9%
Pseudo-TCP ID		97.3%

Clinical Impact Assessment

EDTA-dependent pseudothrombocytopenia accounts for ~17% of low platelet flags. Engine 05 eliminates this artifact while providing coagulopathy risk stratification hours before traditional panels alarm.

6 h

Earlier DIC identification vs. standard protocol

17%

Pseudo-TCP cases correctly reclassified

Engine 06 · Infection Intelligence Layer

Infection Typing & Severity

Before the culture returns — the CBC already holds the answer.

92.3%

Etiology Acc

94.6%

Severity AUC

48 h

Pre-Culture

Processing Pipeline

WBC Differential

Neutrophils, bands, IG fraction, lymphocyte subtypes, monocytes, eosinophil patterns from analyzer.

5-Part Diff · IG Fraction

→

Left-Shift Analysis

I:T ratio. Band:seg ratio with toxic granulation, Döhle bodies, vacuolization from Engine 02.

I:T Ratio · Toxic Changes

→

Pathogen Typing

Bacterial vs. viral vs. parasitic probability on continuous spectrum. Random forest ensemble.

Random Forest · Ensemble

→

Severity Scoring

NLR severity index. SII and SIRI composites. Bandemia alerts for sepsis cascade risk.

NLR · SII · SIRI

→

Stewardship Output

Bacterial probability guides empiric therapy. Viral pattern reduces unnecessary ABX. Sentinel Sepsis integration.

ABX Stewardship

Infection Signatures

Bacterial: neutrophilia + left shift + toxic changes (heavy granulation, Döhle bodies, vacuolization). Viral: lymphocytosis + reactive morphology + relative neutropenia. The engine quantifies a continuous bacterial–viral probability spectrum for mixed and atypical presentations.
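The left-shift measures feeding these signatures are simple ratios. The sketch below uses the usual definitions: immature-to-total neutrophil ratio (I:T) and band-to-segmented ratio. Function names and the split of inputs into bands, other immature forms, and segmented neutrophils are assumptions.

```python
def it_ratio(bands, other_immature, segmented):
    """Immature neutrophils divided by total neutrophils (same units for all)."""
    immature = bands + other_immature
    total = immature + segmented
    if total <= 0:
        raise ValueError("total neutrophil count must be positive")
    return immature / total


def band_seg_ratio(bands, segmented):
    """Band neutrophils divided by segmented neutrophils."""
    if segmented <= 0:
        raise ValueError("segmented neutrophil count must be positive")
    return bands / segmented


# Absolute counts in 10^9/L: 1.2 bands, 0.3 other immature forms, 6.0 segmented.
it = it_ratio(1.2, 0.3, 6.0)   # 1.5 / 7.5
bs = band_seg_ratio(1.2, 6.0)  # 1.2 / 6.0
```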

Antimicrobial Stewardship

Pre-culture bacterial probability score guides empiric therapy — reducing unnecessary antibiotic exposure for viral infections while ensuring rapid coverage for bacterial processes. Integrates with Sentinel Sepsis when severity scores exceed threshold for seamless escalation.

Performance Validation

Metric		Score
Bacterial vs. Viral		92.3%
Severity Prediction		94.6%
Left-Shift Detection		96.8%
Parasitic Pattern		87.2%

Clinical Impact Assessment

Cultures take 24–72 hours. Engine 06 provides probabilistic pathogen typing from the CBC alone, guiding antibiotic stewardship at the point of maximum clinical uncertainty.

28%

Fewer unnecessary ABX for viral presentations

48 h

Earlier pathogen-class guidance vs. culture

Engine 07 · Production Intelligence Layer

Bone Marrow Stress Indicators

A non-invasive window into marrow function — reading production stress without aspiration.

89.6%

MDS Flag

91.2%

Failure Detect

Lineages

Processing Pipeline

Multi-Lineage Assessment

Erythroid (RBC + reticulocyte), myeloid (neutrophil + IG), megakaryocytic (platelet + IPF) production indices.

Tri-Lineage · Production Index

→

Dysplasia Scoring

Engine 02 morphology: hyposegmented neutrophils, hypogranulation, megaloblastoid changes, giant platelets.

Dysplasia % · Morphology

→

Failure Recognition

Aplastic vs. MDS vs. infiltrative differentiation through production kinetics and morphological profiles.

Pattern Match · Kinetic Model

→

Recovery Monitoring

Post-chemo nadir prediction. Engraftment tracking via reticulocyte and IPF recovery kinetics.

Nadir Predict · Engraftment

→

Biopsy Recommendation

Evidence-weighted scoring. Risk-benefit analysis based on urgency and non-invasive confidence.

Decision Score

Non-Invasive Marrow Assessment

RPI reflects erythroid output, IG fraction indicates myeloid activity, IPF mirrors megakaryopoietic stress. Combined with Engine 02 dysplasia scoring, the system generates a multi-lineage marrow health report — confirming biopsy need or providing confidence to defer.

Failure Syndrome Differentiation

Aplastic: Pancytopenia + low reticulocytes/IG/IPF, consistent with an emptying marrow
MDS: Cytopenias + dysplasia ≥10%, paradoxical reticulocyte response
Infiltrative: Leukoerythroblastic picture + teardrops
Nutritional: Megaloblastic + hypersegmented neutrophils — correctable
Post-Chemo: Predictable nadir, sequential recovery
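The first four patterns in this list can be expressed as explicit rules over peripheral findings. The sketch below is a rule-based paraphrase for illustration only: the dataclass, its field names, and the function name are hypothetical, the 10% dysplasia cut-off is the one given in the list, and the post-chemo pattern is omitted because it depends on treatment timing rather than a single snapshot.

```python
from dataclasses import dataclass


@dataclass
class PeripheralFindings:
    pancytopenia: bool = False
    any_cytopenia: bool = False
    low_production: bool = False   # low reticulocytes, IG, and IPF together
    dysplasia_pct: float = 0.0
    leukoerythroblastic: bool = False
    teardrops: bool = False
    megaloblastic: bool = False
    hypersegmented: bool = False


def candidate_patterns(f: PeripheralFindings) -> list:
    """Return every failure pattern whose listed criteria are all present."""
    out = []
    if f.pancytopenia and f.low_production:
        out.append("Aplastic")
    if (f.any_cytopenia or f.pancytopenia) and f.dysplasia_pct >= 10.0:
        out.append("MDS")
    if f.leukoerythroblastic and f.teardrops:
        out.append("Infiltrative")
    if f.megaloblastic and f.hypersegmented:
        out.append("Nutritional")
    return out


patterns = candidate_patterns(
    PeripheralFindings(any_cytopenia=True, dysplasia_pct=12.0)
)
```

Returning a list rather than a single label allows overlapping presentations to surface together instead of forcing one answer.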

Performance Validation

Metric		Score
MDS Flagging		89.6%
Marrow Failure		91.2%
Engraftment Prediction		87.8%
Biopsy Recommendation		92.4%

Clinical Impact Assessment

Engine 07 identifies patients who genuinely require biopsy while sparing those with sufficient peripheral blood clarity. Real-time engraftment monitoring enables precision-timed growth factor and transfusion support.

34%

Fewer unnecessary bone marrow biopsies

1.8 d

Earlier engraftment detection post-transplant

Engine 08 · Temporal Intelligence Layer

Longitudinal Trend Intelligence

A single CBC is a photograph. A series is a motion picture. This engine reads the film.

14 d

Avg Warning

93.1%

Trend AUROC

∞

History Depth

Processing Pipeline

Temporal Aggregation

Complete CBC history as multivariate time-series. Cross-vendor normalization for longitudinal consistency.

Time-Series · Normalization

→

Trajectory Modeling

Hybrid LSTM–Temporal CNN captures acute changes and slow drifts across variable time intervals.

LSTM · Temporal CNN

→

Change-Point Detection

Bayesian analysis separates meaningful clinical shifts from biological noise.

Bayesian CPD · Anomaly Score

→

Predictive Forecasting

7-day and 14-day parameter forecasts with confidence intervals. Critical threshold crossing prediction.

Forecast · Confidence Int.

→

Pattern Alerting

Progression alerts. Treatment response tracking. Relapse signatures. Sickle cell crisis prediction.

Progression · Relapse Detect
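To make the change-point stage concrete, the sketch below uses a two-sided CUSUM detector, a much simpler technique than the Bayesian analysis the pipeline names, chosen here only because it fits in a few lines. The function name, the baseline window, and the `k` and `h` parameters are assumptions.

```python
import statistics


def cusum_change_point(values, baseline_n=5, k=0.5, h=4.0):
    """Return the index of the first sustained shift from baseline, or None.

    The first baseline_n values define the patient's own mean and spread;
    k is the slack and h the decision threshold, both in baseline SD units.
    """
    if baseline_n < 2 or len(values) <= baseline_n:
        raise ValueError("need a baseline of at least 2 values plus later data")
    base = values[:baseline_n]
    mu = statistics.mean(base)
    sigma = statistics.stdev(base) or 1.0  # guard against a perfectly flat baseline
    pos = neg = 0.0
    for i in range(baseline_n, len(values)):
        z = (values[i] - mu) / sigma
        pos = max(0.0, pos + z - k)  # accumulates upward drift
        neg = max(0.0, neg - z - k)  # accumulates downward drift
        if pos > h or neg > h:
            return i
    return None


# Hemoglobin (g/dL): stable around 14, then a slow decline.
hgb = [14.1, 13.9, 14.0, 14.2, 13.8, 13.9, 13.4, 13.0, 12.6, 12.1]
shift_at = cusum_change_point(hgb)

stable = [14.0, 14.1, 13.9, 14.0, 14.1, 14.0, 13.9, 14.1]
no_shift = cusum_change_point(stable)
```

Using the patient's own baseline, rather than a population reference range, is what lets a decline be flagged while every value is still nominally normal.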

Temporal Architecture

LSTM captures long-range dependencies (gradual hemoglobin decline over months → occult GI loss). Temporal CNN detects acute changes (sudden platelet drops → consumption). Time-aware attention handles variable intervals while preserving historical value.

Clinical Trajectory Patterns

Occult Blood Loss: Hgb drift + rising RDW, 14-day early warning
MDS Progression: Deepening cytopenias + emerging dysplasia
Treatment Response: Expected vs. actual recovery curves
Sickle Crisis: Pre-crisis WBC/reticulocyte patterns
CML Acceleration: Basophil/blast trend → blast crisis
Relapse: Post-remission baseline deviation

Performance Validation

Metric		Score
Trend Detection		93.1%
Change-Point Accuracy		90.7%
7-Day Forecast		88.3%
Relapse Prediction		86.9%
Crisis Prediction		84.5%

Clinical Impact Assessment

Engine 08 transforms hematological monitoring from snapshots to predictive trajectories — detecting drift before crisis and forecasting where counts are heading, not only where they are.

14 d

Average early warning before critical threshold

42%

Fewer emergency transfusions via proactive monitoring

2.7×

Earlier relapse detection vs. scheduled surveillance
