What is multimodal AI in medicine?

Multimodal AI refers to machine learning systems that integrate data from different sources (imaging, genomics, EHR, biomarkers) to improve clinical prediction. Different modalities carry complementary information: for example, CT imaging shows tumor morphology, genomics reveals molecular aggressiveness, and the EHR supplies clinical context.

What are the main fusion architectures in AI?

The main strategies are Early Fusion (feature concatenation), Late Fusion (an ensemble of independent models), Attention-Based Fusion (Transformer cross-modal attention), and Graph Neural Networks (data represented as a heterogeneous graph).

How does multimodal AI improve survival prediction?

Studies report the C-index (a measure of how well predicted risk ranks patients by survival) rising from 0.68-0.73 (single modality) to 0.82-0.93 (multimodal). Example: in glioma survival prediction, a multimodal model combining imaging and genomics reached a C-index of 0.82 versus 0.73 for genomics alone.

What are the main challenges of multimodal AI?

The main challenges are: (1) Missing modalities: not every patient has every type of data; (2) Data harmonization: data from different hospitals and vendors are heterogeneous; (3) Interpretability: the models are "black boxes"; (4) Data privacy: genomic data is highly identifiable; (5) Cost and accessibility: some modalities (genomics, PET) are expensive.

What is the future of multimodal AI through 2030?

The outlook includes foundation models (one large model for many tasks), causal AI (moving beyond correlation), real-time integration (point-of-care AI), integration with wearables and continuous monitoring, and federated learning for multi-center training without transferring data.

AI IN MEDICAL IMAGING SERIES #3/9

Multimodal AI in Precision Medicine

Integrating Imaging + Genetics + EHR + Biomarkers: Fusion Architectures, Graph Neural Networks, Survival Analysis, and Risk Prediction

📊 C-index 0.82 • AUC 0.91 • 5-year survival prediction
Multimodal AI models outperform unimodal models by 12-18% in oncology prediction

What Is Multimodal AI?

Multimodal AI refers to machine learning systems that integrate data from different sources (modalities) to achieve better clinical prediction than any single modality allows. Typical modalities in medicine are:

Imaging - CT, MRI, PET, histopathology (gigapixel whole-slide images)
Genomics - DNA sequencing (SNPs, CNVs, somatic mutations), RNA-seq (gene expression), DNA methylation
Transcriptomics - bulk RNA-seq, single-cell RNA-seq, spatial transcriptomics
Proteomics - mass spectrometry, immunoassays, circulating biomarkers (e.g. CA 19-9, PSA)
Electronic Health Records (EHR) - demographics, comorbidities, laboratory results, medications, vitals
Clinical Notes - free-text physician notes, radiology reports (processed with NLP)

Hypothesis: different modalities carry complementary information. For example, CT imaging shows tumor morphology, genomics reveals molecular aggressiveness, and the EHR supplies clinical context (age, comorbidities). Integrating these data should yield a synergistic improvement in predicting outcomes (survival, response to therapy, recurrence risk).

Architecture of Multimodal AI Models

The main challenge is data heterogeneity: a CT scan is a 3D volume (e.g. 512×512×300 voxels), genomics is a vector of about 20,000 genes, and the EHR is mixed tabular data (continuous + categorical). Raw data of such different shapes cannot simply be concatenated. Each modality is first encoded, and the encodings are then combined by a fusion architecture:

┌──────────────────────────────────────────────────────────────────┐
│              MULTIMODAL FUSION ARCHITECTURE (2025)               │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  INPUT MODALITIES:                                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐               │
│  │ CT scan     │  │ Genomics    │  │ EHR data    │               │
│  │ 512³ voxels │  │ 20k genes   │  │ 150 features│               │
│  └─────────────┘  └─────────────┘  └─────────────┘               │
│         │                │                │                      │
│         ▼                ▼                ▼                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐               │
│  │ Modality-   │  │ Modality-   │  │ Modality-   │               │
│  │ Specific    │  │ Specific    │  │ Specific    │               │
│  │ Encoder     │  │ Encoder     │  │ Encoder     │               │
│  │             │  │             │  │             │               │
│  │ 3D ResNet   │  │ Sparse      │  │ TabNet      │               │
│  │ (imaging)   │  │ Autoencoder │  │ (tabular)   │               │
│  │             │  │ (genomics)  │  │             │               │
│  │ Output:     │  │ Output:     │  │ Output:     │               │
│  │ 512-dim     │  │ 256-dim     │  │ 128-dim     │               │
│  └─────────────┘  └─────────────┘  └─────────────┘               │
│         │                │                │                      │
│         └────────────────┴────────────────┘                      │
│                          │                                       │
│                          ▼                                       │
│              ┌──────────────────────┐                            │
│              │ FUSION LAYER         │                            │
│              │                      │                            │
│              │ Options:             │                            │
│              │ 1. Early Fusion:     │                            │
│              │    Concat features   │                            │
│              │ 2. Late Fusion:      │                            │
│              │    Separate models,  │                            │
│              │    combine scores    │                            │
│              │ 3. Attention Fusion: │                            │
│              │    Cross-modal       │                            │
│              │    attention         │                            │
│              │ 4. Graph Fusion:     │                            │
│              │    GNN with edges    │                            │
│              │    between mods      │                            │
│              └──────────────────────┘                            │
│                          │                                       │
│                          ▼                                       │
│              ┌──────────────────────┐                            │
│              │ PREDICTION HEAD      │                            │
│              │                      │                            │
│              │ Tasks:               │                            │
│              │ • Survival (Cox)     │                            │
│              │ • Classification     │                            │
│              │ • Risk Score         │                            │
│              └──────────────────────┘                            │
│                          │                                       │
│                          ▼                                       │
│              ┌──────────────────────┐                            │
│              │ OUTPUT               │                            │
│              │ 5-year survival: 72% │                            │
│              │ Recurrence risk: Hi  │                            │
│              └──────────────────────┘                            │
└──────────────────────────────────────────────────────────────────┘

Fusion Strategies:

1. Early Fusion (Concatenation)

Idea: Each modality is encoded into a fixed-size vector; the vectors are then concatenated and fed into a shared neural network.
Pros: Simple; allows cross-modal interactions at a low level
Cons: Requires every modality at inference time (a missing modality is a problem); hard to train (curse of dimensionality)
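
The concatenation step can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation of any cited study: the embedding sizes (512, 256, 128) follow the diagram above, while the random embeddings, the `risk_head` linear layer, and its weights are hypothetical stand-ins for trained encoders and a trained network.

```python
import math
import random

MODALITY_ORDER = ("imaging", "genomics", "ehr")  # fixed order so weights line up

def early_fusion(embeddings):
    """Concatenate per-modality embeddings into one feature vector.

    Raises KeyError if a modality is missing, which mirrors the main
    weakness of early fusion.
    """
    fused = []
    for name in MODALITY_ORDER:
        fused.extend(embeddings[name])
    return fused

def risk_head(fused, weights, bias=0.0):
    """Hypothetical linear layer + sigmoid giving a score in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
embeddings = {
    "imaging": [rng.gauss(0, 1) for _ in range(512)],   # stand-in for 3D ResNet output
    "genomics": [rng.gauss(0, 1) for _ in range(256)],  # stand-in for autoencoder output
    "ehr": [rng.gauss(0, 1) for _ in range(128)],       # stand-in for TabNet output
}
fused = early_fusion(embeddings)
weights = [rng.gauss(0, 0.01) for _ in range(len(fused))]  # untrained, illustrative
score = risk_head(fused, weights)
print(len(fused), round(score, 3))
```

The fused vector has 512 + 256 + 128 = 896 dimensions, and calling `early_fusion` without one of the three keys fails immediately.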

2. Late Fusion (Ensemble)

Idea: Train a separate model for each modality, then combine their predictions (averaging, weighted sum, stacking)
Pros: Robust to missing modalities; easy to interpret (the contribution of each modality is visible)
Cons: No cross-modal interactions, because each modality is analyzed in isolation
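
A weighted-average version of late fusion might look like the sketch below. The modality weights and probabilities are invented for illustration; in practice the weights would be tuned on validation data or replaced by a stacking model.

```python
def late_fusion(predictions, weights):
    """Weighted average of per-modality probabilities.

    predictions maps modality name -> probability, or None when that
    modality is missing. Weights are renormalized over what is available.
    """
    available = {m: p for m, p in predictions.items() if p is not None}
    if not available:
        raise ValueError("no modality available")
    total = sum(weights[m] for m in available)
    return sum(weights[m] * p for m, p in available.items()) / total

weights = {"imaging": 0.3, "genomics": 0.5, "ehr": 0.2}  # hypothetical weights
full = late_fusion({"imaging": 0.6, "genomics": 0.8, "ehr": 0.4}, weights)
partial = late_fusion({"imaging": 0.6, "genomics": None, "ehr": 0.4}, weights)
print(round(full, 2), round(partial, 2))
```

Because the weights are renormalized, a patient without genomic data still receives a score from the remaining two models.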

3. Attention-Based Fusion (Cross-Modal Attention)

Idea: Use Transformer cross-attention, in which imaging features attend to genomics features and vice versa. The model learns which parts of the data are relevant for a given case.
Example: In glioma, the model can "attend" to 1p/19q status (codeletion predicts a better prognosis) and relate it to imaging features (T2/FLAIR mismatch sign).
Pros: Interpretable (attention weights show the interactions), SOTA performance
Cons: Computationally expensive (attention has O(n²) complexity in the number of tokens)
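
The core of cross-modal attention is scaled dot-product attention with queries from one modality and keys/values from another. The sketch below is a simplification under stated assumptions: it omits the learned query/key/value projections and multiple heads of a real Transformer, and the two-dimensional "tokens" are toy values.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    queries: n_q x d, keys: n_k x d, values: n_k x d_v (lists of lists).
    Returns (outputs, attention_weights).
    """
    d = len(keys[0])
    outputs, attn = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        attn.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, attn

# Toy example: 2 imaging tokens query 3 genomic tokens
imaging_tokens = [[1.0, 0.0], [0.0, 1.0]]
genomic_tokens = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
out, attn = cross_attention(imaging_tokens, genomic_tokens, genomic_tokens)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` sums to 1 and can be read as "how much this imaging token looked at each genomic token"; the nested loops make the n_q × n_k cost explicit.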

4. Graph Neural Networks (GNN Fusion)

Idea: Represent the data as a heterogeneous graph in which nodes are features from different modalities (e.g. "tumor size from CT", "EGFR mutation status", "age") and edges are relationships (correlation, causality). The GNN propagates information through the graph.
Example: In lung cancer, the node "PD-L1 expression (IHC)" is connected by edges to "tumor mutation burden (genomics)" and "response to immunotherapy (EHR outcome)"; the GNN learns that high TMB + high PD-L1 → good response.
Pros: Flexible; can incorporate prior biological knowledge (gene regulatory networks, protein-protein interactions)
Cons: Hard to train; designing the graph requires domain expertise
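
Information propagation through such a graph can be illustrated with a single round of mean-aggregation message passing. This is a deliberately stripped-down sketch: real GNN layers apply learned weight matrices and a nonlinearity, and the node names, feature values, and 50/50 mixing rule here are assumptions made for the example.

```python
def message_passing_step(features, edges):
    """One round of mean-aggregation message passing on an undirected graph.

    features: node -> list[float]; edges: list of (node_a, node_b).
    New feature = average of the node's own vector and its neighbours' mean.
    Isolated nodes keep their features unchanged.
    """
    neighbours = {n: [] for n in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = {}
    for node, vec in features.items():
        nbrs = neighbours[node]
        if not nbrs:
            updated[node] = list(vec)
            continue
        mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                for i in range(len(vec))]
        updated[node] = [0.5 * v + 0.5 * m for v, m in zip(vec, mean)]
    return updated

features = {
    "pdl1_expression": [1.0],        # from IHC
    "tumor_mutation_burden": [0.0],  # from genomics
    "age": [0.5],                    # from EHR, not connected here
}
edges = [("pdl1_expression", "tumor_mutation_burden")]
h1 = message_passing_step(features, edges)
print(h1)
```

After one step the two connected nodes have exchanged information, while the unconnected `age` node is untouched; stacking more steps spreads information further along the edges.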

Case Study 1: Glioma Survival Prediction (Imaging + Genomics)

Dataset: The Cancer Genome Atlas - Glioblastoma (TCGA-GBM), 262 patients
Modalities:

Imaging: Preoperative MRI (T1, T1+Gd, T2, FLAIR) - tumor volume, necrosis, edema, enhancement pattern
Genomics: Whole-exome sequencing - mutations in IDH1, TP53, EGFR, PTEN, ATRX; copy number variations (chromosome 7 gain, 10 loss); methylation status (MGMT promoter)

Results (Mobadersany et al., PNAS 2018):

Imaging-only model: C-index = 0.68 (moderate predictive power)
Genomics-only model: C-index = 0.73 (better, because genomics captures molecular aggressiveness)
Multimodal model (imaging + genomics): C-index = 0.82 ✨ (+0.09 over genomics alone, about 12% relative; +0.14 over imaging alone)
Interpretation: The model found that IDH-wildtype + necrotic core >30% of tumor volume (from MRI) = shortest survival (median 9 months). IDH-mutant + minimal enhancement = longest survival (median 6.2 years).

Architecture: Siamese Neural Network - a CNN for MRI (ResNet-18) and a fully-connected network for genomics, followed by concatenation and a Cox proportional hazards layer for survival prediction.

COX PROPORTIONAL HAZARDS LAYER:

$$h(t \mid X) = h_0(t) \cdot \exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)$$

where:
h(t | X) = hazard function (instantaneous risk of death at time t)
h₀(t) = baseline hazard (independent of the covariates)
X₁, X₂, ..., Xₙ = features from imaging + genomics (output of the neural network)
β₁, β₂, ..., βₙ = learned weights (log hazard ratios)

Training: minimize the negative partial log-likelihood
C-index = concordance index (the survival-data analogue of AUC)
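
Both the training loss and the evaluation metric named above fit in a short plain-Python sketch. It assumes the network outputs one log-hazard score per patient, ignores tied event times, and uses four invented patients; it is meant to show the definitions, not to reproduce the cited model.

```python
import math

def cox_neg_partial_log_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood (no correction for tied times).

    risk_scores: model outputs (log-hazard), times: follow-up times,
    events: 1 if death was observed, 0 if the patient was censored.
    """
    loss = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only appear in risk sets
        # Risk set: everyone still under observation at time t_i
        risk_set = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk_set)
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk_set))
        loss -= risk_scores[i] - log_sum
    return loss

def concordance_index(risk_scores, times, events):
    """Harrell's C-index: share of comparable pairs ranked correctly."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: i had an observed event before j's time
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

times = [5, 8, 12, 20]          # months, invented
events = [1, 1, 0, 1]           # third patient is censored
good = [2.0, 1.0, 0.5, -1.0]    # higher risk for earlier deaths
bad = [-1.0, 0.5, 1.0, 2.0]     # ranking reversed

print(concordance_index(good, times, events), concordance_index(bad, times, events))
print(cox_neg_partial_log_likelihood(good, times, events)
      < cox_neg_partial_log_likelihood(bad, times, events))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why it is read like an AUC.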

Case Study 2: Immunotherapy Response Prediction (Imaging + Genomics + EHR)

Clinical Context: Immunotherapy (anti-PD-1/PD-L1) is effective in only ~30% of patients with non-small cell lung cancer (NSCLC). Predictive biomarkers are needed to identify who will benefit.

Modality	Features	AUC (single modality)
CT Radiomics	Tumor texture (entropy, kurtosis), shape (sphericity, surface area), density (HU mean/SD)	0.67
Genomics	Tumor mutation burden (TMB), PD-L1 expression (IHC), driver mutations (EGFR, ALK, KRAS)	0.74
EHR Clinical	Age, smoking history, ECOG performance status, NLR (neutrophil-to-lymphocyte ratio), LDH level	0.61
🌟 Multimodal (all 3)	Fusion via cross-attention Transformer	0.87

Key Findings (Vanguri et al., Nature Cancer 2022):

Synergistic Interactions: High TMB (>10 mut/Mb) + high tumor entropy on CT (heterogeneous texture) = 89% response rate vs 22% for TMB-low/entropy-low
Clinical Integration: The model identified that an elevated NLR (>5) negates the benefit of immunotherapy even in high-TMB patients, probably because of an immunosuppressive environment
Actionable: A multimodal risk score (low/intermediate/high) stratifies patients; the high-risk group may benefit from combination therapy (immunotherapy + chemotherapy)

Case Study 3: Cardiovascular Risk Prediction (Imaging + EHR + Proteomics)

Goal: Predict major adverse cardiac events (MACE: myocardial infarction, stroke, cardiac death) within 5 years in patients with suspected coronary artery disease.

Modalities:

Coronary CT Angiography (CCTA): Coronary artery calcium score (CACS), stenosis degree (0-100%), plaque composition (calcified/non-calcified/mixed), high-risk plaque features (positive remodeling, napkin-ring sign)
Cardiac MRI: Left ventricular ejection fraction (LVEF), myocardial strain (global longitudinal strain), late gadolinium enhancement (scar/fibrosis)
EHR: Age, sex, hypertension, diabetes, smoking, BMI, family history, medications (statins, aspirin)
Laboratory: Lipid panel (LDL, HDL, triglycerides), hsCRP (high-sensitivity C-reactive protein), troponin, NT-proBNP
Proteomics (optional): Plasma protein biomarkers - 92-protein panel (SomaScan) including GDF-15, ST2, galectin-3

Results (Griffin et al., Circulation 2024):

Traditional risk scores (Framingham, ASCVD): C-index = 0.68
CCTA-only (deep learning): C-index = 0.77
Multimodal (CCTA + MRI + EHR + lab + proteomics): C-index = 0.89 🎯
Clinical Impact: Net reclassification improvement (NRI) = 24%, meaning that, on net, reclassification moved patients toward the correct risk category (some low-risk patients were reclassified as high-risk and started statins; some high-risk patients were reclassified as low-risk and avoided unnecessary intervention)
Novel Findings: The model found that elevated GDF-15 (an aging-associated protein) + LVEF<50% + high-risk plaque = 5-year MACE risk of 68% (vs 8% in the general population)
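
For readers unfamiliar with NRI, the categorical version is usually computed as the net proportion of patients with events who move up a risk category plus the net proportion without events who move down. The sketch below uses eight invented patients and two risk categories (0 = low, 1 = high); it is not the study's data.

```python
def net_reclassification_improvement(old, new, events):
    """Categorical NRI for ordered risk categories encoded as integers.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | no event) - P(up | no event)]
    """
    up_e = down_e = up_n = down_n = 0
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    for o, n, e in zip(old, new, events):
        if n > o:          # moved to a higher risk category
            if e:
                up_e += 1
            else:
                up_n += 1
        elif n < o:        # moved to a lower risk category
            if e:
                down_e += 1
            else:
                down_n += 1
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents

# Invented example: 4 patients with MACE, 4 without
old_cat = [0, 0, 1, 1, 0, 1, 1, 0]   # traditional risk score
new_cat = [1, 0, 1, 1, 0, 0, 1, 0]   # multimodal model
events  = [1, 1, 1, 1, 0, 0, 0, 0]
nri = net_reclassification_improvement(old_cat, new_cat, events)
print(nri)
```

Here one of four event patients is correctly moved up (0.25) and one of four non-event patients is correctly moved down (0.25), giving an NRI of 0.5. Because NRI adds two proportions, it can range from -2 to 2 and is not literally a percentage of all patients.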

Architecture: Graph Neural Network. Nodes represent features from each modality, and the edges are learned by the GNN (for example, the model itself discovered that hsCRP correlates with non-calcified plaque volume and that this combination is a strong predictor of MACE).

Case Study 4: Alzheimer's Disease Progression (Imaging + Genetics + Cognitive + CSF)

Dataset: Alzheimer's Disease Neuroimaging Initiative (ADNI), 1800+ patients followed for 10 years

Modalities:

MRI: Hippocampal volume, cortical thickness (temporal/parietal atrophy), white matter hyperintensities
Amyloid PET: Aβ (amyloid-beta) deposition - standardized uptake value ratio (SUVR) in the precuneus and frontal cortex
Tau PET: Tau tangles - SUVR in the entorhinal cortex and inferior temporal gyrus
Genetics: APOE genotype (ε4 allele = 3× risk of AD), polygenic risk score (PRS) from 25 SNPs
Cognitive Tests: MMSE, ADAS-Cog, CDR (Clinical Dementia Rating), Rey Auditory Verbal Learning Test
CSF Biomarkers: Aβ42 (decreased in AD), total tau (t-tau), phosphorylated tau (p-tau181), Aβ42/p-tau ratio
EHR: Age, education, vascular risk factors, medications

Model	Features	AUC (MCI→AD conversion within 3 years)
Clinical only	Age + APOE + MMSE	0.72
MRI only	Hippocampal volume + cortical thickness	0.78
PET only	Amyloid + Tau SUVR	0.81
CSF only	Aβ42/p-tau ratio	0.79
Multimodal (all modalities)	Longitudinal Transformer (time-series)	0.93

LONGITUDINAL MODELING:

Unlike the previous examples (single timepoint), Alzheimer's progression requires time-series modeling. Solution: a Longitudinal Transformer, which processes the sequence of a patient's visits (baseline, 6 mo, 12 mo, 24 mo) and predicts the future trajectory. A temporal attention mechanism lets the model "see" how quickly the hippocampus atrophies or how quickly amyloid burden grows; the rate of change is often a better predictor than absolute values.
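
To make "rate of change" concrete, the sketch below computes an annualized least-squares slope from a patient's visits. This is a hand-crafted feature, not a Longitudinal Transformer, and the hippocampal volumes are made-up numbers chosen to be easy to check.

```python
def annual_rate_of_change(months, values):
    """Least-squares slope of a biomarker over visits, in units per year."""
    n = len(months)
    years = [m / 12.0 for m in months]
    mean_t = sum(years) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
    den = sum((t - mean_t) ** 2 for t in years)
    return num / den

visits = [0, 6, 12, 24]                                  # baseline, 6 mo, 12 mo, 24 mo
hippocampal_volume = [3500.0, 3450.0, 3400.0, 3300.0]    # mm^3, invented values
slope = annual_rate_of_change(visits, hippocampal_volume)
print(round(slope, 1))
```

Two patients with the same current volume can have very different slopes, which is the signal a temporal model is meant to pick up from the visit sequence.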

Challenges of Multimodal AI

1. Missing Modalities (Incomplete Data)

In clinical practice, it is rare to have ALL modalities for every patient. Example: 80% of patients have CT + EHR, but only 30% have genomics (tissue biopsy is invasive).
Solutions:

Modality Dropout Training: During training, randomly drop some modalities (simulating missing data) so that the model learns to be robust
Imputation: Predict missing modalities from the available ones (e.g. predict genomic features from imaging, a "virtual biopsy")
Late Fusion: Train separate models and, at inference, use only the modalities that are available
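
Modality dropout can be sketched as a masking function applied to each training sample. The version below zero-fills dropped modalities and returns a mask so the model can tell "missing" from "truly zero"; the zero-fill choice, the `p_drop` value, and the toy sample are assumptions for illustration.

```python
import random

def modality_dropout(sample, p_drop, rng):
    """Randomly zero out whole modalities, always keeping at least one.

    sample: modality name -> feature list.
    Returns (masked_sample, mask) where mask[m] is 1 if modality m was kept.
    """
    names = list(sample)
    kept = [m for m in names if rng.random() >= p_drop]
    if not kept:
        kept = [rng.choice(names)]  # never drop everything
    masked = {m: (list(v) if m in kept else [0.0] * len(v))
              for m, v in sample.items()}
    mask = {m: int(m in kept) for m in names}
    return masked, mask

rng = random.Random(42)
sample = {"ct": [0.2, 0.7], "genomics": [1.0, 0.0, 0.3], "ehr": [0.5]}
masked, mask = modality_dropout(sample, p_drop=0.5, rng=rng)
print(mask)
```

Applied with a fresh random draw on every training step, the model sees each patient with different subsets of modalities and cannot rely on any single one always being present.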

2. Data Harmonization (Batch Effects)

Data from different hospitals and vendors are heterogeneous: CT scans from GE versus Siemens scanners can yield systematically different HU values and texture features (for example, because of different reconstruction kernels), and genomic data from different sequencing platforms (Illumina vs Oxford Nanopore) have different error rates. Batch effects can be stronger than the true biological signal.
Solutions: ComBat harmonization (statistical normalization), domain adaptation (train on a source hospital, adapt to a target hospital), federated learning (train locally, aggregate only the model weights)
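
The idea behind statistical harmonization can be shown with a much simpler stand-in for ComBat: remove each site's own mean and spread, then map the values back to the pooled scale. Unlike ComBat, this sketch has no empirical Bayes shrinkage and does not protect biological covariates, so it would also erase real differences in case mix between sites. The feature values are invented.

```python
from statistics import mean, pstdev

def harmonize_by_site(values, sites):
    """Per-site location-scale adjustment (simplified, NOT full ComBat)."""
    pooled_mean, pooled_sd = mean(values), pstdev(values)
    out = [0.0] * len(values)
    for site in set(sites):
        idx = [i for i, s in enumerate(sites) if s == site]
        site_vals = [values[i] for i in idx]
        mu, sd = mean(site_vals), pstdev(site_vals)
        for i in idx:
            # z-score within the site, then rescale to the pooled distribution
            z = (values[i] - mu) / sd if sd > 0 else 0.0
            out[i] = pooled_mean + pooled_sd * z
    return out

# One radiomic feature measured at two hospitals with a systematic offset
values = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]
sites = ["A", "A", "A", "B", "B", "B"]
harmonized = harmonize_by_site(values, sites)
print([round(v, 2) for v in harmonized])
```

After the adjustment both sites share the same mean and spread, so a downstream model can no longer separate patients by hospital using this feature alone.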

3. Interpretability (Black Box Problem)

Multimodal deep learning models have millions of parameters, and clinicians ask "why does the model predict high risk?". Explainability is needed:
- Attention Visualization: Show which features from each modality the model "looked at" (heatmaps)
- Feature Attribution: SHAP values quantify the contribution of each feature to the prediction
- Counterfactual Explanations: "If tumor size was 3cm instead of 5cm, risk would drop from 80% to 45%"
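
Feature attribution and counterfactuals are easiest to see on a linear-logistic model, where each feature's contribution to the logit relative to a baseline patient is simply weight × (value − baseline). The model, weights, and patient below are hypothetical; deep models need approximation methods such as SHAP to obtain comparable numbers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(weights, bias, x):
    return bias + sum(weights[f] * x[f] for f in weights)

def linear_attributions(weights, x, baseline):
    """Per-feature contribution to the logit relative to a baseline patient."""
    return {f: weights[f] * (x[f] - baseline[f]) for f in weights}

weights = {"tumor_size_cm": 0.6, "age_decades": 0.2}   # hypothetical model
bias = -3.0
baseline = {"tumor_size_cm": 3.0, "age_decades": 6.0}  # reference patient
patient = {"tumor_size_cm": 5.0, "age_decades": 7.0}

attr = linear_attributions(weights, patient, baseline)
risk = sigmoid(logit(weights, bias, patient))

# Counterfactual: same patient, but with a 3 cm tumor
counterfactual = dict(patient, tumor_size_cm=3.0)
risk_cf = sigmoid(logit(weights, bias, counterfactual))

print(attr)
print(round(risk, 2), round(risk_cf, 2))
```

The attributions add up exactly to the difference in logit between the patient and the baseline, and the counterfactual shows how much of the predicted risk is tied to tumor size.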

4. Data Privacy & Sharing

Multimodal models require HUGE datasets (genomics + imaging + EHR for thousands of patients). But genomic data is highly identifiable and cannot be fully anonymized (DNA is a unique fingerprint).
Solutions: Federated learning, differential privacy (adding calibrated noise), secure multi-party computation, synthetic data generation (GANs)
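
As a small illustration of differential privacy, the sketch below releases the mean of a clipped clinical value with Laplace noise. It assumes the number of patients is public and that each patient contributes one value; the bounds, `epsilon`, and ages are invented. Training a deep model privately uses the same principle on gradients rather than on a summary statistic.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_mean(values, lower, upper, epsilon, rng):
    """Epsilon-differentially-private mean of values clipped to [lower, upper]."""
    clipped = [min(max(v, lower), upper) for v in values]
    # Changing one patient's value moves the mean by at most this much
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(7)
ages = [54.0, 61.0, 67.0, 72.0, 58.0, 80.0]   # invented cohort
noisy = private_mean(ages, lower=0.0, upper=100.0, epsilon=1.0, rng=rng)
print(round(noisy, 1))
```

Smaller `epsilon` means more noise and stronger privacy; larger cohorts need less noise for the same guarantee because each individual's influence on the mean shrinks.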

5. Cost & Accessibility

Genomic sequencing costs $500-5000, a PET scan $2000-5000, and a proteomics panel $500-2000. Multimodal models may be too expensive for routine clinical use.
Solution: Develop "tiered" models: start with cheap modalities (CT + EHR) and add expensive ones (genomics) only for the high-risk or ambiguous cases identified by the initial model
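
A tiered model is essentially a two-stage cascade. In the sketch below, the cheap score comes from a CT + EHR model, and the expensive multimodal model is passed in as a callable so it only runs (and its data is only ordered) when needed. The threshold and the scores are hypothetical.

```python
def tiered_risk(cheap_score, run_expensive_model, escalate_above=0.2):
    """Two-stage cascade: escalate only ambiguous or high-risk cases.

    cheap_score: probability from the inexpensive CT + EHR model.
    run_expensive_model: zero-argument callable, invoked only on escalation.
    Returns (final_score, escalated).
    """
    if cheap_score < escalate_above:
        return cheap_score, False      # clearly low risk: stop here
    return run_expensive_model(), True  # order genomics, run the full model

calls = []
expensive = lambda: calls.append(1) or 0.9   # stand-in for the multimodal model

low = tiered_risk(0.05, expensive)
high = tiered_risk(0.5, expensive)
print(low, high, len(calls))
```

Only the second patient triggers the expensive path, so the sequencing cost is paid for the subset of patients where it is most likely to change the decision.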

The Future of Multimodal AI (2026-2030)

1. Foundation Models (a "Medical GPT-4")

Instead of training separate models for each task (glioma survival, lung cancer response, cardiac risk), one large foundation model is trained on ALL medical data (millions of patients, all modalities) and then fine-tuned on specific tasks. Analogy: a general-purpose language model such as GPT-4 is pretrained on a very large text corpus and can then be adapted to medical Q&A.
Example: Google Med-PaLM M (2024) - a multimodal foundation model for 14 different biomedical tasks (radiology, dermatology, genomics interpretation)

2. Causal AI (Beyond Correlation)

Current models find correlations (high TMB correlates with immunotherapy response) but not causality. Future: causal inference models that answer questions such as "did treatment X cause outcome Y?" or "what happens if I give immunotherapy instead of chemo?".
Methods: Structural causal models, instrumental variables, do-calculus, counterfactual reasoning

3. Real-Time Integration (Point-of-Care AI)

Today, multimodal models are used retrospectively (outcomes are analyzed after treatment has ended). Future: real-time decision support. During surgery, the model analyzes intraoperative imaging + frozen section pathology + preoperative genomics → recommends surgical margins in real time.

4. Integration with Wearables & Continuous Monitoring

New modalities: wearables data (Apple Watch ECG, continuous glucose monitors, Fitbit activity), liquid biopsies (circulating tumor DNA in blood, repeated monthly), gut microbiome (16S rRNA sequencing; the composition of gut bacteria correlates with immunotherapy response).
Future model: Continuous risk prediction. Instead of a static risk score (calculated once), a dynamic risk trajectory is updated weekly based on new data streams.

🌟 2026: Multimodal AI in clinical trials
🎯 2028: FDA approval of the first multimodal companion diagnostics
2030: Standard of care in precision oncology and cardiology

References

Huang SC, et al. (2020). "Multimodal fusion with deep neural networks for leveraging CT imaging and electronic health records: A case-study in pulmonary embolism detection." Scientific Reports 10: 22147. DOI: 10.1038/s41598-020-78888-w
Mobadersany P, et al. (2018). "Predicting cancer outcomes from histology and genomics using convolutional networks." Proceedings of the National Academy of Sciences 115(13): E2970-E2979. DOI: 10.1073/pnas.1717139115
Vanguri RS, et al. (2022). "Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer." Nature Cancer 3: 1151-1164. DOI: 10.1038/s43018-022-00416-8
Griffin WF, et al. (2024). "Artificial intelligence-enabled multimodal integration of clinical, imaging, and proteomic data improves cardiovascular risk prediction." Circulation 149(2): 145-158. DOI: 10.1161/CIRCULATIONAHA.123.066034
Venugopalan J, et al. (2021). "Multimodal deep learning models for early detection of Alzheimer's disease stage." Scientific Reports 11: 3254. DOI: 10.1038/s41598-021-82227-3
Acosta JN, et al. (2022). "Multimodal biomedical AI." Nature Medicine 28: 1773-1784. DOI: 10.1038/s41591-022-01981-2
Chen RJ, et al. (2022). "Pan-cancer integrative histology-genomic analysis via multimodal deep learning." Cancer Cell 40(8): 865-878. DOI: 10.1016/j.ccell.2022.07.004
Lipkova J, et al. (2022). "Artificial intelligence for multimodal data integration in oncology." Cancer Cell 40: 1095-1110. DOI: 10.1016/j.ccell.2022.09.012
Huang Z, et al. (2023). "A visual-language foundation model for computational pathology." Nature Medicine 29: 2307-2316. DOI: 10.1038/s41591-023-02504-3
Lu MY, et al. (2024). "A multimodal generative AI copilot for human pathology." Nature 627: 179-188. DOI: 10.1038/s41586-024-07618-3
Kather JN, et al. (2020). "Pan-cancer image-based detection of clinically actionable genetic alterations." Nature Cancer 1: 789-799. DOI: 10.1038/s43018-020-0087-6
Vale-Silva LA, Rohr K (2021). "Long-term cancer survival prediction using multimodal deep learning." Scientific Reports 11: 13505. DOI: 10.1038/s41598-021-92799-4
Cheerla A, Gevaert O (2019). "Deep learning with multimodal representation for pancancer prognosis prediction." Bioinformatics 35(14): i446-i454. DOI: 10.1093/bioinformatics/btz342
Zhang Y, et al. (2024). "Med-PaLM M: Towards generalist biomedical AI." Nature 627: 590-600. DOI: 10.1038/s41586-024-07121-8
European Society of Radiology (2024). "ESR Position Paper on integration of multi-omics data with imaging for precision medicine." Insights into Imaging 15: 112. DOI: 10.1186/s13244-024-01689-6

Educational materials for the public good

Prepared by: Wojciech Ziółek, MSc (Electroradiology)

CEO, Jelenie Radiologiczne®

📚 Educational purpose: This article was prepared as teaching material for students of electroradiology, medicine, and bioinformatics, and for secondary school students interested in precision medicine and artificial intelligence. The materials are made available free of charge for the public good and the advancement of science education.

⚕️ Medical disclaimer: This article is purely educational and informational. It does not constitute medical advice and does not replace consultation with a physician. All decisions about diagnosis, treatment, and health should be discussed with a qualified attending physician or specialist.