Commit 075af73 (1 parent: 515f912)

Add Medico 2026 task details for GI imaging VQA

1 file changed: `_editions/2026/tasks/medico.md` (+105, −0)
---
# static info
layout: task
year: 2026
hide: false

# required info
title: "Medico 2026: Visual Question Answering (VQA) for Gastrointestinal Imaging"
subtitle: Medico
blurb: "Medico 2026 focuses on Visual Question Answering (VQA) for gastrointestinal (GI) imaging, with an emphasis on explainability, clinical safety, and multimodal reasoning. The task leverages the expanded Kvasir-VQA-x1 dataset, containing more than 150,000 clinically relevant question–answer pairs, to support the development of AI models that can accurately answer questions based on GI endoscopy images while providing coherent and clinically grounded explanations. The goal is to advance trustworthy and interpretable AI decision support for GI diagnostics."
---

<!-- # please respect the structure below-->
*See the [MediaEval 2026 webpage](https://multimediaeval.github.io/editions/2026/) for information on how to register and participate.*

#### Task description

Gastrointestinal (GI) diseases represent a major global health burden, and accurate interpretation of endoscopy findings is critical for diagnosis and treatment planning. While AI systems have demonstrated strong performance in GI image analysis, their clinical adoption remains limited by insufficient explainability, safety concerns, and a lack of alignment with clinical reasoning.

Building on the success of previous Medico challenges, and closely aligned with the ImageCLEFmedical-MEDVQA 2026 initiative, Medico 2026 advances medical VQA for GI imaging with a continued emphasis on interpretability and reliability. Medical VQA combines computer vision and natural language processing to answer clinically meaningful questions derived from medical images. However, existing approaches often prioritize answer accuracy without sufficiently addressing explanation quality, safety, or clinical consistency.

The Medico 2026 challenge therefore emphasizes not only correct answers but also multimodal explanations that combine textual and visual evidence and adhere to medical best practices. In addition, the task introduces evaluation criteria targeting behavioral safety, discouraging undesirable model behaviors such as overconfident answers, misleading justifications, or clinically inappropriate reasoning.

Participating teams will write short working-notes papers that are published in the MediaEval Workshop Working Notes Proceedings. We welcome two types of papers: first, conventional benchmarking papers, which describe the methods that the teams use to address the task and analyze the results; and second, "Quest for Insight" papers, which address a question aimed at gaining more insight into the task but do not necessarily present task results. Example questions for "Quest for Insight" papers are given below.

#### Motivation and background

For AI systems to be integrated into clinical workflows, they must be transparent, interpretable, and safe. In GI imaging, deep learning models have achieved promising results for classification and detection tasks, yet their black-box nature limits trust among clinicians. Medical professionals require explanations that clearly connect visual evidence to clinical conclusions.

Medical VQA offers a natural interface for explainable decision support, enabling clinicians to ask structured questions and receive interpretable responses. Nevertheless, many existing VQA models provide answers without sufficient justification or safeguards against unsafe reasoning. Medico 2026 addresses these limitations by explicitly integrating explainability and safety into both task design and evaluation. By encouraging multimodal explanations and clinically consistent behavior, the challenge aims to advance AI systems that support, rather than replace, clinical expertise.

#### Subtasks

**Subtask 1: Medical Image Question Answering in GI Endoscopy**

This subtask focuses on developing models that accurately answer clinically relevant questions based on GI endoscopy images using the Kvasir-VQA-x1 dataset, which contains more than 150,000 question–answer pairs. The dataset is derived from established GI endoscopy collections and covers a wide range of anatomical regions, pathological findings, and medical instruments.

Questions span multiple categories, including Yes/No, Single-Choice, Multiple-Choice, Color-Related, Location-Related, and Numerical Count, requiring joint reasoning over visual and textual information. Model performance is evaluated using quantitative metrics assessing answer correctness and language accuracy.
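Because the question categories expect differently shaped answers, submissions typically need category-aware answer checking. The sketch below illustrates one way this could look; the category labels, normalization rules, and multi-answer separator are assumptions for illustration, not the official Medico 2026 evaluation code.

```python
# Illustrative, category-aware answer comparison for GI VQA outputs.
# Category names, normalization, and the ";" separator are assumptions
# made for this sketch, not the official evaluation protocol.

def normalize(answer: str) -> str:
    """Lowercase and strip whitespace so 'Yes ' and 'yes' compare equal."""
    return answer.strip().lower()

def is_correct(category: str, predicted: str, gold: str) -> bool:
    if category in {"yes/no", "single-choice", "color-related", "location-related"}:
        return normalize(predicted) == normalize(gold)
    if category == "multiple-choice":
        # Order of the selected options should not matter.
        as_set = lambda s: {normalize(x) for x in s.split(";")}
        return as_set(predicted) == as_set(gold)
    if category == "numerical-count":
        try:
            return int(normalize(predicted)) == int(normalize(gold))
        except ValueError:
            return False
    raise ValueError(f"unknown category: {category}")

print(is_correct("yes/no", "Yes", "yes"))                           # True
print(is_correct("multiple-choice", "polyp; ulcer", "ulcer;polyp"))  # True
print(is_correct("numerical-count", " 3", "3"))                      # True
```

A scorer along these lines keeps Yes/No and count questions from being penalized for trivial formatting differences while still enforcing exact semantics.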

**Subtask 2: Explainable and Safe Multimodal Reasoning for GI VQA**

This subtask extends Subtask 1 by requiring models to provide coherent multimodal explanations that justify their answers. Explanations must combine textual reasoning with visual evidence, such as highlighted image regions, in a manner aligned with clinical reasoning.

In addition to interpretability, this subtask introduces a dedicated safety layer that evaluates model behavior across clinical contexts. Models are assessed for undesirable behaviors, including overconfidence, misleading explanations, and non-compliance with established medical best practices. To support retrieval-augmented reasoning, participants may leverage a curated database of verified endoscopy resources provided as part of the challenge.
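To make the retrieval-augmented setup concrete, a minimal prototype could rank the curated text resources by simple word overlap with the question before passing the top hit to the VQA model as grounding context. The snippet texts and scoring below are invented for illustration; a real system would use proper lexical (e.g., BM25) or dense retrieval over the actual database.

```python
# Minimal keyword-overlap retriever over a toy set of "verified" text
# snippets, sketching how retrieval-augmented GI VQA might ground its
# explanations. Snippets and scoring are illustrative assumptions.

KNOWLEDGE = [
    "Polyps are abnormal tissue growths that may be flat or pedunculated.",
    "Ulcerative colitis typically shows continuous mucosal inflammation.",
    "Biopsy forceps are instruments used to sample suspicious tissue.",
]

def tokenize(text: str) -> set[str]:
    """Split on whitespace and drop simple punctuation for matching."""
    return {t.strip(".,?").lower() for t in text.split()}

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank snippets by word overlap with the question; return the top k."""
    q = tokenize(question)
    ranked = sorted(KNOWLEDGE, key=lambda s: len(q & tokenize(s)), reverse=True)
    return ranked[:k]

print(retrieve("Is the polyp pedunculated or flat?"))  # → the polyp snippet
```

The retrieved snippet would then be appended to the model prompt so that the generated explanation can cite verified material rather than relying on parametric memory alone.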
#### Target group

The task targets researchers from the multimedia analysis, computer vision, natural language processing, and medical AI communities. Consistent with previous Medico challenges, the task is designed to be accessible to both experienced researchers and newcomers to medical AI. Mentoring, baseline implementations, and starter documentation will be provided to support undergraduate and graduate students.

#### Data

Medico 2026 uses the Kvasir-VQA-x1 dataset, an expanded GI endoscopy VQA dataset containing more than 150,000 annotated question–answer pairs. The dataset builds on established GI endoscopy image collections and is curated with clinical input to ensure medical relevance and correctness. Questions are designed to assess visual understanding, clinical interpretation, and reasoning across a diverse set of GI conditions and procedures.

#### Evaluation methodology

**Subtask 1 evaluation.** Models are evaluated using quantitative metrics for answer correctness and language accuracy, including accuracy, precision, recall, and F1 score.

**Subtask 2 evaluation.** In addition to the Subtask 1 metrics, Subtask 2 includes expert-based evaluation of explanation quality. Explanations are assessed for clarity, coherence, medical relevance, and consistency with visual evidence. Safety-oriented criteria evaluate whether model outputs demonstrate appropriate uncertainty, factual correctness, and adherence to clinical best practices.

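For the closed-vocabulary answer categories, the listed metrics can be computed per answer class and macro-averaged. A small self-contained sketch (toy labels, not the official scorer):

```python
# Accuracy plus macro-averaged precision, recall, and F1 over categorical
# VQA answers. Toy data for illustration; not the official Medico scorer.
from collections import Counter

def macro_prf1(gold, pred):
    """Per-class precision/recall/F1, averaged uniformly over classes."""
    classes = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p where it was wrong
            fn[g] += 1  # missed the true class g
    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec); recalls.append(rec); f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

gold = ["yes", "no", "yes", "yes"]
pred = ["yes", "yes", "yes", "no"]
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(accuracy, macro_prf1(gold, pred))  # 0.5, then ~0.33 for each metric
```

Macro averaging weights every answer class equally, which matters here because rare findings would otherwise be drowned out by frequent Yes/No answers.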
#### Quest for insight

* How can VQA models generate explanations that align with clinical reasoning in GI diagnostics?
* Which multimodal techniques best support transparent and safe medical VQA?
* How can retrieval-augmented reasoning improve factual consistency and clinical reliability?
* What evaluation strategies best capture explanation quality and behavioral safety in medical AI?
* How can models balance accuracy, interpretability, and safety in GI VQA tasks?

#### Risk management

The Medico task series has been successfully organized for multiple years. For the 2026 edition, baseline models, starter code, and detailed documentation will be provided. Previous participants will be actively invited, and continuous support will be offered throughout the challenge to mitigate technical and organizational risks.

#### Task organizers

* Sushant Gautam, SimulaMet, Norway
* Vajira Thambawita, SimulaMet, Norway
* Pål Halvorsen, SimulaMet, Norway
* Michael A. Riegler, SimulaMet, Norway
* Steven A. Hicks, SimulaMet, Norway
