feat: add multimodal evaluators and prompt templates for image-to-text evaluation #187
jjbuck merged 11 commits into strands-agents:main
…t evaluation
- Introduced `MultimodalFaithfulnessEvaluator`, `MultimodalInstructionFollowingEvaluator`, and `MultimodalOverallQualityEvaluator` to assess various aspects of multimodal responses.
- Implemented `MultimodalOutputEvaluator` as a base class for handling multimodal inputs and outputs.
- Created prompt templates for evaluation rubrics including correctness, faithfulness, instruction following, and overall quality.
- Developed a structured approach for composing evaluation prompts with support for media content.
- Added `ImageData` class to manage image sources and formats, enabling flexible input handling for evaluators.
- Established a unified module for multimodal evaluation types and data structures.
Co-authored-by: Sungyeon Kim <ksy9597@gmail.com>
…h reference suffix Co-authored-by: Sungyeon Kim <ksy9597@gmail.com>
…ics fix, S3 removal, typing cleanup
afarntrog left a comment
I would like to see some tests for the new multimodal evaluators. Please add unit and integration tests.
Co-authored-by: Sungyeon Kim <ksy9597@gmail.com>
@afarntrog Added unit and integration tests:
Also, could we add Sungyeon Kim (@sung-yeon-kim) as a collaborator on this repo? He's been co-authoring the work on this PR.
Co-authored-by: Sungyeon Kim <ksy9597@gmail.com>
1. Drop
…trip bugs Co-authored-by: Sungyeon Kim <ksy9597@gmail.com>
Thanks @jjbuck for the detailed review and suggestions! 1. Drop
Description
This PR adds multimodal (MLLM-as-a-Judge) evaluators for image-to-text tasks with reference-free and reference-based evaluation support.
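To make the reference-free versus reference-based distinction concrete, here is a minimal sketch of how a judge prompt might switch modes depending on whether `expected_output` is set. Everything in it is hypothetical: `SketchImage`, `SketchCase`, `build_judge_prompt`, and both rubric strings are illustrative stand-ins, not the classes, signatures, or templates added by this PR.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for illustration only; these are NOT the PR's classes.
@dataclass
class SketchImage:
    source: str           # assumed: a file path, URL, or base64 payload
    format: str = "png"   # assumed default format


@dataclass
class SketchCase:
    instruction: str
    image: SketchImage
    actual_output: str
    expected_output: Optional[str] = None  # None means reference-free


# Assumed rubric wording; the real templates live in the PR's prompt modules.
REFERENCE_FREE_RUBRIC = "Judge the response against the image and instruction only."
REFERENCE_BASED_RUBRIC = (
    "Judge the response against the image, the instruction, and the reference answer."
)


def build_judge_prompt(case: SketchCase) -> str:
    """Compose the text part of a judge prompt.

    The image itself would be passed to the judge model as separate media
    content; only the text portion is assembled here.
    """
    reference_based = case.expected_output is not None
    rubric = REFERENCE_BASED_RUBRIC if reference_based else REFERENCE_FREE_RUBRIC
    parts = [
        rubric,
        f"Instruction: {case.instruction}",
        f"Response: {case.actual_output}",
    ]
    if reference_based:
        # Only reference-based evaluation shows the judge the expected answer.
        parts.append(f"Reference: {case.expected_output}")
    return "\n".join(parts)
```

With this sketch, a case built without `expected_output` yields a three-line reference-free prompt, while the same case with `expected_output` set gains a final `Reference:` line and the reference-based rubric.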
- `MultimodalCorrectnessEvaluator`, `MultimodalFaithfulnessEvaluator`, `MultimodalInstructionFollowingEvaluator`, and `MultimodalOverallQualityEvaluator` to assess various aspects of multimodal responses.
- `MultimodalOutputEvaluator` as a base class for handling multimodal inputs and outputs.
- `ImageData` class to manage image sources and formats, enabling flexible input handling for evaluators.
- … `expected_output` is provided in the test case).
Related Issues
#128
Documentation PR
strands-agents/docs#674
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
hatch run prepare
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.