    Repositories list

    • RIVER

      Public
      [ICLR 2026] RIVER: A Real-Time Interaction Benchmark for Video LLMs
      Python
      0810 Updated Apr 20, 2026
    • EfficientQAT

      Public
      [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
      Python
      MIT License
      30337130 Updated Apr 10, 2026
    • MMT-Bench

      Public
      [ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
      Python
      411900 Updated Apr 6, 2026
    • V2PE

      Public
      [ICCV2025] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
      Python
      MIT License
      26020 Updated Apr 4, 2026
    • GenExam

      Public
      GenExam: A Multidisciplinary Text-to-Image Exam
      Python
      MIT License
      46500 Updated Mar 29, 2026
    • InternVideo

      Public
      [ECCV2024] Video Foundation Models & Data for Multimodal Understanding
      Python
      Apache License 2.0
      1462.2k1364 Updated Mar 25, 2026
    • InternVL-U

      Public
      InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, image editing into a single frame…
      Python
      MIT License
      1426950 Updated Mar 21, 2026
    • Vlaser

      Public
      Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
      Python
      MIT License
      04510 Updated Mar 18, 2026
    • GenEditEvalKit

      Public
      The first unified, efficient, and extensible evaluation toolkit for evaluating image generation and editing models across multiple benchmarks.
      Jupyter Notebook
      MIT License
      4000 Updated Mar 7, 2026
    • VKnowU

      Public
      Python
      11100 Updated Feb 3, 2026
    • MetaCaptioner

      Public
      Python
      45020 Updated Jan 27, 2026
    • ScaleCUA

      Public
      [ICLR 2026 Oral] ScaleCUA is a set of open-source computer use agents that can operate across platforms (Windows, macOS, Ubuntu, Android).
      Python
      Apache License 2.0
      781.1k141 Updated Jan 7, 2026
    • GUI-Odyssey

      Public
      [ICCV 2025] GUIOdyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUIOdyssey consists of 8,834 episodes from 6 mobile d…
      Python
      9156100 Updated Jan 3, 2026
    • SDLM

      Public
      Sequential Diffusion Language Model (SDLM) enhances pre-trained autoregressive language models by adaptively determining generation length and maintaining KV-ca…
      Python
      MIT License
      49700 Updated Dec 27, 2025
    • SID-VLN

      Public
      Official implementation of: Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale
      Python
      MIT License
      21200 Updated Nov 29, 2025
    • vinci

      Public
      Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
      Python
      28920 Updated Nov 27, 2025
    • OmniQuant

      Public
      [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
      Python
      MIT License
      78891302 Updated Nov 26, 2025
    • VideoChat-Flash

      Public
      [ICLR2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
      Python
      MIT License
      19518100 Updated Nov 18, 2025
    • ExpVid

      Public
      0900 Updated Oct 28, 2025
    • VideoChat-R1

      Public
      [NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
      Python
      10267240 Updated Oct 18, 2025
    • NaViL

      Public
      Python
      MIT License
      79200 Updated Oct 10, 2025
    • PonderV2

      Public
      [T-PAMI 2025] PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
      Python
      MIT License
      837200 Updated Sep 30, 2025
    • InternVL

      Public
      [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance
      Python
      MIT License
      76910k30311 Updated Sep 22, 2025
    • [CVPR 2024] Data and benchmark code for the EgoExoLearn dataset
      Python
      MIT License
      28240 Updated Aug 26, 2025
    • VRBench

      Public
      [ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos
      Python
      Apache License 2.0
      02610 Updated Aug 8, 2025
    • PIIP

      Public
      [NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)
      Python
      MIT License
      511320 Updated Aug 5, 2025
    • LORIS

      Public
      [ICML2023] Long-Term Rhythmic Video Soundtracker
      Python
      MIT License
      16210 Updated Jul 28, 2025
    • TPO

      Public
      Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
      Jupyter Notebook
      66510 Updated Jul 22, 2025
    • Docopilot

      Public
      [CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding
      Python
      MIT License
      13720 Updated Jul 22, 2025
    • Mono-InternVL

      Public
      [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
      Python
      MIT License
      010870 Updated Jul 18, 2025