Guiding Instruction-based Image Editing via Multimodal Large Language Models
Instruction-based image editing improves the controllability and flexibility
of image manipulation via natural commands without elaborate descriptions or
regional masks. However, human instructions are sometimes too brief for current
methods to capture and follow. Multimodal large language models (MLLMs) show
promising capabilities in cross-modal understanding and visual-aware response
generation via LMs. We investigate how MLLMs facilitate edit instructions and
present MLLM-Guided Image Editing (MGIE). MGIE learns to derive expressive
instructions and provides explicit guidance. The editing model jointly captures
this visual imagination and performs manipulation through end-to-end training.
We evaluate various aspects of Photoshop-style modification, global photo
optimization, and local editing. Extensive experimental results demonstrate
that expressive instructions are crucial to instruction-based image editing,
and our MGIE can lead to a notable improvement in automatic metrics and human
evaluation while maintaining competitive inference efficiency.

Comment: ICLR'24 (Spotlight) ; Project at https://mllm-ie.github.io ; Code at
https://github.com/tsujuifu/pytorch_mgi
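
The pipeline the abstract describes has two jointly trained parts: an MLLM that expands a brief user command into an expressive, visually grounded instruction, and an editing model that conditions on that guidance to produce the edited image. The sketch below is a minimal, hypothetical illustration of that structure only; the module names, feature sizes, and the simple reconstruction loss are placeholder assumptions and do not reflect the released pytorch_mgie code or the actual MGIE architecture.

```python
# Hypothetical sketch of an MGIE-style setup: an MLLM-like module derives
# "expressive" guidance from a brief instruction plus image features, and an
# editing model conditions on it; both are optimized end-to-end.
import torch
import torch.nn as nn

class ExpressiveInstructionLM(nn.Module):
    """Stand-in for the MLLM that turns a brief instruction + image into guidance."""
    def __init__(self, vocab_size=32000, dim=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.image_proj = nn.Linear(1024, dim)   # assumed visual-feature size
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.edit_head = nn.Linear(dim, dim)     # maps hidden states to edit guidance

    def forward(self, instruction_ids, image_feats):
        tokens = torch.cat(
            [self.image_proj(image_feats), self.token_emb(instruction_ids)], dim=1
        )
        hidden = self.encoder(tokens)
        return self.edit_head(hidden.mean(dim=1))  # pooled expressive guidance

class EditingModel(nn.Module):
    """Stand-in for the editor conditioned on the guidance vector."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * 64 * 64 + dim, 1024), nn.GELU(),
            nn.Linear(1024, 3 * 64 * 64),
        )

    def forward(self, image, guidance):
        x = torch.cat([image.flatten(1), guidance], dim=1)
        return self.net(x).view_as(image)

# One joint (end-to-end) training step on a dummy batch.
mllm, editor = ExpressiveInstructionLM(), EditingModel()
opt = torch.optim.AdamW(list(mllm.parameters()) + list(editor.parameters()), lr=1e-4)

instruction_ids = torch.randint(0, 32000, (2, 8))  # brief user commands (token ids)
image_feats = torch.randn(2, 16, 1024)             # visual features of source images
source = torch.randn(2, 3, 64, 64)                 # source images
target = torch.randn(2, 3, 64, 64)                 # ground-truth edited images

guidance = mllm(instruction_ids, image_feats)
pred = editor(source, guidance)
loss = nn.functional.mse_loss(pred, target)        # placeholder editing loss
loss.backward()
opt.step()
```

The key design point this illustrates is that the guidance vector is produced and consumed inside one computation graph, so gradients from the editing loss flow back into the instruction-deriving module, which is what "end-to-end training" in the abstract refers to.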
