Introduction
OmniGen2 is an efficient unified multimodal generative model with about 7B parameters in total (a 3B text model plus a 4B image generation model). Unlike OmniGen v1, OmniGen2 adopts a dual-path Transformer architecture in which the autoregressive text model and the image diffusion model are completely independent. This decouples their parameters and lets each path be optimized for its own task.
Model highlights
- Visual understanding: Inherits the image interpretation and analysis capabilities of the Qwen2.5-VL base model
- Text-to-image generation: Creates high-fidelity, aesthetically pleasing images from text prompts
- Instruction-guided image editing: Performs complex, instruction-based image modifications, achieving state-of-the-art performance among open-source models
- In-context generation: Processes and flexibly combines diverse inputs (including people, reference objects, and scenes) to produce novel and coherent visual outputs
Technical features
- Dual-path architecture: Based on a Qwen2.5-VL (3B) text encoder plus an independent diffusion Transformer (4B)
- Omni-RoPE position encoding: Supports multi-image spatial positioning and identity differentiation
- Parameter decoupling design: Avoids the negative impact of text generation on image quality
- Supports complex text understanding and image understanding
- Controllable image generation and editing
- Excellent detail preservation capabilities
- Unified architecture supports a variety of image generation tasks
- Text rendering capabilities: Can generate clear, legible text within images
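The idea behind "multi-image spatial positioning and identity differentiation" can be illustrated with a toy example. The sketch below is only an illustration under stated assumptions, not the model's actual code: the function name `omni_rope_position_ids` and the `(image_id, row, col)` layout are hypothetical. It assumes that all tokens of one image share an identity index, while the spatial coordinates restart at (0, 0) for each image.

```python
from typing import List, Tuple


def omni_rope_position_ids(
    image_grids: List[Tuple[int, int]],
) -> List[Tuple[int, int, int]]:
    """Assign a hypothetical (image_id, row, col) triple to every image token.

    image_grids holds one (height, width) token grid per input image.
    """
    position_ids = []
    for image_id, (height, width) in enumerate(image_grids):
        for row in range(height):
            for col in range(width):
                # Assumption: the identity index is shared by the whole image,
                # and the spatial coordinates restart at (0, 0) per image.
                position_ids.append((image_id, row, col))
    return position_ids


# Two images: a 2x2 token grid and a 2x3 token grid.
ids = omni_rope_position_ids([(2, 2), (2, 3)])
for triple in ids:
    print(triple)
```

With this layout, tokens at the same location in two different images share `(row, col)` but differ in `image_id`, which is one way to keep spatial alignment for editing while still telling the images apart.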
Recommended machine: Large-PRO
Workflow Overview
How to use this workflow
Step 1: Load Image
To enable the second image input, select the nodes that are in Bypass mode and press **Ctrl + B** to switch them back to normal mode.
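The same switch can be made outside the editor on a saved workflow file. The following is a minimal sketch, assuming the workflow is stored in ComfyUI's UI JSON format with a top-level `nodes` list, and that each node carries a `mode` field where `4` means bypassed and `0` means normal. These values, the helper name `enable_bypassed_nodes`, and the sample workflow are assumptions for illustration and should be checked against your own exported file.

```python
import json

BYPASS_MODE = 4  # assumption: ComfyUI marks bypassed nodes with mode 4
NORMAL_MODE = 0  # assumption: mode 0 means the node runs normally


def enable_bypassed_nodes(workflow: dict) -> int:
    """Switch every bypassed node back to normal mode; return how many changed."""
    changed = 0
    for node in workflow.get("nodes", []):
        if node.get("mode") == BYPASS_MODE:
            node["mode"] = NORMAL_MODE
            changed += 1
    return changed


# Hypothetical two-node workflow: the second image loader is bypassed.
workflow = {
    "nodes": [
        {"id": 1, "type": "LoadImage", "mode": 0},
        {"id": 2, "type": "LoadImage", "mode": 4},
    ]
}

changed = enable_bypassed_nodes(workflow)
print(f"re-enabled {changed} node(s)")
print(json.dumps(workflow, indent=2))
```

Unlike the keyboard shortcut, which acts on the nodes you select, this helper re-enables every bypassed node in the file, so it is only appropriate when all bypassed nodes belong to the second image input.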