Introduction
USO (Unified Style-Subject Optimized) is a model developed by ByteDance's UXO team that unifies style-driven and subject-driven image generation. Built on the FLUX.1-dev architecture, it achieves both style similarity and subject consistency through disentangled learning and style reward learning (SRL).
USO supports three main modes:
- Subject-driven: Places a subject into new scenes while maintaining identity consistency
- Style-driven: Applies the artistic style of reference images to new content
- Combined: Uses both subject and style references in the same generation
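The mode in use follows from which reference images are supplied. The snippet below is a minimal sketch of that mapping only; `Mode` and `select_mode` are hypothetical names introduced for illustration and are not part of the USO codebase or any official API.

```python
from enum import Enum
from typing import Optional, Sequence


class Mode(Enum):
    """The three generation modes listed above (names are illustrative)."""
    SUBJECT = "subject-driven"
    STYLE = "style-driven"
    COMBINED = "combined"


def select_mode(subject_ref: Optional[str], style_refs: Sequence[str] = ()) -> Mode:
    """Pick a mode from the references provided.

    Hypothetical helper: `subject_ref` is a path to a subject image (or None),
    `style_refs` is a possibly empty sequence of style image paths.
    """
    has_subject = subject_ref is not None
    has_style = len(style_refs) > 0
    if has_subject and has_style:
        return Mode.COMBINED
    if has_subject:
        return Mode.SUBJECT
    if has_style:
        return Mode.STYLE
    raise ValueError("Provide at least one subject or style reference image.")
```

For example, `select_mode("cat.png", ["ink_wash.png"])` returns `Mode.COMBINED`, while passing only a style image returns `Mode.STYLE`. How the references are then fed to the model depends on the actual inference script or workflow being used.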
Existing literature typically treats style-driven and subject-driven generation as two disjoint tasks: the former prioritizes stylistic similarity, whereas the latter insists on subject consistency, resulting in an apparent antagonism. We argue that both objectives can be unified under a single framework because they ultimately concern the disentanglement and re-composition of "content" and "style", a long-standing theme in style-driven research. To this end, we present USO, a Unified framework for Style-driven and subject-driven GeneratiOn. First, we construct a large-scale triplet dataset consisting of content images, style images, and their corresponding stylized content images. Second, we introduce a disentangled learning scheme that simultaneously aligns style features and disentangles content from style through two complementary objectives, style-alignment training and content-style disentanglement training. Third, we incorporate a style reward-learning paradigm to further enhance the model's performance.
Project page: https://bytedance.github.io/USO/
Code: https://github.com/bytedance/USO