SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing

vivo BlueImage Lab, vivo Mobile Communication Co., Ltd.
* Corresponding authors

Gallery

[Image gallery: 12 original/enhanced photo pairs]

Video Demo

Abstract

Traditional photographic image editing typically requires users to have enough aesthetic understanding to give appropriate instructions for adjusting image quality and camera parameters. This paradigm depends on explicit human instruction of aesthetic intent, which is often ambiguous, incomplete, or inaccessible to non-expert users. Moreover, recent editing models mostly rely on user-provided instructions and lack the ability to understand aesthetic deficiencies and reason about improvement strategies. In this work, we propose SmartPhotoCrafter, an automatic photographic image editing method that formulates image editing as a tightly coupled reasoning-to-generation process. The model first uses its Image Critic module to assess image quality and identify deficiencies; the Photographic Artist module then applies targeted edits to enhance image appeal, eliminating the need for explicit human instructions. We adopt a multi-stage training pipeline: (i) foundation pretraining to establish aesthetic understanding and editing capabilities, (ii) adaptation with reasoning-guided multi-edit supervision to incorporate rich semantic guidance, and (iii) coordinated reasoning-to-generation reinforcement learning to jointly optimize reasoning and generation. During training, SmartPhotoCrafter emphasizes photo-realistic image generation while supporting both image restoration and retouching with consistent adherence to color- and tone-related semantics. We also construct a stage-specific dataset that progressively builds reasoning and controllable generation, effective cross-module collaboration, and ultimately high-quality photographic enhancement. Experiments demonstrate that SmartPhotoCrafter outperforms existing generative models on automatic photographic enhancement, achieving photo-realistic results while exhibiting higher tonal sensitivity to retouching instructions.

Method

Coordinated reasoning-to-generation reinforcement learning framework of SmartPhotoCrafter. A unified optimization paradigm is employed to jointly enhance the Image Critic and Photographic Artist, enabling photographic-aware reasoning and image enhancement.
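
The critic-then-artist flow described in the abstract can be illustrated with a deliberately tiny sketch. This is not the SmartPhotoCrafter model: the real Image Critic and Photographic Artist are learned modules, whereas the functions below are hand-written heuristics on a flat list of grayscale pixel values in [0, 1]. All names (`Edit`, `image_critic`, `photographic_artist`, `enhance`) and all thresholds are assumptions made for illustration only.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Edit:
    """One targeted adjustment proposed by the critic (hypothetical structure)."""
    attribute: str  # "exposure" or "contrast"
    amount: float   # signed strength in arbitrary, illustrative units


def image_critic(pixels):
    """Toy stand-in for the Image Critic: diagnose deficiencies, return edits."""
    edits = []
    brightness = mean(pixels)
    # Assumed threshold: flag images whose mean is far from mid-gray.
    if abs(brightness - 0.5) > 0.1:
        edits.append(Edit("exposure", 0.5 - brightness))
    # Assumed threshold: flag images with a narrow tonal spread.
    if pstdev(pixels) < 0.15:
        edits.append(Edit("contrast", 0.5))
    return edits


def photographic_artist(pixels, edits):
    """Toy stand-in for the Photographic Artist: apply the critic's edits in order."""
    out = list(pixels)
    for edit in edits:
        if edit.attribute == "exposure":
            # Shift every pixel by the requested amount.
            out = [p + edit.amount for p in out]
        elif edit.attribute == "contrast":
            # Stretch pixel values around the current mean.
            m = mean(out)
            out = [m + (p - m) * (1 + edit.amount) for p in out]
    # Clamp back into the valid range.
    return [min(1.0, max(0.0, p)) for p in out]


def enhance(pixels):
    """Reasoning first, generation second; no user instruction is needed."""
    edits = image_critic(pixels)
    return photographic_artist(pixels, edits), edits


if __name__ == "__main__":
    dark_flat = [0.1, 0.2, 0.3, 0.2]
    enhanced, plan = enhance(dark_flat)
    print(plan)
    print(enhanced)
```

The point of the sketch is only the interface: the critic turns an image into an explicit edit plan, and the artist consumes that plan, so the two stages can in principle be supervised or rewarded together, as the caption above describes for the actual system.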

Experiments


Examples of cross-attribute enhancement on a real-world dataset, illustrating the instruction-following capability and generalization ability of SmartPhotoCrafter.

BibTeX

@article{
      
    }