CLIP-Guided Pose-Conditioned Image Generation
Fine-tuned Stable Diffusion with ControlNet on 80,000 pose images, using CLIP text conditioning
and 2D pose control signals. Pose transfer generalized well across appearances; facial-feature
fidelity was identified as a key failure mode.
PyTorch
ControlNet
Stable Diffusion
CLIP
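
The ControlNet setup named in the summary above can be illustrated with a toy block. This is a minimal sketch, not the project's training code: `ControlledBlock` and `zero_conv` are hypothetical names, a single convolution stands in for a Stable Diffusion U-Net block, the pose input is assumed to be already encoded to the same shape as the features, and CLIP text conditioning is omitted. It shows only the general ControlNet idea: a frozen base layer, a trainable copy that receives the control signal, and zero-initialized 1x1 convolutions so that training starts from the unmodified base model.

```python
import torch
import torch.nn as nn


def zero_conv(channels: int) -> nn.Conv2d:
    """1x1 convolution whose weight and bias start at zero (hypothetical helper)."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv


class ControlledBlock(nn.Module):
    """Toy stand-in for one ControlNet-wrapped block (illustrative only)."""

    def __init__(self, channels: int):
        super().__init__()
        # Stand-in for a pretrained layer; its weights stay frozen.
        self.frozen = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Trainable copy initialized from the same weights.
        self.trainable_copy = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.trainable_copy.load_state_dict(self.frozen.state_dict())
        for p in self.frozen.parameters():
            p.requires_grad_(False)
        # Zero-initialized projections on the way into and out of the copy.
        self.zero_in = zero_conv(channels)
        self.zero_out = zero_conv(channels)

    def forward(self, x: torch.Tensor, pose: torch.Tensor) -> torch.Tensor:
        # Assumption: `pose` is a feature map with the same shape as `x`.
        base = self.frozen(x)
        control = self.trainable_copy(x + self.zero_in(pose))
        # At initialization zero_out returns zeros, so the output equals `base`.
        return base + self.zero_out(control)


if __name__ == "__main__":
    block = ControlledBlock(channels=4)
    x = torch.randn(1, 4, 8, 8)
    pose = torch.randn(1, 4, 8, 8)
    out = block(x, pose)
    # Before any training step, the pose branch contributes nothing.
    print(torch.allclose(out, block.frozen(x)))
```

Because the output projection starts at zero, the pose branch has no effect until gradient updates move its weights, which is why this kind of fine-tuning can begin without disturbing the pretrained model's outputs.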