Controllable Font Style for Visual Text Generation Using Reference Images
Abstract
The rapid development of diffusion-based generative models has significantly advanced text-to-image generation, enabling high-quality image creation for diverse applications such as photography, digital art, and advertising. However, generating legible text within images and controlling its style and appearance remain substantial challenges, limiting the utility of these models in tasks such as advertisements and posters. This work addresses these limitations by proposing a novel approach that combines controllable image generation with a model that generates black-and-white text images in specific styles, conditioned on reference images. We leverage advances in diffusion models and text generation techniques, offering a solution that produces legible generated text and enables accurate font style imitation.
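To make the two-stage idea concrete, below is a minimal sketch of how such a pipeline could be wired up with Hugging Face diffusers, assuming a ControlNet-style conditioning for the second stage. The glyph renderer here is a hypothetical stand-in (the paper's style-conditioned model is not reproduced), and the model checkpoints are illustrative choices, not the authors' released components.

```python
# Sketch of the two-stage pipeline: (1) produce a black-and-white text
# image, (2) use it as a control signal for text-to-image generation.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image, ImageDraw, ImageFont

def render_glyph_map(text: str, size=(512, 512)) -> Image.Image:
    """Stand-in for stage 1 (style-conditioned glyph generation):
    renders `text` as white-on-black with a default font. The actual
    model would imitate the font style of a reference image."""
    img = Image.new("RGB", size, "black")
    draw = ImageDraw.Draw(img)
    draw.text((40, size[1] // 2), text, fill="white",
              font=ImageFont.load_default())
    return img

# Stage 2: condition a text-to-image diffusion model on the glyph map,
# ControlNet-style, so the text stays legible in the final image.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")  # assumes a CUDA GPU is available

glyph_map = render_glyph_map("SUMMER SALE")
image = pipe(
    prompt="a colorful summer sale poster",
    image=glyph_map,  # black-and-white text image as the control signal
    num_inference_steps=30,
).images[0]
image.save("poster.png")
```

In the actual method, the first stage would be replaced by the proposed model that generates the glyph image in the font style of a reference image; the point of the sketch is only the overall structure, where legibility is enforced by conditioning the image generator on an explicit text image rather than on the prompt alone.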
Type
Publication
32nd International Conference on Neural Information Processing (ICONIP 2025)

Authors
Research Scientist
Jan is a research scientist at CyberAgent, where he works on artificial intelligence and computer vision
with a focus on image generation and editing. He received his PhD in Information Science and Technology
from the University of Tokyo, where his research centered on image generation. Prior to that, he received
his Master’s degree in Creative Informatics from the University of Tokyo, and his Bachelor’s degree
in Computer and Information Science from the Czech Technical University in Prague.
Born and raised in the Czech Republic, he currently works in Japan.