Gemma Canet Tarrés

Postgraduate research student

g.canettarres@surrey.ac.uk

Academic and research departments

Centre for Vision, Speech and Signal Processing (CVSSP).

About

My research project

Unified representations for compositional search and synthesis from sketch

Obtain a representation of sketches that can be used for both search and image synthesis, by exploring the use of GANs, VAEs and Transformer networks, among others.

Publications

Dan Sebastian Ruta, Gemma Canet Tarres, Alexander Black, Andrew Gilbert, John Philip Collomosse (2024)Self-supervised disentangled representation learning of artistic style through Neural Style Transfer

We present a new method for learning a fine-grained representation of visual style. Representation learning aims to discover individual salient features of a domain in a compact and descriptive form that strongly identifies the unique characteristics of that domain. Prior visual style representation works attempt to disentangle style (i.e. appearance) from content (i.e. semantics) yet a complete separation has yet to be achieved. We present a technique to learn a representation of visual style more strongly disentangled from the semantic content depicted in an image. We use Neural Style Transfer (NST) to measure and drive the learning signal and achieve state-of-the-art representation learning on explicitly disentangled metrics. We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics, encoding far less semantic information and achieving state-of-the-art accuracy in downstream style matching (retrieval) and zero-shot style tagging tasks.

Dan Sebastian Ruta, Andrew Gilbert, Gemma Canet Tarres, Eli Shechtman, Nick Kolkin, John Philip Collomosse (2024)Diff-nst: Diffusion interleaving for deformable neural style transfer

Neural Style Transfer (NST) is the field of study applying neural techniques to modify the artistic appearance of a content image to match the style of a reference style image. Traditionally, NST methods have focused on texture-based image edits, affecting mostly low level information and keeping most image structures the same. However, style-based deformation of the content is desirable for some styles, especially in cases where the style is abstract or the primary concept of the style is in its deformed rendition of some content. With the recent introduction of diffusion models, such as Stable Diffusion, we can access far more powerful image generation techniques, enabling new possibilities. In our work, we propose using this new class of models to perform style transfer while enabling deformable style transfer, an elusive capability in previous models. We show how leveraging the priors of these models can expose new artistic controls at inference time, and we document our findings in exploring this new direction for the field of style transfer.

Gemma Canet Tarres, Zhe Lin, Zhifei Zhang, Jianming Zhang, Yizhi Song, Dan Sebastian Ruta, Andrew Gilbert, John Philip Collomosse, Soo Ye Kim (2024)Thinking Outside the BBox: Unconstrained Generative Object Compositing

Compositing an object into an image involves multiple non-trivial sub-tasks such as object placement and scaling, color/lighting harmonization, viewpoint/geometry adjustment, and shadow/reflection generation. Recent generative image compositing methods leverage diffusion models to handle multiple sub-tasks at once. However, existing models face limitations due to their reliance on masking the original object \sooye{during} training, which constrains their generation to the input mask. Furthermore, obtaining an accurate input mask specifying the location and scale of the object in a new image can be highly challenging. To overcome such limitations, we define a novel problem of \textit{unconstrained generative object compositing}, i.e., the generation is not bounded by the mask, and train a diffusion-based model on a synthesized paired dataset. Our first-of-its-kind model is able to generate object effects such as shadows and reflections that go beyond the mask, enhancing image realism. Additionally, if an empty mask is provided, our model automatically places the object in diverse natural locations and scales, accelerating the compositing workflow. Our model outperforms existing object placement and compositing models in various quality metrics and user studies.

Cusuh Ham, Gemma Canet Tarres, Tu Bui, James Hays, Zhe Lin, John Collomosse (2022)CoGS: Controllable Generation and Search from Sketch and Style, In: S Avidan, G Brostow, M Cisse, G M Farinella, T Hassner (eds.), COMPUTER VISION - ECCV 2022, PT XVI13676pp. 632-650 Springer Nature

DOI: 10.1007/978-3-031-19787-1_36

We present CoGS, a novel method for the style-conditioned, sketch-driven synthesis of images. CoGS enables exploration of diverse appearance possibilities for a given sketched object, enabling decoupled control over the structure and the appearance of the output. Coarse-grained control over object structure and appearance are enabled via an input sketch and an exemplar "style" conditioning image to a transformer-based sketch and style encoder to generate a discrete codebook representation. We map the codebook representation into a metric space, enabling fine-grained control over selection and interpolation between multiple synthesis options before generating the image via a vector quantized GAN (VQGAN) decoder. Our framework thereby unifies search and synthesis tasks, in that a sketch and style pair may be used to run an initial synthesis which may be refined via combination with similar results in a search corpus to produce an image more closely matching the user's intent. We show that our model, trained on the 125 object classes of our newly created Pseudosketches dataset, is capable of producing a diverse gamut of semantic content and appearance styles.

Dan Ruta, Gemma Canet Tarrés, Andrew Gilbert, Eli Shechtman, Nicholas Kolkin, John Collomosse DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer

DOI: 10.48550/arxiv.2307.04157

Dan Ruta, Gemma Canet Tarres, Alexander Black, Andrew Gilbert, John Collomosse ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer

DOI: 10.48550/arxiv.2304.05755

Representation learning aims to discover individual salient features of a domain in a compact and descriptive form that strongly identifies the unique characteristics of a given sample respective to its domain. Existing works in visual style representation literature have tried to disentangle style from content during training explicitly. A complete separation between these has yet to be fully achieved. Our paper aims to learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image. We use Neural Style Transfer (NST) to measure and drive the learning signal and achieve state-of-the-art representation learning on explicitly disentangled metrics. We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics, encoding far less semantic information and achieving state-of-the-art accuracy in downstream multimodal applications.