Academic | Joan Rodriguez

My research lies at the intersection of Computer Vision and Natural Language Processing, with a focus on multimodal generative models. I work on leveraging information across modalities to produce accurate and controllable outputs, from images to code. More recently, I have been exploring code generation as an alternative to image generation, particularly for scalable vector graphics (SVGs), using large language and vision models built on transformer and diffusion architectures.

I completed my PhD at Mila, Quebec AI Institute and École de Technologie Supérieure (ETS), University of Quebec, advised by Prof. Marco Pedersoli, Prof. Chris Pal, and Dr. David Vazquez.

I hold an M.Sc. in Computer Vision from Universitat Autònoma de Barcelona (UAB), where I graduated with honors for my thesis on Text to Scientific Figure Generation, supervised by Dr. David Vazquez and Dr. Pau Rodríguez at ServiceNow Research. Before that, I obtained a B.Sc. in Telecommunication Networks Engineering from Universitat Pompeu Fabra (UPF), with a thesis on Handwritten Text Recognition advised by Prof. Xavier Binefa.

I also completed research internships at UPF with Prof. Xavier Binefa and Prof. Miquel Oliver, and at CVC-UAB with Prof. Joost van de Weijer.