Peter Vickers
I am a Research Scientist at Spotify in Toronto working on Natural Language Processing and multimodal AI.
My research focuses on how language models integrate text, vision, and context, with applications including multimodal retrieval, document understanding, and retrieval-augmented generation (RAG). I am particularly interested in how evaluation metrics shape system design and research priorities.
Before joining Spotify, I worked at Northeastern University’s Institute for Experiential AI, where I led GenAI consulting projects and built large-scale RAG systems deployed in medical and technical domains.
I completed my PhD at the University of Sheffield under Nikos Aletras and Loïc Barrault, where my research explored augmenting language models with multimodal signals. My work has been published at ACL, EMNLP, and AACL.
During my PhD I participated in the JSALT summer workshops in Baltimore (Speech-to-Speech Translation for Under-Resourced Languages, 2022) and Le Mans (Better Together: Text + Context, 2023), and interned as an Applied Scientist at Amazon, adapting large-scale vision–language models for text–image retrieval.
Before computer science, I studied English Language and Literature at Magdalen College, Oxford.
Outside research I write poetry and swim in Lake Ontario with a group of stubborn dawn swimmers. I once spent a month backcountry skiing in Greenland. It was, all things considered, relatively peaceful.