
I am currently an independent consultant specializing in the pretraining of large language models, including optimal scaling laws, efficient parallelization, and architecture design and optimization. I am particularly interested in the efficient deployment and improvement of architecture-aware optimizers like Muon.
Before that, I did my PhD at Mila, Quebec AI Institute under the supervision of Ioannis Mitliagkas, with a focus on first-order optimization for deep learning, both theoretical and empirical. The core of my PhD research was a deep dive into the assumptions used by theoretical frameworks for studying optimization, with the aim of producing insights that are more practically relevant in the era of LLMs.
During my PhD, I had the privilege of interning at ServiceNow Research with Joao Monteiro and Torsten Scholak, and at Apple MLR with Eugène Ndiaye. Before my PhD, I was a visiting scholar for a year at UC Berkeley’s EECS department under the supervision of Alexandre Bayen. I completed my Master’s in applied mathematics at École Normale Supérieure de Paris-Saclay.