The greatest challenge to any thinker is stating the problem in a way that will allow a solution.

Bertrand Russell [England, 1902].
My name is Carlos Riquelme. This is my personal website.
About Me
I'm a senior research scientist at Google Brain working on machine intelligence.
Previously, I completed my PhD in statistical machine learning at Stanford.

Large Models. Large Vision & Language Models.
Diverse Data. Multitask & Multimodality.
Fast Inference. Model Sparsity, Compression & Distillation.
Efficient Compute. Adaptive and Conditional Computation with Neural Networks.

Selected Work
Scaling Vision with Sparse Mixture of Experts
LIMoE: the Language-Image Mixture of Experts
PaLI: Scaling Language-Image Learning in 100+ Languages

Google Research Workshop on Sparsity and Adaptive Computation
        Day 1 & Day 2 recordings.
ICML 2022 Workshop on Dynamic Neural Networks

Main Tools
Algorithms, Probability, Statistics, Optimization, Information Theory.
Past Work
Bandits, Reinforcement Learning, Uncertainty Estimation, Active Learning.

I was really lucky to have Ramesh Johari as my PhD advisor at Stanford.
At Oxford, I worked on Probabilistic Combinatorics, supervised by Oliver Riordan.
I really enjoyed long discussions with Dragan Vukotic about Functional Analysis.
Thanks! Thanks! Thanks!

Machine Learning & Data Science @ Google, Facebook, Twitter, Quora, Adobe.

Sven Schmit, Mohammad Ghavamzadeh, Alessandro Lazaric, Austin Benson, George Tucker, Matt Johnson, Matt Hoffman, Jasper Snoek, Baosen Zhang, Sid Banerjee, David Walsh, Ilya Tolstikhin, Josip Djolonga, Joan Puigcerver, Basil Mustafa, Neil Houlsby, Rodolphe Jenatton, Srinadh Bhojanapalli.

Please feel free to contact me at rikelhood AT gmail DOT com.

Uncertainty is a daisy whose petals you never finish plucking.

Mario Vargas Llosa


  1. Scaling Vision Transformers to 22 Billion Parameters
    Mostafa Dehghani et al. (incl. Carlos Riquelme)
    Under review.
  2. PaLI: A Jointly-Scaled Multilingual Language-Image Model
    Xi Chen et al. (incl. Carlos Riquelme)
    ICLR 2023.
  3. Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
    Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby.
    ICLR 2023.
  4. On the Adversarial Robustness of Mixture of Experts
    Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme, Pranjal Awasthi, Srinadh Bhojanapalli.
    NeurIPS 2022.
  5. Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
    Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, Neil Houlsby.
    NeurIPS 2022.
  6. Learning to Merge Tokens in Vision Transformers
    Cedric Renggli, André Susano Pinto, Neil Houlsby, Basil Mustafa, Joan Puigcerver, Carlos Riquelme.
    Under review.
  7. Scaling Vision with Sparse Mixture of Experts
    Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, Neil Houlsby.
    NeurIPS 2021.
  8. Deep Ensembles for Low-Data Transfer Learning
    Basil Mustafa, Carlos Riquelme, Joan Puigcerver, André Susano Pinto, Daniel Keysers, Neil Houlsby.
    Under review.
  9. Which Model to Transfer? Finding the Needle in the Growing Haystack.
    Cedric Renggli, André Susano Pinto, Luka Rimanic, Joan Puigcerver, Carlos Riquelme, Ce Zhang, Mario Lucic.
    CVPR 2022.
  10. Scalable Transfer Learning with Expert Models.
    Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Cedric Renggli, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby.
    ICLR 2021.
  11. Practical and Consistent Estimation of f-Divergences.
    Paul Rubenstein, Olivier Bousquet, Josip Djolonga, Carlos Riquelme, Ilya Tolstikhin.
    NeurIPS 2019.
  12. Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates.
    Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu.
    NeurIPS 2019.
  13. Google Research Football: A Novel Reinforcement Learning Environment.
    With Karol Kurach, Anton Raichuk, Piotr Stanczyk, Michal Zajac, Olivier Bachem, Lasse Espeholt, Damien Vincent, Marcin Michalski, Olivier Bousquet, Sylvain Gelly.
    AAAI 2019.
  14. On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation.
    Nicolas Brosse, Carlos Riquelme, Alice Martin, Sylvain Gelly, Éric Moulines.
  15. Failure Modes of Variational Inference for Decision Making.
    With Matthew Hoffman and Matthew Johnson.
    ICML 2018, Workshop on Prediction and Generative Modeling in Reinforcement Learning.
  16. Deep Bayesian Bandits Showdown.
    Carlos Riquelme, George Tucker, Jasper Snoek.
    ICLR 2018.
    Code: tensorflow/models/research/deep_contextual_bandits
  17. The Beta-VAE's Implicit Prior.
    With Matthew Hoffman and Matthew Johnson.
    NIPS 2017, Bayesian Deep Learning Workshop.
  18. Active Learning for Accurate Estimation of Linear Models.
    Carlos Riquelme, Mohammad Ghavamzadeh, Alessandro Lazaric.
    ICML 2017.
  19. Human Interaction with Recommendation Systems: On Bias and Exploration.
    Sven Schmit, Carlos Riquelme.
    AISTATS 2018.
  20. Online Active Linear Regression via Thresholding.
    Carlos Riquelme, Ramesh Johari, Baosen Zhang.
    AAAI 2017.
  21. Pricing Ride-Share Platforms: A Queueing-Theoretic Approach.
    Sid Banerjee, Ramesh Johari, Carlos Riquelme.
    EC 2015.
  22. Learning Multifractal Structure in Large Networks.
    Austin Benson, Carlos Riquelme, Sven Schmit.
    KDD 2014.
  23. On the Chromatic Number of Random Graphs.
    Carlos Riquelme.
    Master's Thesis, University of Oxford, 2012.

What I like most about Madrid is that it always smiles at you, and smiling is contagious.