Ekdeep Singh Lubana

I am a PhD candidate in the EECS Department at the University of Michigan, advised by Robert Dick. I am also affiliated with the Harvard Center for Brain Science, where I am mentored by Hidenori Tanaka.

I am generally interested in designing (faithful) abstractions of phenomena relevant to controlling or aligning neural networks. I am also very interested in better understanding the training dynamics of neural networks, especially from a statistical physics perspective.

I graduated with a Bachelor's degree in ECE from the Indian Institute of Technology (IIT), Roorkee in 2019. My undergraduate research focused primarily on embedded systems, such as energy-efficient machine vision systems.

Email  /  CV  /  Google Scholar  /  Github

[06/2024] Preprint on hidden capabilities in generative models is on arXiv now.
[06/2024] Paper on identifying how jailbreaks bypass safety mechanisms accepted as a spotlight at ICML-MI workshop, 2024.
[11/2023] Paper on mechanistically analyzing effects of fine-tuning accepted to ICLR, 2024.
[10/2023] Paper on analyzing in-context learning as a subjective randomness task accepted to ICLR, 2024.
[10/2023] Our work on multiplicative emergence of compositional abilities was accepted to NeurIPS, 2023.
[04/2023] Our work on a mechanistic understanding of loss landscapes was accepted to ICML, 2023.
[01/2023] Our work analyzing the loss landscape of self-supervised objectives was accepted to ICLR, 2023.
[10/2021] Our work on dynamics of normalization layers was accepted to NeurIPS, 2021.
[03/2021] Our work on a theory of pruning was accepted as a spotlight at ICLR, 2021.
Publications (* denotes equal contribution)
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park, Maya Okawa, Andrew Lee, Ekdeep Singh Lubana*, and Hidenori Tanaka*
ICML workshop on High-dimensional Learning Dynamics, 2024
bibtex / arXiv

We analyze a model's learning dynamics in "concept space" and identify sudden transitions after which the model demonstrates a capability under latent intervention, even when input prompting does not elicit that capability.

What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
Samyak Jain, Ekdeep Singh Lubana, Kemal Oksuz, Tom Joy, Philip H.S. Torr, Amartya Sanyal, and Puneet K. Dokania
ICML workshop on Mechanistic Interpretability, 2024 (Spotlight)
bibtex / preprint

We use formal languages as a model system to identify the mechanistic changes induced by safety fine-tuning, and how jailbreaks bypass said mechanisms, verifying our claims on Llama models.

Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar, Abulhair Saparov*, Javier Rando*, Daniel Paleka*, Miles Turpin*, Peter Hase*, Ekdeep Singh Lubana*, Erik Jenner*, Stephen Casper*, Oliver Sourbut*, Benjamin Edelman*, Zhaowei Zhang*, Mario Gunther*, Anton Korinek*, Jose Hernandez-Orallo*, and others
arXiv preprint, 2024
bibtex / arXiv / website

We identify and discuss 18 foundational challenges in assuring the alignment and safety of large language models (LLMs) and pose 200+ concrete research questions.

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, and Hidenori Tanaka
International Conference on Machine Learning (ICML), 2024
bibtex / arXiv

We formalize a notion of composition of primitive capabilities learned via autoregressive modeling by a Transformer, showing the model's capabilities can "explode", i.e., increase combinatorially, if it can compose.

Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model
Mikail Khona, Maya Okawa, Jan Hula, Rahul Ramesh, Kento Nishi, Robert P. Dick, Ekdeep Singh Lubana*, and Hidenori Tanaka*
International Conference on Machine Learning (ICML), 2024
bibtex / arXiv

We cast stepwise inference methods in LLMs as a graph navigation task, finding a synthetic model is sufficient to explain and identify novel characteristics of such methods.

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Samyak Jain*, Robert Kirk*, Ekdeep Singh Lubana*, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktaschel, and David Krueger
International Conference on Learning Representations (ICLR), 2024
bibtex / arXiv

We show fine-tuning leads to learning of minimal transformations of a pretrained model's capabilities, like a "wrapper", by using procedural tasks defined using Tracr, PCFGs, and TinyStories.

In-Context Learning Dynamics with Random Binary Sequences
Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, and Tomer D. Ullman
International Conference on Learning Representations (ICLR), 2024
bibtex / arXiv

We analyze different LLMs' abilities to model binary sequences generated via different pseudo-random processes, such as a formal automaton, and find that with scale, LLMs are (almost) able to simulate these processes via mere context conditioning.

FoMo Rewards: Can we cast foundation models as reward functions?
Ekdeep Singh Lubana, Johann Brehmer, Pim de Haan, and Taco Cohen
NeurIPS workshop on Foundation Models for Decision Making
bibtex / arXiv

We propose and analyze a pipeline for re-casting an LLM as a generic reward function that interacts with an LVM to enable embodied AI tasks.

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
Maya Okawa*, Ekdeep Singh Lubana*, Robert P. Dick, and Hidenori Tanaka*
Advances in Neural Information Processing Systems (NeurIPS), 2023
bibtex / arXiv

We analyze compositionality in diffusion models, showing that there is a sudden emergence of this capability if models are allowed sufficient training to learn the relevant primitive capabilities.

Mechanistic Mode Connectivity
Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, and Hidenori Tanaka
International Conference on Machine Learning (ICML), 2023
bibtex / arXiv / github

We show models that rely on entirely different mechanisms for making their predictions can exhibit mode connectivity, but generally the ones that are mechanistically similar are linearly connected.

What Shapes the Landscape of Self-Supervised Learning?
Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, and Hidenori Tanaka
International Conference on Learning Representations (ICLR), 2023
bibtex / arXiv

We present a highly detailed analysis of the landscape of several self-supervised learning objectives to clarify the role of representational collapse.

Analyzing Data-Centric Properties for Contrastive Learning on Graphs
Puja Trivedi, Ekdeep Singh Lubana, Mark Heimann, Danai Koutra, and Jay Jayaraman Thiagarajan
Advances in Neural Information Processing Systems (NeurIPS), 2022
bibtex / arXiv / github

We propose a theoretical framework that demonstrates limitations of popular graph augmentation strategies for self-supervised learning.

Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering
Ekdeep Singh Lubana, Chi Ian Tang, Fahim Kawsar, Robert P. Dick, and Akhil Mathur
International Conference on Machine Learning (ICML), 2022 (Spotlight)
bibtex / arXiv / github / video

We propose an unsupervised learning method that exploits client heterogeneity to enable privacy-preserving unsupervised federated learning with state-of-the-art performance.

Beyond BatchNorm: Towards a General Understanding of Normalization in Deep Learning
Ekdeep Singh Lubana, Hidenori Tanaka, and Robert P. Dick
Advances in Neural Information Processing Systems (NeurIPS), 2021
bibtex / github / arXiv / video

We develop a general theory to understand the role of normalization layers in improving training dynamics of a neural network at initialization.

How do Quadratic Regularizers Prevent Catastrophic Forgetting: The Role of Interpolation
Ekdeep Singh Lubana, Puja Trivedi, Danai Koutra, and Robert P. Dick
Conference on Lifelong Learning Agents (CoLLAs), 2022
bibtex / github / arXiv / video
(Also presented at ICML Workshop on Theory and Foundations of Continual Learning, 2021)

This work demonstrates how quadratic regularization methods for preventing catastrophic forgetting in deep networks rely on a simple heuristic under the hood: interpolation.

A Gradient Flow Framework For Analyzing Network Pruning
Ekdeep Singh Lubana and Robert P. Dick
International Conference on Learning Representations (ICLR), 2021 (Spotlight)
bibtex / github / arXiv / video

A unified, theoretically grounded framework for network pruning that helps justify often-used heuristics in the field.

Undergraduate Research
Minimalistic Image Signal Processing for Deep Learning Applications
Ekdeep Singh Lubana, Robert P. Dick, Vinayak Aggarwal, and Pyari Mohan Pradhan
International Conference on Image Processing (ICIP), 2019
bibtex

An image signal processing pipeline that allows use of out-of-the-box deep neural networks on RAW images directly retrieved from image sensors.

Digital Foveation: An Energy-Aware Machine Vision Framework
Ekdeep Singh Lubana and Robert P. Dick
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2018
bibtex

An energy-efficient machine vision framework inspired by the fovea in biological vision. Also see follow-up work presented at a CVPR workshop, 2020.

Snap: Chlorophyll Concentration Calculator Using RAW Images of Leaves
Ekdeep Singh Lubana, Mangesh Gurav, and Maryam Shojaei Baghini
IEEE Sensors, 2018; Global Winner, Ericsson Innovation Awards 2017
bibtex / news

An efficient imaging system that accurately calculates chlorophyll content in leaves by using RAW images.
