Ekdeep Singh Lubana

I am a postdoc at Harvard University as part of the CBS-NTT Program on Physics of Intelligence. I did my PhD co-affiliated with EECS at the University of Michigan and CBS at Harvard, advised by Robert Dick and Hidenori Tanaka.

I am generally interested in designing (faithful) abstractions of phenomena relevant to controlling or aligning neural networks. I am also very interested in better understanding the training dynamics of neural networks, especially from a statistical physics perspective.

I graduated with a Bachelor's degree in ECE from the Indian Institute of Technology (IIT) Roorkee in 2019. My undergraduate research focused primarily on embedded systems, such as energy-efficient machine vision systems.

Email  /  CV  /  Google Scholar  /  Github

profile photo
News
[09/2024] Paper on hidden capabilities in generative models accepted as a spotlight at NeurIPS 2024.
[08/2024] Preprint on a percolation model of emergent capabilities is now on arXiv.
[06/2024] Paper on identifying how jailbreaks bypass safety mechanisms accepted at NeurIPS 2024.
[11/2023] Paper on mechanistically analyzing the effects of fine-tuning accepted to ICLR 2024.
[10/2023] Paper on analyzing in-context learning as a subjective randomness task accepted to ICLR 2024.
[10/2023] Our work on multiplicative emergence of compositional abilities was accepted to NeurIPS 2023.
[04/2023] Our work on a mechanistic understanding of loss landscapes was accepted to ICML 2023.
[01/2023] Our work analyzing the loss landscape of self-supervised objectives was accepted to ICLR 2023.
[10/2021] Our work on the dynamics of normalization layers was accepted to NeurIPS 2021.
[03/2021] Our work on the theory of pruning was accepted as a spotlight at ICLR 2021.
Publications (* denotes equal contribution)
Analyzing (In)Abilities of SAEs via Formal Languages
Abhinav Menon, Manish Srivastava, David Krueger, and Ekdeep Singh Lubana
NeurIPS workshop on Foundation Model Interventions, 2024 (Spotlight)

We use formal languages to analyze the limitations of SAEs, finding, much like prior work in disentangled representation learning, that SAEs recover correlational features; explicit biasing is necessary to induce causal ones.

Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
Kento Nishi, Maya Okawa, Rahul Ramesh, Mikail Khona, Ekdeep Singh Lubana*, and Hidenori Tanaka*
Preprint, 2024

We instantiate a synthetic knowledge-graph domain to study how model-editing protocols harm broader capabilities, demonstrating that the representational organization of different concepts is destroyed under counterfactual edits.

Dynamics of Concept Learning and Compositional Generalization
Yongyi Yang, Core Francisco Park, Ekdeep Singh Lubana, Maya Okawa, Wei Hu, and Hidenori Tanaka
Preprint, 2024

We create a theoretical abstraction of our prior work on compositional generalization and explain the peculiar learning dynamics observed there, finding that a quadruple descent was in fact embedded therein!

A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language
Ekdeep Singh Lubana*, Kyogo Kawaguchi*, Robert P. Dick, and Hidenori Tanaka
Preprint, 2024

We implicate the rapid acquisition of structures underlying the data-generating process as the source of sudden capability learning, and analogize knowledge-centric capabilities to graph percolation, a process that undergoes a formal second-order phase transition.

Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park, Maya Okawa, Andrew Lee, Ekdeep Singh Lubana*, and Hidenori Tanaka*
Advances in Neural Information Processing Systems (NeurIPS), 2024 (Spotlight)

We analyze a model's learning dynamics in "concept space" and identify sudden transitions after which the model demonstrates a capability under latent intervention, even when input prompting fails to elicit it.

What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
Samyak Jain, Ekdeep Singh Lubana, Kemal Oksuz, Tom Joy, Philip H.S. Torr, Amartya Sanyal, and Puneet K. Dokania
Advances in Neural Information Processing Systems (NeurIPS), 2024
ICML workshop on Mechanistic Interpretability, 2024 (Spotlight)

We use formal languages as a model system to identify the mechanistic changes induced by safety fine-tuning, and how jailbreaks bypass said mechanisms, verifying our claims on Llama models.

Abrupt Learning in Transformers: A Case Study on Matrix Completion
Pulkit Gopalani, Ekdeep Singh Lubana, and Wei Hu
Advances in Neural Information Processing Systems (NeurIPS), 2024

We show that the acquisition of structures underlying a data-generating process drives abrupt learning in Transformers.

Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar, Abulhair Saparov*, Javier Rando*, Daniel Paleka*, Miles Turpin*, Peter Hase*, Ekdeep Singh Lubana*, Erik Jenner*, Stephen Casper*, Oliver Sourbut*, Benjamin Edelman*, Zhaowei Zhang*, Mario Günther*, Anton Korinek*, José Hernández-Orallo*, and others
Transactions on Machine Learning Research (TMLR), 2024

We identify and discuss 18 foundational challenges in assuring the alignment and safety of large language models (LLMs) and pose 200+ concrete research questions.

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, and Hidenori Tanaka
International Conference on Machine Learning (ICML), 2024

We formalize a notion of composition of primitive capabilities learned via autoregressive modeling by a Transformer, showing the model's capabilities can "explode", i.e., increase combinatorially, if it can compose them.

Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model
Mikail Khona, Maya Okawa, Jan Hula, Rahul Ramesh, Kento Nishi, Robert P. Dick, Ekdeep Singh Lubana*, and Hidenori Tanaka*
International Conference on Machine Learning (ICML), 2024

We cast stepwise inference methods in LLMs as a graph navigation task, finding that a synthetic model is sufficient to explain and identify novel characteristics of such methods.

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Samyak Jain*, Robert Kirk*, Ekdeep Singh Lubana*, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktaschel, and David Krueger
International Conference on Learning Representations (ICLR), 2024

We show that fine-tuning yields minimal transformations, akin to a "wrapper", of a pretrained model's capabilities, using procedural tasks defined with Tracr, PCFGs, and TinyStories.

In-Context Learning Dynamics with Random Binary Sequences
Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, and Tomer D. Ullman
International Conference on Learning Representations (ICLR), 2024

We analyze different LLMs' abilities to model binary sequences generated via different pseudo-random processes, such as a formal automaton, and find that with scale, LLMs are (almost) able to simulate these processes via mere context conditioning.

FoMo Rewards: Can we cast foundation models as reward functions?
Ekdeep Singh Lubana, Johann Brehmer, Pim de Haan, and Taco Cohen
NeurIPS workshop on Foundation Models for Decision Making, 2023

We propose and analyze a pipeline for re-casting an LLM as a generic reward function that interacts with an LVM to enable embodied AI tasks.

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
Maya Okawa*, Ekdeep Singh Lubana*, Robert P. Dick, and Hidenori Tanaka*
Advances in Neural Information Processing Systems (NeurIPS), 2023

We analyze compositionality in diffusion models, showing that there is a sudden emergence of this capability if models are allowed sufficient training to learn the relevant primitive capabilities.

Mechanistic Mode Connectivity
Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, and Hidenori Tanaka
International Conference on Machine Learning (ICML), 2023

We show that models relying on entirely different mechanisms for their predictions can exhibit mode connectivity, but generally only mechanistically similar models are linearly connected.

What Shapes the Landscape of Self-Supervised Learning?
Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, and Hidenori Tanaka
International Conference on Learning Representations (ICLR), 2023

We present a highly detailed analysis of the landscape of several self-supervised learning objectives to clarify the role of representational collapse.

Analyzing Data-Centric Properties for Contrastive Learning on Graphs
Puja Trivedi, Ekdeep Singh Lubana, Mark Heimann, Danai Koutra, and Jayaraman J. Thiagarajan
Advances in Neural Information Processing Systems (NeurIPS), 2022

We propose a theoretical framework that demonstrates limitations of popular graph augmentation strategies for self-supervised learning.

Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering
Ekdeep Singh Lubana, Chi Ian Tang, Fahim Kawsar, Robert P. Dick, and Akhil Mathur
International Conference on Machine Learning (ICML), 2022 (Spotlight)

We propose an unsupervised federated learning method that exploits client heterogeneity to enable privacy-preserving learning with SOTA performance.

Beyond BatchNorm: Towards a General Understanding of Normalization in Deep Learning
Ekdeep Singh Lubana, Hidenori Tanaka, and Robert P. Dick
Advances in Neural Information Processing Systems (NeurIPS), 2021

We develop a general theory to understand the role of normalization layers in improving training dynamics of a neural network at initialization.

How do Quadratic Regularizers Prevent Catastrophic Forgetting: The Role of Interpolation
Ekdeep Singh Lubana, Puja Trivedi, Danai Koutra, and Robert P. Dick
Conference on Lifelong Learning Agents (CoLLAs), 2022

(Also presented at ICML Workshop on Theory and Foundations of Continual Learning, 2021)

This work demonstrates how quadratic regularization methods for preventing catastrophic forgetting in deep networks rely on a simple heuristic under the hood: interpolation.

A Gradient Flow Framework For Analyzing Network Pruning
Ekdeep Singh Lubana and Robert P. Dick
International Conference on Learning Representations (ICLR), 2021 (Spotlight)

A unified, theoretically grounded framework for network pruning that helps justify commonly used heuristics in the field.

Undergraduate Research
Minimalistic Image Signal Processing for Deep Learning Applications
Ekdeep Singh Lubana, Robert P. Dick, Vinayak Aggarwal, and Pyari Mohan Pradhan
International Conference on Image Processing (ICIP), 2019

An image signal processing pipeline that allows use of out-of-the-box deep neural networks on RAW images directly retrieved from image sensors.

Digital Foveation: An Energy-Aware Machine Vision Framework
Ekdeep Singh Lubana and Robert P. Dick
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2018

An energy-efficient machine vision framework inspired by the fovea in biological vision. Also see follow-up work presented at a CVPR workshop, 2020.

Snap: Chlorophyll Concentration Calculator Using RAW Images of Leaves
Ekdeep Singh Lubana, Mangesh Gurav, and Maryam Shojaei Baghini
IEEE Sensors, 2018; Global Winner, Ericsson Innovation Awards 2017

An efficient imaging system that accurately calculates chlorophyll content in leaves by using RAW images.


Website template source available here.