Nilesh Gupta

CV | Scholar | Github | Twitter

I am a Pre-Doctoral Research Fellow at Microsoft Research Lab India advised by Dr. Manik Varma, working primarily on algorithms and applications of Extreme Classification. My research interests broadly include Efficient and Large-Scale Machine Learning, Web Search and Recommendation, and Graph Neural Networks.

Before joining MSR, I completed my undergraduate degree with Honours in Computer Science and Engineering at the Indian Institute of Technology Bombay, where I was fortunate to be advised by Prof. Shivaram Kalyanakrishnan on my Bachelor's thesis.

[NEW] Generalized Zero-Shot Extreme Multi-label Learning
Nilesh Gupta, Sakina Bohra, Yashoteja Prabhu, Saurabh Purohit and Manik Varma
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2021

abstract / bibtex

Extreme Multi-label Learning (XML) involves assigning the subset of most relevant labels to a data point from millions of label choices. A hitherto unaddressed challenge in XML is that of predicting unseen labels with no training points. These form a significant fraction of total labels and contain fresh and personalized information desired by the users. Most existing extreme classifiers are not equipped for zero-shot label prediction and hence fail to leverage unseen labels. As a remedy, this paper proposes a novel approach called ZestXML for the task of Generalized Zero-shot XML (GZXML) where relevant labels have to be chosen from all available seen and unseen labels. ZestXML learns to project a data point’s features close to the features of its relevant labels through a highly sparsified linear transform. This 𝐿0-constrained linear map between the two high-dimensional feature vectors is tractably recovered through a novel optimizer based on Matching Pursuit. By effectively leveraging the sparsities in features, labels, and the learned model, ZestXML achieves higher accuracy and smaller model size than existing XML approaches while also promoting efficient training and prediction, real-time label updates, as well as explainable prediction. Experiments on large-scale GZXML datasets demonstrated that ZestXML can be up to 14% and 10% more accurate than state-of-the-art extreme classifiers and leading BERT-based dense retrievers respectively while having a 10x smaller model size. ZestXML trains on the largest dataset with 31M labels in just ~30 hours on a single core of a commodity desktop. When added to a large ensemble of existing models in Bing Sponsored Search Advertising, ZestXML significantly improved the click yield of the IR-based system by 17% and unseen query coverage by 3.4%. ZestXML’s source code and benchmark datasets for GZXML will be publicly released for research purposes.

  author    = {Gupta, Nilesh and Bohra, Sakina and 
    Prabhu, Yashoteja and Purohit, Saurabh 
    and Varma, Manik},
  title     = {Generalized Zero-Shot Extreme 
  Multi-label Learning},
  booktitle = {Proceedings of the ACM SIGKDD 
  International Conference on Knowledge 
  Discovery \& Data Mining},
  month     = {August},
  year      = {2021},
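
The abstract above describes scoring labels through a sparse linear map between data-point features and label features. A minimal sketch of that idea follows. It assumes relevance is computed as the bilinear form x^T W z over sparse vectors, which is an illustrative reading of the abstract rather than the released ZestXML code; the function names, feature names, and weights are all hypothetical.

```python
def bilinear_score(x, W, z):
    """Score one label as x^T W z (assumed scoring form, not the paper's code).

    x, z: sparse feature vectors stored as {feature_id: value}.
    W:    sparse map stored as {(point_feature, label_feature): weight}.
    """
    score = 0.0
    for (i, j), w in W.items():
        xi = x.get(i)
        zj = z.get(j)
        # Only non-zero entries of W that touch active features contribute.
        if xi is not None and zj is not None:
            score += xi * w * zj
    return score


def rank_labels(x, W, label_features, top=5):
    """Rank every label, seen or unseen, using only its feature vector."""
    scored = [(bilinear_score(x, W, z), name) for name, z in label_features.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top]]


# Hypothetical toy data: a query and three labels described by text features.
query = {"shoe": 1.0, "run": 0.5}
W = {("shoe", "sneaker"): 2.0, ("run", "jogging"): 1.0}
labels = {
    "sneakers": {"sneaker": 1.0},
    "jogging gear": {"jogging": 1.0, "sneaker": 0.5},
    "laptops": {"laptop": 1.0},
}
ranking = rank_labels(query, W, labels, top=3)
```

Because a label is scored purely from its features, a label with no training points can be ranked alongside seen labels, and each non-zero term of the sum identifies which feature pair contributed to the prediction.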

Extreme Regression for Dynamic Search Advertising
Yashoteja Prabhu, Aditya Kusupati, Nilesh Gupta and Manik Varma
International Conference on Web Search and Data Mining (WSDM), 2020

Long Oral presentation
abstract / bibtex / pdf / reviews / arXiv / code / poster / XML Repository
Also presented at the Workshop on eXtreme Classification: Theory and Applications @ ICML, 2020

This paper introduces a new learning paradigm called eXtreme Regression (XR) whose objective is to accurately predict the numerical degrees of relevance of an extremely large number of labels to a data point. XR can provide elegant solutions to many large-scale ranking and recommendation applications including Dynamic Search Advertising (DSA). XR can learn more accurate models than the recently popular extreme classifiers, which incorrectly assume strictly binary-valued label relevances. Traditional regression metrics which sum the errors over all the labels are unsuitable for XR problems since they could give extremely loose bounds for the label ranking quality. Also, the existing regression algorithms won't efficiently scale to millions of labels. This paper addresses these limitations through: (1) new evaluation metrics for XR which sum only the k largest regression errors; (2) a new algorithm called XReg which decomposes the XR task into a hierarchy of much smaller regression problems, thus leading to highly efficient training and prediction. This paper also introduces (3) a new labelwise prediction algorithm in XReg useful for DSA and other recommendation tasks.
Experiments on benchmark datasets demonstrated that XReg can outperform the state-of-the-art extreme classifiers as well as large-scale regressors and rankers by up to 50% reduction in the new XR error metric, and up to 2% and 2.4% improvements in terms of the propensity-scored precision metric used in extreme classification and the click-through rate metric used in DSA respectively. Deployment of XReg on DSA in Bing resulted in a relative gain of 58% in revenue and 27% in query coverage. XReg's source code can be downloaded from

  author    = {Prabhu, Yashoteja and Kusupati, Aditya and 
    Gupta, Nilesh and Varma, Manik},
  title     = {Extreme Regression for Dynamic 
    Search Advertising},
  booktitle = {Proceedings of the ACM International 
    Conference on Web Search and Data Mining},
  month     = {February},
  year      = {2020},
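
The abstract above mentions evaluation metrics that sum only the k largest regression errors instead of the errors over all labels. A minimal sketch of that idea for a single data point follows. It assumes absolute error and no normalization, neither of which is specified in the abstract, and the function name and toy values are hypothetical.

```python
import heapq


def topk_abs_error(y_true, y_pred, k):
    """Sum of the k largest absolute errors across labels for one data point.

    Assumption: absolute error without normalization; the exact metric
    definitions in the paper may differ.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    # heapq.nlargest returns all errors when k exceeds the number of labels.
    return sum(heapq.nlargest(k, errors))


# Hypothetical relevance values for four labels.
y_true = [1.0, 0.0, 0.5, 0.0]
y_pred = [0.5, 0.0, 0.5, 0.25]
worst_two = topk_abs_error(y_true, y_pred, k=2)
```

With millions of labels, most per-label errors are tiny but numerous, so restricting the sum to the largest k keeps the metric focused on the labels that most affect ranking quality.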

Learning Complex Behaviours and Keepaway in Robocup 3D
Nilesh Gupta and Shivaram Kalyanakrishnan
Undergraduate Thesis, CSE IIT Bombay, 2018-19

abstract / bibtex / pdf / presentation / demo

The RoboCup 3D Simulated Soccer League allows software agents to control humanoid robots to compete against one another in a realistic simulation. Our work involves developing a NEAT-based optimization framework for learning high-level behaviours of agents in the challenging RoboCup 3D environment. One benchmark that we are particularly interested in is keepaway, in which agents learn to keep possession of the ball in the presence of adversaries. Keepaway demands that agents learn long-term strategies under a large and hidden state space with long and variable delays in the effects of actions. Our learnt behaviours consistently outperformed existing hand-coded strategies on the keepaway subtask of simulated soccer.

  author = {Gupta, Nilesh and Kalyanakrishnan, Shivaram},
  title = {Learning Complex Behaviours and
    Keepaway in Robocup 3D},
  booktitle = {Undergraduate Thesis, CSE IIT Bombay},
  year = {2018-19},
