
PARAMVEER DHILLON

Assistant Professor
School of Information
Computer Science & Engineering (courtesy)
University of Michigan

Faculty Affiliate
MIDAS (Michigan Institute for Data Science)

Research Affiliate
MIT Sloan School of Management

Office: 3389 North Quad, 105 S. State Street, Ann Arbor, MI 48109
Phone: 734-764-5876
Email: lastname followed by the letter 'p' at umich dot edu
Twitter: @dhillon_p
Google Scholar: https://goo.gl/FEsnE8


Research Interests

My research interests span topics in Statistical Machine Learning, Computational Social Science, Natural Language Processing, and Field/Digital Experiments. Substantively, I am interested in understanding the impact of internet technologies on users by empirically studying their interactions with such systems. The research questions I study are both predictive and causal in nature, and I examine them using data from text/natural language and social network domains.


Professional Background

I am an Assistant Professor in the School of Information (SI) and Computer Science & Engineering (courtesy) at the University of Michigan, where I research and teach various topics in Artificial Intelligence (AI), broadly defined.

I received my AM in Statistics and PhD in Computer Science from the University of Pennsylvania, where I was advised by Profs. Lyle Ungar, Dean Foster (now at Amazon), and James Gee. During my time at Penn, I also worked closely with Dr. Brian Avants on topics related to Machine Learning in Brain Imaging. My PhD thesis, Advances in Spectral Learning with Applications to Text Analysis and Brain Imaging, proposed novel statistical methods for problems in Text Modeling/NLP and Brain Imaging and was awarded the Morris and Dorothy Rubinoff Best Dissertation Award. Specifically, it proposed statistically and computationally efficient methods for learning word embeddings in NLP and for data-driven parcellation/segmentation of human brain images. Our methods not only gave predictive accuracies that were better than or comparable to the state-of-the-art statistical methods (circa 2015) but also had strong theoretical guarantees. Please see our JMLR 2015 and NeuroImage 2014 papers for more details. During my PhD, I also did research on establishing connections between PCA and ridge regression (cf. JMLR 2013) and on provably faster row and column subsampling algorithms for least squares regression (cf. NeurIPS 2013a,b).

Towards the end of my PhD, I got interested in computational social science and causal inference. After finishing my PhD, I completed a postdoc with Prof. Sinan Aral at MIT. There, I worked on several social science problems, e.g., finding influential individuals in a social network under realistic assumptions (cf. Nature Human Behaviour 2018), devising revenue-maximizing price discrimination strategies for newspapers, and designing sequential interventions for news websites to help them sustain user engagement. At MIT, I was also involved with the Initiative on the Digital Economy, studying the economic and societal impacts of AI.

Long before all this, I was a carefree undergrad studying Electronics & Electrical Communication Engineering at PEC in my hometown of Chandigarh, India. I developed my interest in AI/ML and the desire to pursue a PhD as a result of three memorable summer internships, prior to my PhD, at Computer Vision Center @ Barcelona [summer 2006], Max Planck Institute for Intelligent Systems @ Tuebingen [summer 2008], and Information Sciences Institute/USC @ Los Angeles [summer 2009].


Teaching

  1. Fall 2019: SI 671/721 Data Mining: Methods and Applications.


Students

I will be admitting 1 or 2 new PhD students in the 2019-20 academic year. Completed PhD applications are due in early/mid December 2019. If you're interested in working with me, please follow this link to apply and mention my name as a potential advisor. Please note that I am unable to reply to individual emails asking me to assess your chances of admission.

I am broadly looking to supervise students who are interested in working on Statistical Machine Learning, Deep Learning, NLP, Data Mining, Computational Social Science, or Information Economics. Prior research experience in these areas is highly valued, as are strong programming skills and a solid applied math/statistics background.


Publications

Acronyms for conferences and journals wherever applicable:

[Statistical Machine Learning/AI venues] JMLR: Journal of Machine Learning Research; NeurIPS: Advances in Neural Information Processing Systems Conference; ICML: International Conference on Machine Learning; AISTATS: International Conference on Artificial Intelligence and Statistics; ECML: European Conference on Machine Learning.

[NLP/CL venues] EMNLP: International Conference on Empirical Methods in Natural Language Processing; ACL: Annual Conference of the Association for Computational Linguistics; COLING: International Conference on Computational Linguistics.

[Data Mining/Information Management venues] ICDM: International Conference on Data Mining; CIKM: International Conference on Information and Knowledge Management.

[(Medical, Neuro) Imaging venues] ISBI: IEEE International Symposium on Biomedical Imaging; MICCAI: International Conference on Medical Image Computing and Computer Assisted Intervention.

Note: The list below only contains the published papers. I do not list the various {preprints, working papers, papers under review} below for a variety of reasons. Please get in touch with me if you're interested in seeing them.

*indicates alphabetical author listing.

  1. Social Influence Maximization under Empirical Influence Models.*
    Sinan Aral, Paramveer Dhillon.
    Nature Human Behaviour, May 2018.
    [PDF] [Supplementary Information]
  2. Eigenwords: Spectral Word Embeddings.
    Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    JMLR, December 2015.
    [PDF] [Code + Pre-trained Embeddings]
  3. Subject-Specific Functional Parcellation via Prior Based Eigenanatomy.
    Paramveer Dhillon, David Wolk, Sandhitsu Das, Lyle Ungar, James Gee, and Brian Avants.
    NeuroImage, October 2014.
    [PDF] [Code]
  4. New Subsampling Algorithms for Fast Least Squares Regression.
    Paramveer Dhillon, Yichao Lu, Dean Foster, and Lyle Ungar.
    NeurIPS 2013.
    [PDF] [Supplementary Information]
  5. Faster Ridge Regression via the Subsampled Randomized Hadamard Transform.
    Yichao Lu, Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    NeurIPS 2013.
    [PDF] [Supplementary Information]
  6. A Risk Comparison of Ordinary Least Squares vs Ridge Regression.
    Paramveer Dhillon, Dean Foster, Sham Kakade, and Lyle Ungar.
    JMLR, May 2013.
    [PDF]
  7. Two Step CCA: A new spectral method for estimating vector models of words.
    Paramveer Dhillon, Jordan Rodu, Dean Foster, and Lyle Ungar.
    ICML 2012.
    [PDF] [Supplementary Information] [Code + Pre-trained Embeddings] [Note: This paper was superseded by our JMLR 2015 paper.]
  8. Spectral Dependency Parsing with Latent Variables.
    Paramveer Dhillon, Jordan Rodu, Michael Collins, Dean Foster, and Lyle Ungar.
    EMNLP 2012.
    [PDF]
  9. Partial Sparse Canonical Correlation Analysis (PSCCA) for population studies in Medical Imaging.
    Paramveer Dhillon, Brian Avants, Lyle Ungar, and James Gee.
    ISBI 2012.
    [PDF]
  10. Eigenanatomy improves detection power for longitudinal cortical change.
    Brian Avants, Paramveer Dhillon, Benjamin Kandel, Philip Cook, Corey McMillan, Murray Grossman, and James Gee.
    MICCAI 2012.
    [PDF]
  11. Deterministic Annealing for Semi-Supervised Structured Output Learning.
    Paramveer Dhillon, Sathiya Keerthi, Olivier Chapelle, Kedar Bellare, and S. Sundararajan.
    AISTATS 2012.
    [PDF]
  12. Metric Learning for Graph-based Domain Adaptation.
    Paramveer Dhillon, Partha Talukdar, and Koby Crammer.
    COLING 2012.
    [PDF]
  13. Multi-View Learning of Word Embeddings via CCA.
    Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    NeurIPS 2011.
    [PDF] [Supplementary Information] [Code + Pre-trained Embeddings] [Note: This paper was superseded by our JMLR 2015 paper.]
  14. Minimum Description Length Penalization for Group and Multi-Task Sparse Learning.
    Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    JMLR, February 2011.
    [PDF]
  15. Semi-supervised Multi-task Learning of Structured Prediction Models for Web Information Extraction.
    Paramveer Dhillon, S. Sundararajan, and S. Sathiya Keerthi.
    CIKM 2011.
    [PDF]
  16. A New Approach to Lexical Disambiguation of Arabic Text.
    Rushin Shah, Paramveer Dhillon, Mark Liberman, Dean Foster, Mohamed Maamouri, and Lyle Ungar.
    EMNLP 2010.
    [PDF]
  17. Learning Better Data Representation using Inference-Driven Metric Learning (IDML).
    Paramveer Dhillon, Partha Pratim Talukdar, and Koby Crammer.
    ACL 2010.
    [PDF]
  18. Feature Selection using Multiple Streams.
    Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    AISTATS 2010.
    [PDF]
  19. Transfer Learning, Feature Selection and Word Sense Disambiguation.
    Paramveer Dhillon and Lyle Ungar.
    ACL 2009.
    [PDF]
  20. Multi-Task Feature Selection using the Multiple Inclusion Criterion (MIC).
    Paramveer Dhillon, Brian Tomasik, Dean Foster, and Lyle Ungar.
    ECML 2009.
    [PDF]
  21. Efficient Feature Selection in the Presence of Multiple Feature Classes.
    Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    ICDM 2008.
    [PDF]

Workshop Papers

(Venues for getting initial feedback on research. Often do not have proceedings.)

Acronyms for workshops include:

NBER-SI: National Bureau of Economic Research - Summer Institute; CODE: MIT Conference on Digital Experimentation; WISE: Workshop on Information Systems and Economics; WIN: Workshop on Information in Networks; NSF-ITN: NSF Conference on Information Transmission in Networks at Harvard University; WCBA: Utah Winter Conference on Business Analytics; PRNI: International Workshop on Pattern Recognition in Neuroimaging; SSDBM: Scientific and Statistical Database Management Conference; NESCAI: North East Student Colloquium on Artificial Intelligence; ViSU/CVPR: Visual Scene Understanding Workshop at CVPR; CISIS/LNCS: Computational Intelligence in Security for Information Systems Conference/Lecture Notes in Computer Science.

  23. Digital Paywall Design: Implications for Content Demand and Subscriptions.
    Sinan Aral, Paramveer Dhillon.
    NBER-SI (Economics of Digitization) 2017.
    [Abstract + Talk Only]
  24. Digital Paywall Design: Implications for Content Demand and Subscriptions.
    Sinan Aral, Paramveer Dhillon.
    CODE 2016.
    [Abstract + Talk Only]
  25. Digital Paywall Design: Implications for Content Demand and Subscriptions.
    Sinan Aral, Paramveer Dhillon.
    WISE 2016. [Runner-up best paper award]
    [Abstract + Talk Only]
  26. Digital Paywall Design: Implications for Content Demand and Subscriptions.
    Sinan Aral, Paramveer Dhillon.
    WCBA 2016.
    [Abstract + Talk Only]
  27. Influence Maximization Revisited.
    Sinan Aral, Paramveer Dhillon.
    WIN 2015.
    [Abstract + Talk Only]
  28. Influence Maximization Revisited.
    Sinan Aral, Paramveer Dhillon.
    NSF-ITN 2015.
    [Abstract + Talk Only]
  29. Anatomically-Constrained PCA for Image Parcellation.
    Paramveer Dhillon, James Gee, Lyle Ungar, and Brian Avants.
    PRNI 2013.
    [PDF]
  30. Learning to Explore Scientific Workflow Repositories.
    Julia Stoyanovich, Paramveer Dhillon, Brian Lyons, and Susan Davidson.
    SSDBM 2013.
    [PDF]
  31. Inference Driven Metric Learning for Graph Construction.
    Paramveer Dhillon, Partha Pratim Talukdar, and Koby Crammer.
    NESCAI 2010.
    [PDF]
  32. Combining Appearance and Motion for Human Action Classification in Videos.
    Paramveer Dhillon, Sebastian Nowozin, and Christoph Lampert.
    ViSU/CVPR 2009.
    [PDF]
  33. Robust Real-Time Face Tracking Using an Active Camera.
    Paramveer Dhillon.
    CISIS/LNCS 2009.
    [PDF]

Software

  1. Code and data for our Nature Human Behaviour 2018 paper are available here.
  2. The ANTsR toolkit for medical image analysis (including the implementation of our NeuroImage 2014 paper) is available here.
  3. The SWELL (Spectral Word Embedding Learning for Language) Java toolkit for inducing word embeddings (cf. JMLR 2015, ICML 2012, NeurIPS 2011) is available here.
  4. Various Eigenword (SWELL) embeddings for reproducing the results in our JMLR 2015 paper can be found below [No additional scaling required for embeddings. Use them as is]. [Based on our results, OSCCA and TSCCA embeddings are the most robust and work best on a variety of tasks.]
  5. Generic eigenword embeddings for various languages [Trained on much larger corpora.]


Last Modified: August 21, 2019