Assistant Professor
School of Information
University of Michigan

Affiliate Faculty
Michigan Institute for Data Science (MIDAS)
E-Health & AI Initiative (e-HAIL)
Michigan Precision Health

Digital Fellow
MIT Initiative on the Digital Economy

Office: 3389 North Quad, 105 S. State Street, Ann Arbor, MI 48109
Phone: 734-764-5876
Email: lastname followed by the letter 'p' at umich dot edu
Twitter: @dhillon_p
Google Scholar: https://goo.gl/FEsnE8

Research Interests

My research centers on studying People & Technology by developing new Machine Learning & Data Science methods. I focus on methods that model text-based online interaction patterns to understand how Internet technologies impact our lives and the economy. In current and past research, I have analyzed data from several online platforms, including The New York Times, Boston Globe, YouTube, Reddit, & Twitter.

In addition to this substantive focus on People & Technology, some of my research projects are motivated solely by challenging methodological scenarios, e.g., applications where the data are sparse and high-dimensional or where the causal treatment/outcome is high-dimensional. To overcome these challenges, my research methodology draws on Text Mining, Deep Learning, and Causal Inference/Applied Econometrics. Occasionally, I also examine text-based research problems from the Health/Medicine, Education, and Environmental Science domains.

In summary, my research aims to advance knowledge and inform policy at the intersection of data science, technology, and society. In terms of academic disciplines, my research straddles the fields of Information Systems, Information Science, & Computer Science.

Key currently active projects include:

  1. Effective Human-AI collaboration strategies for content co-creation. [cf. NAACL '24, CHI '24]
  2. LLM Unlearning: Developing principled techniques for making LLMs forget facts.
  3. Long-term effects of Recommender Systems. [cf. WWW '24]
  4. Quantifying the impact of newspaper paywalls on the shift in topical news content production.

Professional Background

Since Fall 2019, I have been an Assistant Professor in the School of Information (SI) at the University of Michigan, where I research and teach various topics in Artificial Intelligence (AI), broadly defined.

I received my A.M. in Statistics and my M.S.E. and Ph.D. in Computer Science from the University of Pennsylvania, where I was advised by Profs. Lyle Ungar, Dean Foster (now at Amazon), and James Gee. During my time at Penn, I also worked closely with Dr. Brian Avants on topics related to Machine Learning in Brain Imaging. My Ph.D. thesis, titled "Advances in Spectral Learning with Applications to Text Analysis and Brain Imaging," won the Best Computer Science Dissertation Award at Penn (Morris and Dorothy Rubinoff Award). It proposed novel, statistically and computationally efficient methods for learning word embeddings in NLP and for data-driven parcellation/segmentation of human brain images. Our methods not only achieved predictive accuracies better than or comparable to those of state-of-the-art statistical methods (circa 2015) but also had strong theoretical guarantees. Please see our JMLR 2015 and NeuroImage 2014 papers for more details. During my Ph.D., I also worked on establishing connections between PCA and ridge regression (cf. JMLR 2013) and on provably faster row and column subsampling algorithms for least squares regression (cf. NeurIPS 2013a,b).

Towards the end of my Ph.D., I became interested in computational social science and causal inference. After finishing my Ph.D., I completed a Postdoc with Prof. Sinan Aral at MIT. At MIT, I worked on several social science problems, e.g., finding influential individuals in a social network under realistic assumptions (cf. Nature Human Behaviour 2018), devising revenue-maximizing price discrimination strategies for newspapers (cf. Management Science 2020), and designing sequential interventions that help news websites sustain user engagement (cf. Management Science 2022). At MIT, I was also involved with the Initiative on the Digital Economy (IDE) in studying the economic and societal impacts of AI. I am still affiliated with IDE as a Digital Fellow.

Long before all this, I was a carefree undergrad studying Electronics & Electrical Communication Engineering at PEC in my hometown of Chandigarh, India. I developed my interest in AI/ML and the desire to pursue a Ph.D. through three memorable summer internships before my Ph.D.: at the Computer Vision Center @ Barcelona [summer 2006], the Max Planck Institute for Intelligent Systems @ Tuebingen [summer 2008], and the Information Sciences Institute/USC @ Los Angeles [summer 2009].


Teaching

  1. SI 671/721 Data Mining: Methods and Applications [Significantly Re-designed] @ F[19,20,21,22,23].
  2. SIADS 642 [online] Introduction to Deep Learning [Developed from scratch] @ F20-present.
  3. SIADS 532 [online] Data Mining I @ W21-present.
  4. SIADS 632 [online] Data Mining II @ F21-present.

Research Group

Ph.D. Students

  1. Yachuan Liu [F20-] Last Stop: BS @ UC Berkeley.
  2. Sanzeed Anwar [F21-] Last Stop: BS+MEng @ MIT.
  3. Bohan Zhang [F22-] Last Stop: MS @ University of Michigan.

Undergrad/Masters Students

  1. Jiyu Chen [Masters]
  2. Ronith Ganjigunta [Undergrad]
  3. Max Golub [Undergrad]
  4. Yixuan Jiang [Masters]

Former Students

  1. Xinyue Li [MS '23, next Ph.D. in Statistics at Boston University]
  2. Ella Li [MS '23, next Ph.D. in CS at Northeastern University]
  3. Siqi Ma [BS '23, next MS in Statistics at Stanford University]
  4. Shaochun Zheng [BS '23, next MS in CS at UC San Diego]
  5. Houming Chen [BS '23, next Ph.D. in CS at University of Michigan]
  6. Yushi She [BS '23, next MS in CS at Georgia Tech]
  7. Ted Yuan [BS '23, next MS in ECE at Carnegie Mellon University]
  8. Evan Weissburg [BS '23, next Software Engineer at Jane Street Capital]
  9. Arya Kumar [BS '23, next Software Engineer at Jane Street Capital]
  10. Jupiter Zhu [BS '22, next MS in CS at Stanford University]
  11. Tianyi Li [BS '22, next MS in INI at Carnegie Mellon University]
  12. Xianglong Li [BS '22, next MS in CS at Yale University]
  13. Florence Wu [BS '22, next MS in CS at Harvard University]
  14. Yingzhuo Yu [BS '22, next MS in CS at UIUC]
  15. Xingjian Zhang [BS '22, next Ph.D. in Information at UMSI]
  16. Bohan Zhang [MS '22, next Ph.D. in Information at UMSI]
  17. Zhengyang Shan [MS '22, next Ph.D. in CDS at Boston University]
  18. Jiapeng Guo [BS '21, next MS in CS at Columbia University]
  19. Zilu Wang [BS '21, next MS in MS&E at Stanford University]

I always have openings for strong students in my group at all levels (Postdoctoral, Ph.D., Masters, or Undergrad). I am broadly looking to supervise students who are interested in working on ML, Information Systems, or NLP. Prior research experience in these areas is highly valued, as are strong programming skills and a solid applied math/statistics background.

Process: Masters/Undergrads (already at University of Michigan) interested in working with me can email their CV and transcripts. Prospective Postdocs can directly email me their latest CV and Research Statement. Prospective Ph.D. students need not email me directly but are encouraged to apply to our Ph.D. program here and mention my name as a potential advisor. The deadline is December 1 each year.


Awards

  1. INFORMS Information Systems Society (ISS) Gordon B. Davis Young Scholar Award, 2021.
  2. INFORMS Annual Conference (Best Paper Award), 2020.
  3. Workshop on Information Systems and Economics (WISE) (Runner-up Best Paper Award), 2016.
  4. Rubinoff Best Doctoral Dissertation Award (awarded by Penn CIS), 2015.

Service to the Profession

  1. Editorial Board JMLR [2020-].
  2. Ad-hoc Reviewer: Nature, Nature Human Behaviour, Nature Communications, PNAS, JAIR, Information Systems Research (ISR), Management Science, Marketing Science, IEEE TKDE, IEEE TPAMI.
  3. Reviewer/PC/SPC Member @ Core AI/ML Conferences: [every year since 2013] NeurIPS, ICML, AISTATS, ICLR, AAAI, IJCAI.
  4. Reviewer/PC/SPC Member @ Core Information Systems Conferences: [every year since 2017] ICIS, CIST, WISE.
  5. Reviewer/PC/SPC Member @ Core NLP/Computational Social Science Conferences: [sporadically] EMNLP, NAACL, ICWSM, IC2S2.

Selected Publications

Below is a list of selected publications that highlight my core research interests and contributions. A complete list of all my publications is available here.

*indicates alphabetical author listing.

Publications (Full List)

Acronyms for conferences and journals wherever applicable:

[General Science venues] PNAS: Proceedings of the National Academy of Sciences.

[Statistical Machine Learning/AI/Data Mining venues] JMLR: Journal of Machine Learning Research; NeurIPS: Advances in Neural Information Processing Systems Conference; ICML: International Conference on Machine Learning; AISTATS: International Conference on Artificial Intelligence and Statistics; ECML: European Conference on Machine Learning; ICDM: International Conference on Data Mining.

[NLP/CL/HCI venues] NAACL: Annual Conference of the North American Association for Computational Linguistics; EMNLP: International Conference on Empirical Methods in Natural Language Processing; ACL: Annual Conference of the Association for Computational Linguistics; COLING: International Conference on Computational Linguistics; CHI: SIGCHI Conference on Human Factors in Computing Systems.

[Social Media/Web/Computational Social Science/Information Management venues] WWW: The Web Conference; ICWSM: International Conference on Web and Social Media; CIKM: International Conference on Information and Knowledge Management; SocInfo: International Conference on Social Informatics.

[(Medical, Neuro) Imaging venues] ISBI: IEEE International Symposium on Biomedical Imaging; MICCAI: International Conference on Medical Image Computing and Computer Assisted Intervention.

  1. Causal Inference for Human-Language Model Collaboration. new
    Bohan Zhang, Yixin Wang, and Paramveer Dhillon.
    NAACL(Main Conference), 2024.
  2. Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models. new
    Paramveer Dhillon, Somayeh Molaei, Jiaqi Li, Maximilian Golub, Shaochun Zheng, and Lionel Robert.
    CHI, 2024.
  3. Filter Bubble or Homogenization? Disentangling the Long-Term Effects of Recommendations on User Consumption Patterns. new
    Sanzeed Anwar, Grant Schoenebeck, and Paramveer Dhillon.
    WWW, 2024.
  4. PM2.5 forecasting under distribution shift: A graph learning approach. new
    Yachuan Liu, Jiaqi Ma, Paramveer Dhillon, and Qiaozhu Mei.
    AI Open, 2023.
  5. Targeting for long-term outcomes. new
    Jeremy Yang, Dean Eckles, Paramveer Dhillon, and Sinan Aral.
    Management Science, 2023.
  6. Unique in what sense? Heterogeneous relationships between multiple types of uniqueness and popularity in music.
    Yulin Yu, Pui Yin Cheung, Yong-Yeol Ahn, and Paramveer Dhillon.
    ICWSM, 2023.
  7. Unpacking Gender Stereotypes in Film Dialogue.
    Yulin Yu, Yucong Hao, and Paramveer Dhillon.
    SocInfo, 2022.
  8. Judging a Book by Its Cover: Predicting the Marginal Impact of Title on Reddit Post Popularity.
    Evan Weissburg, Arya Kumar, and Paramveer Dhillon.
    ICWSM, 2022.
  9. What (Exactly) is Novelty in Networks? Unpacking the Vision Advantages of Brokers, Bridges, and Weak Ties.
    Sinan Aral, Paramveer Dhillon.
    Management Science, 2022.
  10. Detecting Struggling Students From Interactive Ebook Data: A Case Study Using CSAwesome.
    Barbara Ericson, Hisamitsu Maeda, and Paramveer Dhillon.
    SIGCSE Symposium, 2022.
  11. Social Status and Novelty Drove the Spread of Online Information During the Early Stages of COVID-19.
    Antonis Photiou, Christos Nicolaides, and Paramveer Dhillon.
    Nature Scientific Reports, 2021.
    [PDF] [Supplementary Information]
  12. Modeling Dynamic User Interests: A Neural Matrix Factorization Approach.
    Paramveer Dhillon, Sinan Aral.
    Marketing Science, 2021.
  13. Interdependence and the Cost of Uncoordinated Responses to COVID-19.
    David Holtz, Michael Zhao, Seth Benzell, Cathy Cao, Amin Rahimian, Jeremy Yang, Jennifer Allen, Avinash Collis, Alex Moehring, Tara Sowrirajan, Dipayan Ghosh, Yunhao Zhang, Paramveer Dhillon, Christos Nicolaides, Dean Eckles, and Sinan Aral.
    PNAS, 2020.
    [PDF] [Supplementary Information]
    Press Coverage: [Michigan News] [OneDetroit PBS Interview (Starts at 14:40)] [Los Angeles Times] [The Washington Post] [MSNBC] [The Boston Globe] [Yahoo Finance] [The Hill] [TechRepublic] [WGBH]
  14. Digital Paywall Design: Implications for Content Demand & Subscriptions.*
    Sinan Aral, Paramveer Dhillon.
    Management Science, 2020.
    Press Coverage: [Michigan News]
  15. Social Influence Maximization under Empirical Influence Models.*
    Sinan Aral, Paramveer Dhillon.
    Nature Human Behaviour, May 2018.
    [PDF] [Supplementary Information]
  16. Eigenwords: Spectral Word Embeddings.
    Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    JMLR, December 2015.
    [PDF] [Code + Pre-trained Embeddings]
  17. Subject-Specific Functional Parcellation via Prior Based Eigenanatomy.
    Paramveer Dhillon, David Wolk, Sandhitsu Das, Lyle Ungar, James Gee, and Brian Avants.
    NeuroImage, October 2014.
    [PDF] [Code]
  18. New Subsampling Algorithms for Fast Least Squares Regression.
    Paramveer Dhillon, Yichao Lu, Dean Foster, and Lyle Ungar.
    NeurIPS 2013.
    [PDF] [Supplementary Information]
  19. Faster Ridge Regression via the Subsampled Randomized Hadamard Transform.
    Yichao Lu, Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    NeurIPS 2013.
    [PDF] [Supplementary Information]
  20. A Risk Comparison of Ordinary Least Squares vs Ridge Regression.
    Paramveer Dhillon, Dean Foster, Sham Kakade, and Lyle Ungar.
    JMLR, May 2013.
  21. Two Step CCA: A new spectral method for estimating vector models of words.
    Paramveer Dhillon, Jordan Rodu, Dean Foster, and Lyle Ungar.
    ICML 2012.
    [PDF] [Supplementary Information] [Code + Pre-trained Embeddings] [Note: This paper was superseded by our JMLR 2015 paper.]
  22. Spectral Dependency Parsing with Latent Variables.
    Paramveer Dhillon, Jordan Rodu, Michael Collins, Dean Foster, and Lyle Ungar.
    EMNLP 2012.
  23. Partial Sparse Canonical Correlation Analysis (PSCCA) for population studies in Medical Imaging.
    Paramveer Dhillon, Brian Avants, Lyle Ungar, and James Gee.
    ISBI 2012.
  24. Eigenanatomy improves detection power for longitudinal cortical change.
    Brian Avants, Paramveer Dhillon, Benjamin Kandel, Philip Cook, Corey McMillan, Murray Grossman, and James Gee.
    MICCAI 2012.
  25. Deterministic Annealing for Semi-Supervised Structured Output Learning.
    Paramveer Dhillon, Sathiya Keerthi, Olivier Chapelle, Kedar Bellare, and S. Sundararajan.
    AISTATS 2012.
  26. Metric Learning for Graph-based Domain Adaptation.
    Paramveer Dhillon, Partha Talukdar, and Koby Crammer.
    COLING 2012.
  27. Multi-View Learning of Word Embeddings via CCA.
    Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    NeurIPS 2011.
    [PDF] [Supplementary Information] [Code + Pre-trained Embeddings] [Note: This paper was superseded by our JMLR 2015 paper.]
  28. Minimum Description Length Penalization for Group and Multi-Task Sparse Learning.
    Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    JMLR, February 2011.
  29. Semi-supervised Multi-task Learning of Structured Prediction Models for Web Information Extraction.
    Paramveer Dhillon, S. Sundararajan, and S. Sathiya Keerthi.
    CIKM 2011.
  30. A New Approach to Lexical Disambiguation of Arabic Text.
    Rushin Shah, Paramveer Dhillon, Mark Liberman, Dean Foster, Mohamed Maamouri, and Lyle Ungar.
    EMNLP 2010.
  31. Learning Better Data Representation using Inference-Driven Metric Learning (IDML).
    Paramveer Dhillon, Partha Pratim Talukdar, and Koby Crammer.
    ACL 2010.
  32. Feature Selection using Multiple Streams.
    Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    AISTATS 2010.
  33. Transfer Learning, Feature Selection and Word Sense Disambiguation.
    Paramveer Dhillon, Lyle Ungar.
    ACL 2009.
  34. Multi-Task Feature Selection using the Multiple Inclusion Criterion (MIC).
    Paramveer Dhillon, Brian Tomasik, Dean Foster, and Lyle Ungar.
    ECML 2009.
  35. Efficient Feature Selection in the Presence of Multiple Feature Classes.
    Paramveer Dhillon, Dean Foster, and Lyle Ungar.
    ICDM 2008.

Other Publications (Non-core)

  36. Is Deep Learning a Game Changer for Marketing Analytics? [Survey Paper for Practitioners]
    Glen Urban, Artem Timoshenko, Paramveer Dhillon, and John Hauser.
    MIT Sloan Management Review, 2020.
  37. Mapping of pain circuitry in early post-natal development using manganese-enhanced MRI in rats.
    Megan Sperry, Ben Kandel, Suzanne Wehrli, KN Bass, Sandhitsu Das, Paramveer Dhillon, James Gee, and Gordon Barr.
    Neuroscience, 2017.

Workshop Papers (Venues for getting initial feedback on research. Often do not have proceedings.)

Acronyms for workshops include:

NBER-SI: National Bureau of Economic Research - Summer Institute; CODE: MIT Conference on Digital Experimentation; WISE: Workshop on Information Systems and Economics; WIN: Workshop on Information in Networks; NSF-ITN: NSF Conference on Information Transmission in Networks at Harvard University; WCBA: Utah Winter Conference on Business Analytics; SICS: Summer Institute in Competitive Strategy; QME: Quantitative Marketing and Economics; CHITA: Conference on Health IT and Analytics 2024; PRNI: International Workshop on Pattern Recognition in Neuroimaging; SSDBM: Scientific and Statistical Database Management Conference; NESCAI: North East Student Colloquium on Artificial Intelligence; ViSU/CVPR: Visual Scene Understanding Workshop at CVPR; CISIS/LNCS: Computational Intelligence in Security for Information Systems Conference/Lecture Notes in Computer Science; GLB: Workshop on Graph Learning Benchmarks; CSEDM: Educational Data Mining in CS; IC2S2: International Conference on Computational Social Science; ICSSI: International Conference on Science of Science and Innovation; SCECR: International Conference on Statistical Challenges in E-commerce Research.

  38. A Proposed Framework to Identify Upstream Sources of Bias Among Racial Subgroups in Artificial Intelligence Models
    Rahul Ladhania, Allister Ho, Karandeep Singh, Paramveer Dhillon, Chad Brummett, and Anne Fernandez.
    CHITA 2024.
    [No Archived Proceedings]
  39. Do Digital Paywalls Impact Topical Content Coverage?
    Paramveer Dhillon, Anmol Panda, and Libby Hemphill.
    WISE 2023.
    CODE 2022.
    [No Archived Proceedings]
  40. Natural Disasters and Framing of Climate Change Events in Social Media: An Empirical Investigation on YouTube.
    Paramveer Dhillon, Siqi Ma, Jiyu Chen, and Anjana Susarla.
    SCECR 2023.
    [No Archived Proceedings]
  41. Demographic Disparities in Wikipedia Coverage: A Global Perspective.
    Yulin Yu, Tianyi Li, Xianglong Li, Paramveer Dhillon, and Daniel Romero.
    IC2S2 2022.
    [No Archived Proceedings]
  42. Unique in what sense? Heterogeneous relationships between multiple types of uniqueness and popularity in music.
    Yulin Yu, Pui Yin Cheung, Yong-Yeol Ahn, and Paramveer Dhillon.
    ICSSI 2023.
    IC2S2 2021. [Best poster award]
    [No Archived Proceedings]
  43. Comparing Ebook Student Interactions With Test Scores: A Case Study Using CSAwesome.
    Hisamitsu Maeda, Barbara Ericson, and Paramveer Dhillon.
    CSEDM 2021.
    [No Archived Proceedings]
  44. A New Benchmark of Graph Learning for PM2.5 Forecasting under Distribution Shift.
    Yachuan Liu, Jiaqi Ma, Paramveer Dhillon, and Qiaozhu Mei.
    GLB Workshop @ The Web Conference 2021.
    [No Archived Proceedings]
  45. Targeting for long-term outcomes.
    Jeremy Yang, Dean Eckles, Paramveer Dhillon, and Sinan Aral.
    SICS 2022.
    WISE 2020. [Nominated for Best student paper award]
    INFORMS Annual Conference 2020. [Best paper award]
    QME Conference 2020.
    [Abstract + Talk Only]
  46. Optimizing Targeting Policies via Sequential Experimentation for User Retention.
    Jeremy Yang, Dean Eckles, Paramveer Dhillon, and Sinan Aral.
    NeurIPS Workshop on "Do the right thing": Machine learning and causal inference for improved decision making 2019.
    CODE 2019.
    [Abstract + Talk Only]
  47. Digital Paywall Design: Implications for Content Demand and Subscriptions.
    Sinan Aral, Paramveer Dhillon.
    NBER-SI (Economics of Digitization) 2017.
    CODE 2016.
    WISE 2016. [Runner-up best paper award]
    WCBA 2016.
    [Abstract + Talk Only]
  48. Influence Maximization Revisited.
    Sinan Aral, Paramveer Dhillon.
    WIN 2015.
    NSF-ITN 2015.
    [Abstract + Talk Only]
  49. Anatomically-Constrained PCA for Image Parcellation.
    Paramveer Dhillon, James Gee, Lyle Ungar, and Brian Avants.
    PRNI 2013.
  50. Learning to Explore Scientific Workflow Repositories.
    Julia Stoyanovich, Paramveer Dhillon, Brian Lyons, and Susan Davidson.
    SSDBM 2013.
  51. Inference Driven Metric Learning for Graph Construction.
    Paramveer Dhillon, Partha Pratim Talukdar, and Koby Crammer.
    NESCAI 2010.
  52. Combining Appearance and Motion for Human Action Classification in Videos.
    Paramveer Dhillon, Sebastian Nowozin, and Christoph Lampert.
    ViSU/CVPR 2009.
  53. Robust Real-Time Face Tracking Using an Active Camera.
    Paramveer Dhillon.
    CISIS/LNCS 2009.


Software

  1. Code and data for our Nature Human Behaviour 2018 paper are available here.
  2. The ANTsR toolkit for medical image analysis (including the implementation of our NeuroImage 2014 paper) is available here.
  3. The SWELL (Spectral Word Embedding Learning for Language) JAVA toolkit for inducing word embeddings (cf. JMLR 2015, ICML 2012, NeurIPS 2011) is available here.
  4. Various Eigenword (SWELL) embeddings for reproducing the results in our JMLR 2015 paper can be found below [No additional scaling required for embeddings. Use them as is]. [Based on our results, OSCCA and TSCCA embeddings are the most robust and work best on a variety of tasks.]
  5. Generic eigenword embeddings for various languages [Trained on much larger corpora.]

Last Modified: 3.31.24