SIGKDD 2024 Tutorial - Causal Inference with Latent Variable: Recent Advances and Future Prospectives

Abstract

Causality lays the foundation for the trajectory of our world. Causal inference (CI), which aims to infer intrinsic causal relations among variables of interest, has emerged as a crucial research topic. Nevertheless, the lack of observation of important variables (e.g., confounders, mediators, and exogenous variables) severely compromises the reliability of CI methods. This issue may arise from the inherent difficulty of measuring such variables. Additionally, in observational studies where variables are passively recorded, certain covariates might be inadvertently omitted by the experimenter. Depending on the type of unobserved variables and the specific CI task, carelessly handling these latent variables can lead to various consequences, such as biased estimation of causal effects, an incomplete understanding of causal mechanisms, and a lack of individual-level causal consideration. In this tutorial, we provide a comprehensive review of recent developments in CI with latent variables. We start by discussing traditional CI techniques that assume the variables of interest are fully observed. Afterward, under a taxonomy of circumvention-based and inference-based methods, we provide an in-depth discussion of various CI strategies for handling latent variables, covering the tasks of causal effect estimation, mediation analysis, counterfactual reasoning, and causal discovery. Furthermore, we generalize the discussion to graph data, where interference among units may exist. Finally, we offer fresh perspectives for further advancing CI with latent variables, especially new opportunities in the era of large language models (LLMs).
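To make the point about biased causal effect estimation concrete, here is a minimal sketch under assumed, hypothetical settings: a linear structural causal model in which a latent confounder U drives both the treatment T and the outcome Y, with a true effect of 2.0. The naive regression slope of Y on T absorbs the confounding path through U. An instrumental-variable (Wald) ratio, used here as one illustrative example of a strategy that sidesteps U rather than inferring it, recovers the effect provided its assumptions hold: the instrument Z is independent of U, affects T, and affects Y only through T. All coefficients, the function names (`simulate`, `cov`, `naive_slope`, `iv_wald`), and the choice of IV as the example technique are our assumptions, not material from the tutorial itself.

```python
import random


def simulate(n=100_000, tau=2.0, seed=0):
    """Draw (z, t, y) samples from a hypothetical linear SCM.

    Graph: Z -> T -> Y and T <- U -> Y, with Z independent of U.
    U is latent: it is generated here but never returned.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # latent confounder
        z = rng.gauss(0.0, 1.0)  # instrument, independent of u
        t = 1.0 * z + 1.5 * u + rng.gauss(0.0, 1.0)  # treatment
        y = tau * t + 3.0 * u + rng.gauss(0.0, 1.0)  # outcome
        samples.append((z, t, y))
    return samples


def cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def naive_slope(t, y):
    """OLS slope of Y on T; biased because U is omitted."""
    return cov(t, y) / cov(t, t)


def iv_wald(z, t, y):
    """IV (Wald) ratio Cov(Z, Y) / Cov(Z, T); never needs U."""
    return cov(z, y) / cov(z, t)


if __name__ == "__main__":
    z, t, y = zip(*simulate())
    print(f"naive: {naive_slope(t, y):.3f}  IV: {iv_wald(z, t, y):.3f}  true: 2.0")
```

With these coefficients, Var(T) = 4.25 and Cov(T, Y) = 2.0 * 4.25 + 3.0 * 1.5 = 13, so the naive slope should converge to about 13 / 4.25 ≈ 3.06, while the IV ratio should land near 2.0; sampling noise moves both estimates slightly for different seeds.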

Part 1: Background and Causal Inference Basics (45 mins)

Overview of causal machine learning.
Overview of latent variables in causal inference.
Rubin's causal model (RCM) vs. Pearl's structural causal model (SCM).
Four main causal inference tasks.

Part 2: Latent Confounding Analysis (30 mins)

Overview of traditional methods.
Circumvention-based methods.
Inference-based methods.

Part 3: Latent Causal Mediation Analysis (15 mins)

Overview of traditional methods.
Latent confounding analysis for CMA.
CMA with latent mediators.

Part 4: Latent Counterfactual Analysis (15 mins)

Overview of traditional methods.
Circumvention-based methods.
Inference-based methods.

Part 5: Generalization to Graphs (20 mins)

Overview of traditional methods.
Circumvention-based methods.
Inference-based methods.

Part 6: Summary and Future Directions (25 mins)

Summary of current challenges and future directions.
Opportunities in the LLM era.

Presenters

Yaochen Zhu

Yaochen Zhu is a rising third-year Ph.D. student at the University of Virginia. Previously, he was a machine learning research intern at Netflix and an applied research intern at LinkedIn. His research interests mainly lie in causal inference and large language models, as well as grounding them in specific data mining tasks.

Yinhan He

Yinhan He is a Ph.D. student in the Department of Electrical and Computer Engineering at the University of Virginia. He received his B.S. degree in Mathematics and Applied Mathematics from the University of Chinese Academy of Sciences in 2022. His research interests include graph machine learning, explainable AI, and machine learning for healthcare.

Jing Ma

Jing Ma is a tenure-track Assistant Professor in the Department of Computer & Data Sciences at Case Western Reserve University (CWRU). Before that, she received her Ph.D. from the Department of Computer Science at the University of Virginia (UVA) in summer 2023, working with Prof. Jundong Li and Prof. Aidong Zhang. She received her master's and bachelor's degrees from Shanghai Jiao Tong University (SJTU). She is broadly interested in machine learning and data mining. Her current research mainly focuses on trustworthy AI (generalization, explanation, fairness, robustness, etc.), causal machine learning, graph mining, AI for social good, and, more recently, large language models.

Mengxuan Hu

Mengxuan Hu is a rising third-year Ph.D. student at the University of Virginia.

Sheng Li

Sheng Li is a Quantitative Foundation Associate Professor of Data Science and an Associate Professor of Computer Science (by courtesy) at the University of Virginia (UVA). He was an Assistant Professor of Data Science at UVA from 2022 to 2023, an Assistant Professor of Computer Science at the University of Georgia from 2018 to 2022, and a Data Scientist at Adobe Research from 2017 to 2018. He received his Ph.D. degree in Computer Engineering from Northeastern University in 2017 under the supervision of Prof. Yun Raymond Fu. He received his master's and bachelor's degrees from the School of Computer Science at Nanjing University of Posts and Telecommunications in 2012 and 2010, respectively. His recent research interests include trustworthy representation learning, graph neural networks, visual intelligence, and causal inference. He has published over 150 papers and has received over 10 research awards, including the INNS Aharon Katzir Young Investigator Award, the Fred C. Davidson Early Career Scholar Award, the Adobe Data Science Research Award, the Cisco Faculty Research Award, and the SDM Best Paper Award. He has served as an Associate Editor for eight journals, including IEEE Trans. Neural Networks and Learning Systems (TNNLS) and IEEE Trans. Circuits and Systems for Video Technology (TCSVT), and as an Area Chair for IJCAI, NeurIPS, ICML, and ICLR.

Jundong Li

Jundong Li is an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Virginia, with joint appointments in the Department of Computer Science and the School of Data Science. He received his Ph.D. degree in Computer Science from Arizona State University in 2019. His research interests are in data mining, machine learning, and causal inference. He has published over 100 articles in high-impact venues and has won prestigious awards, including the NSF CAREER Award, the JP Morgan Chase Faculty Research Award, and the Cisco Faculty Research Award, and was selected for the AAAI 2021 New Faculty Highlights program.
