Jihwan Jeong

Ph.D. Candidate at University of Toronto

Biography

Welcome to my profile 😄! I’m a Ph.D. candidate at the University of Toronto, contributing actively to the D3M (Data-Driven Decision-making) lab under the mentorship of Professor Scott Sanner. My interest in AI and ML is rooted in their potential to revolutionize decision-making in diverse areas.

My research primarily focuses on leveraging models for enhanced decision-making, with a special emphasis on offline model-based reinforcement learning. This work includes a notable paper accepted at ICLR-23, which explores the use of Bayesian models for robust planning and policy learning by accounting for the epistemic uncertainty of models (learn more here).

My internship at Google Research, under the guidance of Yinlam Chow, was a transformative period. There, I contributed to integrating recommendation systems with large language models (LLMs), applying Reinforcement Learning with AI Feedback (RLAIF) in a novel way to the challenge of recommendation explanations. This culminated in a first-authored paper that highlights the effective fine-tuning of LLMs for accurate and personalized recommendations. Additionally, I was instrumental in developing the PAX pipeline, a cornerstone for our team’s language model projects. (Check out the other paper here!)

Approaching the completion of my Ph.D., my thesis, tentatively titled “Leveraging Learned Models for Decision-Making,” encapsulates my research ethos. It tackles the intricacies of using imperfect models for decision-making by focusing on (1) optimizing decision loss, (2) employing Bayesian methods for uncertainty management, and (4) enabling models and policies to adapt swiftly in new environments.

I look forward to opportunities that will allow me to apply and expand my expertise in AI/ML, aiming to continue making impactful contributions in this dynamic field.

Download my CV .

Interests

Offline & model-based reinforcement learning
Uncertainty quantification in neural networks
RL for Large Language Models
Decision-aware model learning

Education

Ph.D. Candidate in Information Engineering (Present)
University of Toronto
M.S. in Industrial and Systems Engineering, 2019
Korea Advanced Institute of Science and Technology (KAIST)
B.S. in Chemistry, 2015
Korea Advanced Institute of Science and Technology (KAIST)

Experience

Student Research Program

Google Research

Jun 2023 – Present Mountain View, CA, US (Remote)

Contributed to the submission of two papers, applying RLAIF to advance language models for personalized recommendations and contributing significantly to the development of the PAX pipeline.

Research Intern

Vector Institute

Jun 2022 – Sep 2022 Toronto

Worked with Professor Pascal Poupart on a model-based offline reinforcement learning project (under review).

Research Intern

LG AI Research

Jun 2021 – Oct 2021 Seoul

Worked on a model-based offline reinforcement learning project (ICLR-23).

Ph.D. Candidate (~present)

University of Toronto

Sep 2019 – Present Toronto

Research Projects

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization (to appear at ICLR-23)

A model-based offline RL algorithm that is able to trade-off the uncertainty of the learned dynamics model with that of the value function through Bayesian posterior estimation, achieving state-of-the-art performance on a variety of D4RL benchmark tasks.

Publications

Quickly discover relevant content by filtering publications.

Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner (2023). Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization. In ICLR-23.

PDF Cite Project

Jihwan Jeong, Parth Jaggi, Andrew Butler, Scott Sanner (2022). An Exact Symbolic Reduction of Linear Smart Predict+Optimize to Mixed Integer Linear Programming. In ICML-22.

PDF Cite Code Project Video

Noah Patton, Jihwan Jeong, Mike Gimelfarb, Scott Sanner (2022). A Distributional Framework for Risk-Sensitive End-to-End Planning in Continuous MDPs. In AAAI-22.

PDF Cite Project Video

Zheda Mai, Ruiwen Li, Jihwan Jeong, David Quispe, Hyunwoo Kim, Scott Sanner (2022). Online Continual Learning in Image Classification: An Empirical Survey. Neurocomputing, 469: 28-51, 2022.

PDF Cite Code Project

Jihwan Jeong, Parth Jaggi, Scott Sanner (2021). Symbolic Dynamic Programming for Continuous State MDPs with Linear Program Transitions. In IJCAI-21.

PDF Cite Project Video

Jihwan Jeong, Hayong Shin (2021). Bayesian Optimization for a Multiple-Component System with Target Values. Computers & Industrial Engineering, 157.

PDF Cite

Dongsub Shim, Zheda Mai, Jihwan Jeong, Scott Sanner, Hyunwoo Kim, Jongseong Jang (2021). Online Class-Incremental Continual Learning with Adversarial Shapley Value. In AAAI-21.

PDF Cite Project Video

Zheda Mai, Hyunwoo Kim, Jihwan Jeong, Scott Sanner (2020). Batch-level Experience Replay with Review for Continual Learning. In CVPR Workshop on Continual Learning in Computer Vision.

PDF Cite Code Slides

Teaching Experience

Introduction to Artificial Intelligence (MIE369)

Jan 2023 – Apr 2023 University of Toronto

Course Instructor

Optimization in Machine Learning (MIE424)

Jan 2023 – Apr 2023 University of Toronto

Teaching Assistant

Decision Support Systems (MIE451) 2022 Fall

Sep 2022 – Dec 2022 University of Toronto

Teaching Assistant

Introduction to Artificial Intelligence (MIE369)

Jan 2022 – Apr 2022 University of Toronto

Teaching Assistant

Introduction to Artificial Intelligence (MIE369)

Jan 2021 – Apr 2021 University of Toronto

Teaching Assistant

Introduction to Artificial Intelligence (MIE369)

May 2021 – Aug 2021 University of Toronto

Teaching Assistant

Optimization in Machine Learning (MIE424)

Jan 2020 – Apr 2020 University of Toronto

Teaching Assistant

Foundations of Data Analytics and Machine Learning (APS1070)

Sep 2019 – Dec 2019 University of Toronto

Teaching Assistant

Featured Publications

Jihwan Jeong, Xiaoyu Wang, Michael Gimelfarb, Hyunwoo Kim, Baher Abdulhai, Scott Sanner

January, 2023 In ICLR-23

Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization

A model-based offline RL algorithm that is able to trade-off the uncertainty of the learned dynamics model with that of the value function through Bayesian posterior estimation, achieving state-of-the-art performance on a variety of D4RL benchmark tasks.

Jihwan Jeong, Parth Jaggi, Andrew Butler, Scott Sanner

July, 2022 In ICML-22

An Exact Symbolic Reduction of Linear Smart Predict+Optimize to Mixed Integer Linear Programming

The Smart Predict+Optimize (SPO) framework tries to solve a decision-making problem expressed as mathematical optimization in which some coefficients have to be estimated by a predictive model. The challenge is that this problem is non-convex and non-differentiable, even for linear programs with linear predictive models. Despite that, we provide the first exact optimal solution to the SPO problem by formulating it as a bi-level bi-linear program and reducing it to a mixed-integer linear program (MILP) using a novel symbolic method.

Noah Patton, Jihwan Jeong, Mike Gimelfarb, Scott Sanner

June, 2022 In AAAI-22

A Distributional Framework for Risk-Sensitive End-to-End Planning in Continuous MDPs

End-to-end planning framework for risk-sensitive planning under stochastic environments by backpropagating through a model of the environment. The core idea is to use reparameterization of the state distribution, leading to a unique distributional perspective of end-to-end planning where the return distribution is utilized for sampling as well as optimizing risk-aware objectives by backpropagation in a unified framework.

Jihwan Jeong, Parth Jaggi, Scott Sanner

August, 2021 In IJCAI-21

Symbolic Dynamic Programming for Continuous State MDPs with Linear Program Transitions

Recent advances in symbolic dynamic programming (SDP) have significantly broadened the class of MDPs for which exact closed-form value functions can be derived. However, no existing solution methods can solve complex discrete and continuous state MDPs where a linear program determines state transitions — transitions that are often required in problems with underlying constrained flow dynamics arising in problems ranging from traffic signal control to telecommunications bandwidth planning. In this paper, we present a novel SDP solution method for MDPs with LP transitions and continuous piecewise linear dynamics by introducing a novel, fully symbolic argmax operator.