I am a first-year Ph.D. student working with Prof. Stella X. Yu in CSE at the University of Michigan. Before joining Stella's group, my research at Korea University focused on ambient sound-conditioned visual synthesis. I also worked as a Google Student Researcher, exploring ways to improve image generation and cropping with measurable quality signals. My research goal is to develop a physically interpretable foundation model in an unsupervised manner, bridging human perception and machine understanding. Feel free to reach out to chat more about this.
Education: B.S. in Computer Science, University of Seoul; M.S. in Computer Vision, Korea University; Ph.D. in Computer Science and Engineering, University of Michigan, 2024–Present.
Contact: seungle [at] umich [dot] edu | easter3163 [at] korea [dot] ac [dot] kr
If you would like to talk about life, career plans, or research ideas related to AI/ML, CS, or math, please email me at any time to schedule a meeting.
Our paper "Cropper: Vision-Language Model for Image Cropping through In-Context Learning" has been accepted to CVPR 2025. Please join my presentation in June!
Presented our work "Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation" as an oral presentation (2.3% acceptance rate) at ECCV 2024.
Started my Ph.D. journey at the University of Michigan's CSE department, working with Prof. Stella X. Yu on foundation models for computer vision.
1. Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation (work done during an internship at Google Research)
ECCV 2024 Oral (2.3%)
This work introduces a multi-reward reinforcement learning framework for text-to-image generation that balances multiple quality signals via Pareto-optimal trade-offs.
Authors: Seung Hyun Lee, Yinxiao Li, Junjie Ke, Innfarn Yoo, Han Zhang, Jiahui Yu, Qifei Wang, Fei Deng, Glenn Entis, Junfeng He, Gang Li, Sangpil Kim, Irfan Essa, Feng Yang
2. Cropper: Vision-Language Model for Image Cropping through In-Context Learning (work done during an internship at Google Research)
CVPR 2025
A vision-language model with in-context learning enables free-form, subject-aware, and aspect-ratio-aware image cropping without any training.
Authors: Seung Hyun Lee*, Jijun Jiang*, Yiran Xu*, Zhuofang Li*, Junjie Ke, Yinxiao Li, Junfeng He, Steven Hickson, Katie Datsenko, Sangpil Kim, Ming-Hsuan Yang, Irfan Essa, Feng Yang
3. Sound-Guided Semantic Image Manipulation (in collaboration with NVIDIA)
CVPR 2022
Authors: Seung Hyun Lee, Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chanyoung Kim, Jinkyu Kim*, Sangpil Kim*
4. Sound-Guided Semantic Video Generation (in collaboration with NVIDIA)
ECCV 2022
Authors: Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Chanyoung Kim, Won Jeong Ryoo, Sang Ho Yoon, Jihyun Bae, Jinkyu Kim*, Sangpil Kim*
5. Robust Sound-Guided Image Manipulation (in collaboration with NVIDIA)
Neural Networks 2024
Authors: Seung Hyun Lee*, Hyung-gun Chi*, Gyeongrok Oh, Wonmin Byeon, Sang Ho Yoon, Hyunje Park, Wonjun Cho, Jinkyu Kim*, Sangpil Kim*
6. The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (in collaboration with NVIDIA)
ICCV 2023
Authors: Yujin Jeong, Wonjeong Ryoo, Seung Hyun Lee, Dabin Seo, Wonmin Byeon, Jinkyu Kim
7. Audio-Guided Implicit Neural Representation for Local Image Stylization (in collaboration with NVIDIA)
Computational Visual Media 2024
Authors: Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim*, Sangpil Kim*
8. Soundini: Sound-Guided Diffusion for Natural Video Editing
Under review
Authors: Seung Hyun Lee, Sieun Kim, Innfarn Yoo, Feng Yang, Donghyeon Cho, Youngseo Kim, Huiwen Chang, Jinkyu Kim*, Sangpil Kim*
9. Functional Hand Type Prior for 3D Hand Pose Estimation and Action Recognition from Egocentric View Monocular Videos
BMVC 2023 Oral
Authors: Wonseok Roh, Seung Hyun Lee, Wonjeong Ryoo, Gyeongrok Oh, Jakyung Lee, Soo Yeon Hwang, Hyung-gun Chi, Sangpil Kim