I am an undergraduate from Seoul National University, beginning my journey toward understanding the faithfulness of AI models—why models fail at tasks that are simple for humans, and how to make them remain grounded in the evidence they receive. My primary focus is on multi-modal learning, particularly vision-language models, where I study compositional reasoning failures and develop methods to mitigate hallucination.
I am also starting to explore in-context learning, beginning with language models and working toward extending these ideas to multi-modal contexts. At LinqAlpha, I build vision-language agents that automate investor workflows through accurate and faithful analysis of public filings.
My research interests stem from a fundamental observation: despite the remarkable progress in foundation models, they still struggle with tasks that humans perform intuitively, revealing critical gaps in how machines understand and reason about the world. I firmly believe that developing models capable of understanding and reasoning in a human-like way is essential to advancing human freedom, equality, and social solidarity.
My primary focus is on multi-modal models, because I've found that they struggle with basic tasks that should be trivial. For example, I researched compositional reasoning in CLIP-like models and found that the alignment between the vision and language modalities barely captures the relationships between elements. Recently, I've been investigating hallucination: how to prevent models from generating descriptions of content that simply isn't in the image.
I'm also exploring in-context learning (ICL) and instruction following, which are intuitive for humans and increasingly effective in models, seeking to understand and exploit these fundamental adaptive capabilities. On ICL, I want to understand what makes it effective at the mechanistic level, and I am working on methods to measure and identify which internal representation changes lead to performance improvements. On instruction following, I have been working on projects that leverage this capability to create real-world value, including computer use automation and investment research applications.