Jihoon Kwon
Extraordinary achievements are born from the accumulation of ordinary efforts.

I am an undergraduate from Seoul National University, beginning my journey toward understanding the faithfulness of AI models: why they fail at tasks that are simple for humans, and how to keep them grounded in the evidence they receive. My primary focus is on multi-modal learning, particularly vision-language models, where I study compositional reasoning failures and develop methods to mitigate hallucination.

I am also starting to explore in-context learning, beginning with language models and working toward extending these ideas to multi-modal contexts. At LinqAlpha, I build vision-language agents that automate investor workflows through accurate and faithful analysis of public filings.

Research Interest

My research interests stem from a fundamental observation: despite the remarkable progress in foundation models, these models still struggle with tasks that humans perform intuitively and effortlessly, revealing critical gaps in how machines understand and reason about the world. I firmly believe that developing models capable of understanding and reasoning in a human-like way is essential to advancing human freedom, equality, and social solidarity.

My primary focus is on multi-modal models, precisely because they still fail at fundamental tasks that should be trivial. For example, I researched compositional reasoning in CLIP-like models and found that the alignment between the vision and language modalities barely captures the relationships among elements in a scene. More recently, I have been investigating hallucination: how to prevent models from generating descriptions of content that simply isn't in the image.

I'm also exploring in-context learning (ICL) and instruction following, capabilities that are intuitive for humans and increasingly effective in models, and I seek both to understand and to exploit them. On ICL, I aim to understand what makes in-context learning effective at the mechanistic level, working on methods to measure and identify which internal representation changes lead to performance improvements. On instruction following, I have been working on projects that leverage this capability to create real-world value, including computer-use automation and investment research applications.
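As a toy illustration of the kind of measurement involved, the sketch below compares a language model's last-token hidden state with and without in-context demonstrations, in the spirit of task-vector analyses. The model, layer index, and prompts are illustrative assumptions, not my actual experimental setup.

```python
# Toy probe: how much do in-context demonstrations shift the last-token
# representation that feeds the next-token prediction?
# gpt2 and layer 6 are arbitrary stand-ins for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def last_token_state(prompt: str, layer: int = 6) -> torch.Tensor:
    """Hidden state of the final token at a given layer."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1]  # (hidden_dim,)

icl_prompt = "hot -> cold\nbig -> small\nfast ->"
zero_shot_prompt = "fast ->"
v_icl = last_token_state(icl_prompt)
v_zero = last_token_state(zero_shot_prompt)
print(torch.cosine_similarity(v_icl, v_zero, dim=0).item())
```

Sweeping the layer index and correlating such representation shifts with downstream task accuracy is one simple way to localize where in-context information is encoded.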

Publications

Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions
Jihoon Kwon, Kyle Min, Jy-yong Sohn
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025, Poster)

Thematic Scoring: Quantifying Contextual Narratives using Language Models
Alejandro Lopez-Lira, Chanyeol Choi, Yoon Kim, Jihoon Kwon, Jin Kim, Suyeol Yun
SSRN 5233994

Linq-Embed-Mistral Technical Report
Chanyeol Choi, Junseong Kim, Seolhwa Lee, Jihoon Kwon, Sangmo Gu, Yejin Kim, Minkyung Cho, Jy-yong Sohn
arXiv 2412.03223

Education

B.S. in Industrial Engineering
Seoul National University, Republic of Korea
2019/02 - 2025/08
Double Major: Business Administration
Relevant Coursework: Machine Learning, Optimization, Statistics

Work Experience

Fundamental Research Engineer - AI/LLM
LinqAlpha
2023/09 - Present
  • Developing a GUI-based Vision-Language-Action agent that automatically joins, records, and transcribes earnings, conference, and special calls.
  • Taking full ownership of research and critical experiments applying LLMs in finance.
  • Building and managing an end-to-end project to develop, collect, and publish a finance-specific benchmark for training and evaluating LLMs.
Research Intern
ITML Lab, Yonsei University
2024/07 - Present
  • Proposed READ-CLIP, a fine-tuning method that enhances compositional reasoning in vision-language models via auxiliary losses (NeurIPS 2025).
  • Researching a PCA-based steering method to mitigate hallucination in large vision-language models by guiding attention toward salient image features based on instructions (see the sketch after this list).
  • Researching principled evaluation metrics for in-context learning task vectors, and developing methods to flexibly handle variance induced by test inputs.
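A minimal sketch of the steering idea above, assuming hidden states have already been collected from image-grounded and hallucinated generations. The function names, steering scale, and toy data are illustrative assumptions, not the project's actual implementation.

```python
# Minimal PCA-based steering sketch: find the direction that separates
# grounded from hallucinated activations, then nudge new hidden states
# along it at inference time. NumPy only; all data here is synthetic.
import numpy as np

def pca_steering_direction(grounded: np.ndarray, hallucinated: np.ndarray) -> np.ndarray:
    """Top principal component of paired activation differences.

    Both inputs are (n_samples, hidden_dim) hidden states collected from
    generations that are / are not supported by the image.
    """
    diffs = grounded - hallucinated
    diffs -= diffs.mean(axis=0, keepdims=True)   # center before PCA
    # Right singular vectors of the centered matrix are the PCA directions.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]                                 # unit-norm top component

def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Shift a layer's hidden state toward the image-grounded direction."""
    return hidden + alpha * direction

# Toy usage with random stand-ins for collected activations.
rng = np.random.default_rng(0)
grounded = rng.normal(size=(128, 64))
hallucinated = grounded - np.outer(np.ones(128), rng.normal(size=64))
direction = pca_steering_direction(grounded, hallucinated)
print(steer(rng.normal(size=64), direction).shape)  # (64,)
```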

Projects

🏆 SQA Alphathon 2025 Winner (Can LLMs Hit Moving Targets? Tracking Evolving Signals in Corporate Disclosures)
Oct 2025
Developed an end-to-end LLM system that predicts stock returns by detecting strategic metric shifts in earnings calls:
  • Competition Winner: Won SQA Alphathon 2025 with stock-return forecasting performance 3.6× better than the original method, demonstrating predictive power in real market applications
  • Novel LLM Framework: Designed a context-aware extraction method and semantic scoring system to track how companies shift emphasized metrics over time, overcoming the limitations of traditional keyword-based approaches that lose contextual information
  • Project Leadership: Led entire project from problem formulation to final validation, designing novel extraction methodology, architecting the system, and orchestrating all experimental workflows
🏆 World's Best Vector-Based Retrieval Model (Linq-Embed-Mistral)
May 2024
Built data pipeline and training infrastructure for Linq-Embed-Mistral, a state-of-the-art embedding model:
  • Best Embedding Model: Achieved the #1 ranking on the Hugging Face MTEB leaderboard among 200+ models, surpassing OpenAI, Google, Cohere, and Nvidia with state-of-the-art retrieval performance
  • Data-Centric Approach: Designed hard negative mining strategies and dataset curation methods that enabled state-of-the-art retrieval accuracy while maintaining strong generalization without benchmark overfitting (see the sketch after this list)
  • Hands-On Implementation: Built complete data pipeline infrastructure, implemented hard negative mining algorithms, and executed systematic training and evaluation experiments to achieve optimal model performance
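For the hard negative mining referenced above, here is a minimal similarity-based sketch, assuming precomputed unit-normalized embeddings. The margin filter, names, and toy data are illustrative assumptions rather than the actual Linq-Embed-Mistral pipeline.

```python
# Minimal hard negative mining sketch: for a query, pick the most similar
# non-positive documents as hard negatives, skipping candidates that score
# suspiciously close to the positive (likely unlabeled false negatives).
import numpy as np

def mine_hard_negatives(query_emb, doc_embs, positive_idx, k=5, margin=0.05):
    sims = doc_embs @ query_emb              # cosine sims for unit vectors
    pos_sim = sims[positive_idx]
    order = np.argsort(-sims)                # most similar first
    hard = [int(i) for i in order
            if i != positive_idx and sims[i] < pos_sim - margin]
    return hard[:k]

# Toy usage with random embeddings standing in for a real corpus.
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 32))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = docs[42] + 0.1 * rng.normal(size=32)
query /= np.linalg.norm(query)
print(mine_hard_negatives(query, docs, positive_idx=42))
```

One common rationale for the margin term: negatives scoring too close to the positive are often mislabeled near-duplicates, and training on them can hurt retrieval quality.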

Blog