Jihoon Kwon
Extraordinary achievements are born from the accumulation of ordinary efforts.

I am an undergraduate from Seoul National University, beginning my journey toward understanding the faithfulness of AI models: why they fail at tasks that are simple for humans, and how to keep them grounded in the evidence they receive. My primary focus is on multi-modal learning, particularly vision-language models, where I study compositional reasoning failures and develop methods to mitigate hallucination.

I am also starting to explore in-context learning, beginning with language models and working toward extending these ideas to multi-modal contexts. At LinqAlpha, I build vision-language agents that automate investor workflows through accurate and faithful analysis of public filings.

Research Interest

My research interests stem from a fundamental observation: despite the remarkable progress in foundation models, these models still struggle with tasks that humans perform intuitively and effortlessly, revealing critical gaps in how machines understand and reason about the world. I firmly believe that developing models capable of understanding and reasoning in a human-like way is essential to advancing human freedom, equality, and social solidarity.

My primary focus is on multi-modal models, precisely because they still fail at fundamental tasks that should be trivial. For example, I researched compositional reasoning in CLIP-like models and found that the alignment between the vision and language modalities barely captures the relationships among elements in a scene. More recently, I have been investigating hallucination: how to prevent models from generating descriptions of content that simply isn't in the image.

I'm also exploring in-context learning (ICL) and instruction following, capabilities that are intuitive for humans and increasingly effective in models, and I seek both to understand and to exploit them. On ICL, I aim to understand what makes in-context learning effective at the mechanistic level, working on methods to measure and identify which internal representation changes lead to performance improvements. On instruction following, I have been working on projects that leverage this capability to create real-world value, including computer-use automation and investment research applications.
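As a toy illustration of the kind of measurement involved, the sketch below compares a language model's last-token hidden state with and without in-context demonstrations, in the spirit of task-vector analyses. The model, layer index, and prompts are illustrative assumptions, not my actual experimental setup.

```python
# Toy probe: how much do in-context demonstrations shift the last-token
# representation that feeds the next-token prediction?
# gpt2 and layer 6 are arbitrary stand-ins for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def last_token_state(prompt: str, layer: int = 6) -> torch.Tensor:
    """Hidden state of the final token at a given layer."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer][0, -1]  # (hidden_dim,)

icl_prompt = "hot -> cold\nbig -> small\nfast ->"
zero_shot_prompt = "fast ->"
v_icl = last_token_state(icl_prompt)
v_zero = last_token_state(zero_shot_prompt)
print(torch.cosine_similarity(v_icl, v_zero, dim=0).item())
```

Sweeping the layer index and correlating such representation shifts with downstream task accuracy is one simple way to localize where in-context information is encoded.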

Publications

Enhancing Compositional Reasoning in CLIP via Reconstruction and Alignment of Text Descriptions
Jihoon Kwon, Kyle Min, Jy-yong Sohn
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025, Poster)

Thematic Scoring: Quantifying Contextual Narratives using Language Models
Alejandro Lopez-Lira, Chanyeol Choi, Yoon Kim, Jihoon Kwon, Jin Kim, Suyeol Yun
SSRN 5233994

Linq-Embed-Mistral Technical Report
Chanyeol Choi, Junseong Kim, Seolhwa Lee, Jihoon Kwon, Sangmo Gu, Yejin Kim, Minkyung Cho, Jy-yong Sohn
arXiv 2412.03223

Education

B.S. in Industrial Engineering
Seoul National University, Republic of Korea
2019/02 - 2025/08
Double Major: Business Administration
Relevant Coursework: Machine Learning, Optimization, Statistics

Work Experience

Fundamental Research Engineer - AI/LLM
LinqAlpha
2023/09 - Present
  • Developing a GUI-based Vision-Language-Action agent that automatically joins, records, and transcribes earnings, conference, and special calls.
  • Taking full ownership of research and critical experiments applying LLMs in finance.
  • Building and managing an end-to-end project to develop, collect, and publish a finance-specific benchmark for training and evaluating LLMs.
Research Intern
ITML Lab, Yonsei University
2024/07 - Present
  • Proposed READ-CLIP, a fine-tuning method that enhances compositional reasoning in vision-language models via auxiliary losses (NeurIPS 2025).
  • Researching a PCA-based steering method to mitigate hallucination in large vision-language models by guiding attention toward salient image features based on instructions (see the sketch after this list).
  • Researching principled evaluation metrics for in-context learning task vectors, and developing methods to flexibly handle variance induced by test inputs.
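A minimal sketch of the steering idea above, assuming hidden states have already been collected from image-grounded and hallucinated generations. The function names, steering scale, and toy data are illustrative assumptions, not the project's actual implementation.

```python
# Minimal PCA-based steering sketch: find the direction that separates
# grounded from hallucinated activations, then nudge new hidden states
# along it at inference time. NumPy only; all data here is synthetic.
import numpy as np

def pca_steering_direction(grounded: np.ndarray, hallucinated: np.ndarray) -> np.ndarray:
    """Top principal component of paired activation differences.

    Both inputs are (n_samples, hidden_dim) hidden states collected from
    generations that are / are not supported by the image.
    """
    diffs = grounded - hallucinated
    diffs -= diffs.mean(axis=0, keepdims=True)   # center before PCA
    # Right singular vectors of the centered matrix are the PCA directions.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]                                 # unit-norm top component

def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Shift a layer's hidden state toward the image-grounded direction."""
    return hidden + alpha * direction

# Toy usage with random stand-ins for collected activations.
rng = np.random.default_rng(0)
grounded = rng.normal(size=(128, 64))
hallucinated = grounded - np.outer(np.ones(128), rng.normal(size=64))
direction = pca_steering_direction(grounded, hallucinated)
print(steer(rng.normal(size=64), direction).shape)  # (64,)
```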

Projects

🏆 SQA Alphathon 2025 Winner (Can LLMs Hit Moving Targets? Tracking Evolving Signals in Corporate Disclosures)
Oct 2025
Developed an end-to-end LLM system that predicts stock returns by detecting strategic metric shifts in earnings calls:
  • Competition Winner: Won SQA Alphathon 2025 with stock-return forecasting performance 3.6× better than the original method, demonstrating predictive power in real market applications
  • Novel LLM Framework: Designed a context-aware extraction method and semantic scoring system to track how companies shift emphasized metrics over time, overcoming the limitations of traditional keyword-based approaches that lose contextual information
  • Project Leadership: Led entire project from problem formulation to final validation, designing novel extraction methodology, architecting the system, and orchestrating all experimental workflows
🏆 World's Best Vector-Based Retrieval Model (Linq-Embed-Mistral)
May 2024
Built data pipeline and training infrastructure for Linq-Embed-Mistral, a state-of-the-art embedding model:
  • Best Embedding Model: Achieved the #1 ranking on the Hugging Face MTEB leaderboard among 200+ models, surpassing OpenAI, Google, Cohere, and Nvidia with state-of-the-art retrieval performance
  • Data-Centric Approach: Designed hard negative mining strategies and dataset curation methods that enabled state-of-the-art retrieval accuracy while maintaining strong generalization without benchmark overfitting (see the sketch after this list)
  • Hands-On Implementation: Built complete data pipeline infrastructure, implemented hard negative mining algorithms, and executed systematic training and evaluation experiments to achieve optimal model performance
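For the hard negative mining referenced above, here is a minimal similarity-based sketch, assuming precomputed unit-normalized embeddings. The margin filter, names, and toy data are illustrative assumptions rather than the actual Linq-Embed-Mistral pipeline.

```python
# Minimal hard negative mining sketch: for a query, pick the most similar
# non-positive documents as hard negatives, skipping candidates that score
# suspiciously close to the positive (likely unlabeled false negatives).
import numpy as np

def mine_hard_negatives(query_emb, doc_embs, positive_idx, k=5, margin=0.05):
    sims = doc_embs @ query_emb              # cosine sims for unit vectors
    pos_sim = sims[positive_idx]
    order = np.argsort(-sims)                # most similar first
    hard = [int(i) for i in order
            if i != positive_idx and sims[i] < pos_sim - margin]
    return hard[:k]

# Toy usage with random embeddings standing in for a real corpus.
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 32))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = docs[42] + 0.1 * rng.normal(size=32)
query /= np.linalg.norm(query)
print(mine_hard_negatives(query, docs, positive_idx=42))
```

One common rationale for the margin term: negatives scoring too close to the positive are often mislabeled near-duplicates, and training on them can hurt retrieval quality.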

Blog