About Me

Hi, I’m Myeongsoo Kim, an Applied Scientist at AWS AI Labs working on Kiro. I build coding agents and the evaluation infrastructure that makes them better: benchmarks to measure where they fail, trajectory analysis to diagnose why, and self-improvement loops that turn those findings into validated fixes.

I earned my PhD in Computer Science from Georgia Tech, advised by Prof. Alessandro Orso.

Recent News

  • [Upcoming Talk] “Harness Optimization Through Live Traffic Analysis” at the Harness Engineering meetup, AWS Builder Loft, San Francisco. [Event]
  • [EMNLP 2026 - Under Review] “Coherence Collapse: Diagnosing Why Code Agents Fail After Reaching the Right Code”
    Myeongsoo Kim, Dingmin Wang, Siwei Cui, Farima Farmahinifarahani, Terry Yue Zhuo, Shweta Garg, Baishakhi Ray, Rajdeep Mukherjee, Varun Kumar
    [arXiv]
  • [ACL 2026] “CodeStruct: Code Agents over Structured Action Spaces”
    Myeongsoo Kim, Joe Hsu, Dingmin Wang, Shweta Garg, Varun Kumar, Murali Krishna Ramanathan
    [arXiv]
  • [Blog Post] “Surgical Precision with AST” on the Kiro blog. [Read]
  • [NeurIPS 2025 D&B] “CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance”
    Myeongsoo Kim, Shweta Garg, Baishakhi Ray, Varun Kumar, Anoop Deoras
    [arXiv]
  • [ICSE 2025 Industry] 🏆 Distinguished Paper Award – “Aster: Natural and Multi-Language Unit Test Generation with LLMs”
    Rangeet Pan, Myeongsoo Kim, Rahul Krishna, Raju Pavuluri, Saurabh Sinha
    [IEEE CS] [arXiv]
  • [FSE 2025] “LlamaRestTest: Effective REST API Testing with Small Language Models”
    Myeongsoo Kim, Saurabh Sinha, Alessandro Orso
    [ACM DL]
  • [ICSE 2025 Research] “A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs”
    Myeongsoo Kim, Saurabh Sinha, Alessandro Orso
    [IEEE]
  • [ICSE 2025 Demo] “AutoRestTest: A Tool for Automated REST API Testing Using LLMs and MARL”
    Tyler Stennett, Myeongsoo Kim, Saurabh Sinha, Alessandro Orso
    [IEEE]

Research Interests

  • Self-improving coding agents – closed loops that find failures, reproduce them, propose fixes, and keep only what passes a fair, reproducible gate
  • Evaluation and benchmarking – designing benchmarks that reflect real agent failures (multi-turn, long-horizon, live-traffic-grounded), not just isolated coding puzzles
  • Trajectory analysis and failure diagnosis – understanding why agents fail after reaching the right code (coherence collapse, action-space design, compaction loss)
  • Agent architecture – structured action spaces, multi-agent coordination, and code-level enforcement mechanisms that achieve what prompt steering alone cannot

Publications

You can find my research publications on my Google Scholar profile.

Get in Touch

Feel free to reach out via email or connect with me on LinkedIn and GitHub.