Donguk Kwon
DLI Lab, Yonsei University

I'm an Integrated M.S./Ph.D. student at the Data & Language Intelligence Lab, Yonsei University, advised by Prof. Dongha Lee.

My primary research interests are reasoning over structured data (e.g., tabular and HTML data) with language models and personalized aesthetic assessment (PAA), with an emphasis on fashion-related applications.


Education
  • Yonsei University
    Department of Artificial Intelligence
    M.S./Ph.D. Student
    Mar. 2025 - present
  • Yonsei University
    B.S. in Computer Science and Engineering
    Mar. 2020 - Feb. 2025
Experience
  • Sinchon University Alliance IT Startup Club, CEOS
    Web Front-end Part Leader
    Aug. 2025 - Jan. 2026
  • College of Engineering Student Council
    Vice President
    Dec. 2022 - May 2023
  • Department of Computer Science and Engineering Student Council
    President
    Dec. 2021 - May 2023
  • College of Life Science and Biotechnology Dance Club, SHADOWS
    President
    Jan. 2021 - Jun. 2022
Honors & Awards
  • Honors Award
    1st semester, 2021
  • Honors Award
    2nd semester, 2020
  • Highest Honors Award
    1st semester, 2020
Selected Publications
Region4Web: Rethinking Observation Space Granularity for Web Agents

Donguk Kwon, Dongha Lee# (# corresponding author)

Preprint (arXiv) 2026

Web agents perceive web pages through an observation space, yet its granularity has remained an underexamined design choice. Existing work treats observation at the same element-level granularity as the action space, leaving the page's functional organization implicit and forcing the agent to infer it from element-level signals at every step. We argue observation should instead operate at the granularity of functional regions, parts of the page that each serve a distinct purpose. We propose Region4Web, a framework that reorganizes the AXTree into functional regions through hierarchical decomposition and semantic abstraction, exposing the page's functional organization as the basis for page state understanding. Moreover, we propose PageDigest, a web-specific inference pipeline that delivers this region-level observation to the actor agent as a compact per-page digest that persists across steps. On the WebArena benchmark, PageDigest substantially reduces observation length while improving overall task success rate across diverse backbone large language models (LLMs) and established agent methods, regardless of backbone capacity. These results show that operating at the granularity of functional regions delivers a more compact and informative basis for the actor agent than element-level processing alone.
