
Hi, I am Jaewook

I’m a Computer Science Ph.D. candidate at UMass Amherst passionate about applied AI in various fields. Here, you’ll find my journey, projects, and publications.

📍 AWS Applied Scientist Intern @ Seattle (05/27/25 - 08/29/25)

💡 I am actively seeking a Summer 2026 AI/ML internship. I have a strong research background in NLP/AI applications, with publications in *ACL venues (EMNLP, NAACL) and AI in Education venues (AIED, EDM), as well as an ACM TODAES journal article showcasing my systems expertise. I also bring full-stack development skills, ranging from native iOS apps to React-based web applications. My expected graduation date is May 2027 or earlier — please feel free to reach out.

Date Published: August 9, 2025

Author: Jaewook Lee

My journey so far

“Used to map matmuls to silicon. Now I wrap LLMs in uncertainty—and snap when the app shows a gap.”

It’s a playful rhyme for my journey: from studying electrical engineering in my bachelor’s to exploring large language models in education for my Ph.D.

Prototype of Vocabulary Builder

Back in 2018—pre-ChatGPT era—I participated in an iOS Hackathon and built a vocabulary-builder prototype that won the grand prize. My vision for future work was to automatically generate keyword mnemonics and pair them with images combining those keywords. At the time, automatically generating keyword mnemonics was impossible, so I instead pursued a master’s in computer engineering to build strong foundations in both hardware and software.

Processing-in-Memory Emulator

During my master’s, I worked extensively with FPGA platforms. A notable project was developing a software interface for a Processing-in-Memory (PIM) emulator, which became the basis for my first journal publication. My role involved creating an interface between the PIM hardware and ONNX Runtime, then designing a scheduling algorithm to optimize execution on a heterogeneous platform. Because the device had no internet access, the only way to transfer the framework was via an SD card—a process I repeated so many times that the card eventually wore out and had to be replaced.

From this experience, I discovered two things:

  1. I love writing papers—it feels like wrapping a hard-earned gift.
  2. I finally grasped the value of a top-down approach in research, after struggling early in my master’s by treating research like coursework.

I chose a Ph.D. in my current field because I wanted to make education more engaging. The vocabulary-builder project remains one of my happiest moments—coding all night felt like pure fun. When I started my Ph.D., the sudden boom of LLMs made possible what I had once only imagined: automatic mnemonic generation. This became the topic of my first publication.

Now, I work on machine learning, reinforcement learning, and evaluation methods for LLM-generated content, and I develop web apps for conducting human evaluations.

Jaewook Lee

NLP Applications • Computer Science

View CV
Fields:
Natural language processing
AI
Machine Learning
Updated on Aug 09, 2025

Education

  • University of Massachusetts Amherst
    • Ph.D. in Computer Science (2022.9 - present)
    • Advisor: Prof. Andrew Lan
    • Research Area: NLP Applications, Human-in-the-loop AI
    • Passed Qualification Exam with distinction (2025.05)
  • Korea University, Seoul, Republic of Korea
    • M.E. in Electrical and Computer Engineering (2019.9 - 2022.2)
    • Advisor: Prof. Seon Wook Kim
    • Research Area: Compiler, Processing-in-Memory, AI framework
  • Korea University, Seoul, Republic of Korea
    • B.E. in Electrical Engineering (2013.3 - 2019.8)
    • Graduated with honors

Work Experience

  • Amazon Web Services, Seattle, WA, United States
    • Applied Scientist Intern
    • Developed a methodology for evaluating a multi-agent framework (details under NDA).
  • Eedi, London, United Kingdom (Remote)
    • Machine Learning Research Intern
    • Built the prototype for AnSearch, an AI-driven math diagnostic question generator that won the Tools Competition; it combines LLM speed with educator expertise to create assessments targeting common misconceptions.

Research Experience

  • AI for Human Creativity and Learning — Keyword Mnemonics
    • Statistical modeling: Developed expectation–maximization models to learn latent user variables and generation rules for interpretable mnemonics
    • Phonological similarity: Mentored and advised research on algorithms that identify phonologically similar keywords in a learner’s L1 for L2 vocabulary acquisition
    • Evaluation: Designed and deployed evaluation pipelines combining psycholinguistic measures with human assessments to measure mnemonic memorability and creativity
    • Multi-modal creativity: Initiated early exploration of integrating LLM-generated verbal cues with visual elements, opening a new direction for mnemonic design
  • AI for Educational Assessment and Feedback — Math Education
    • Training LLM-based tutors: Developed a training approach for dialogue-based tutors that optimizes tutor responses for both student correctness and pedagogical quality, using candidate generation, scoring, and preference optimization
    • Automated distractor creation: Created pipelines using prompting, fine-tuning, and variational error modeling to produce plausible, targeted distractors
    • Human–AI collaboration: Designed interactive authoring workflows enabling educators to refine AI-generated stems and distractors
  • AI Systems and Platform Optimization (Industry–Academia collaboration)
    • PIM platforms: Developed ONNX Runtime integration for PIM on both x86 and ARM environments; designed profiling and scheduling algorithms to optimize DNN inference on heterogeneous PIM architectures (SK Hynix)
    • Compiler-based frameworks for NPUs: Built tools to extract memory traces from DNN accelerators and modified LLVM to generate code that maximizes scratchpad memory efficiency (Samsung)

Awards

  • NAEP Math Automated Scoring Challenge Grand Prize 🏆 (2023)
    • Organized by the National Center for Education Statistics (NCES)
    • Challenge to develop an accurate, LLM-based scoring system for open-ended math responses
  • NeurIPS 2022 Causal Edu Competition (Task 3) - 3rd Place (2022)
    • Organized by Eedi
    • Challenge to identify causal relationships in real-world educational time-series data
  • iOS Application Hackathon Grand Prize 🏆 (2018)
    • Organized by Software Technology and Enterprise, Korea University
    • Developed a vocabulary builder app designed to help users efficiently memorize new words

Key Skills & Strengths

  • Deep Learning
  • Natural Language Processing
  • Scientific Writing
  • Data Analysis
  • Collaboration
  • Research Communication
  • Python
  • Experiment Design
  • Critical Thinking


Research Projects

2025 ARR May

Interpretable AI with User Latents

Developed an interpretable, rule-based generative framework for mnemonic creation — e.g., “A person (人) resting (休) by a tree (木)” — using a novel EM-type algorithm to learn compositional patterns from learner-authored data, enabling effective, transparent LLM-assisted Japanese vocabulary learning even in cold-start scenarios.
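
For a flavor of the EM-type learning step, here is a minimal toy sketch. It assumes each learner-authored mnemonic is reduced to a feature-count vector and each latent rule is a categorical distribution over features; this is a plain mixture-model EM, not the paper's actual formulation, which also models user latents:

```python
# Toy mixture-model EM: soft-assign mnemonics to latent "rules" (E-step),
# then re-estimate rule priors and feature distributions (M-step).
# X is a hypothetical (n_mnemonics, n_features) count matrix.
import numpy as np

def em_mixture(X, n_rules, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(n_rules, 1.0 / n_rules)        # rule priors
    theta = rng.dirichlet(np.ones(d), n_rules)  # per-rule feature distributions

    for _ in range(n_iters):
        # E-step: responsibility of each rule for each mnemonic
        log_p = X @ np.log(theta).T + np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate priors and rule distributions from soft counts
        pi = resp.mean(axis=0)
        theta = resp.T @ X + 1e-6               # smoothed expected counts
        theta /= theta.sum(axis=1, keepdims=True)
    return pi, theta, resp
```

The soft assignments are what make the learned rules inspectable, which is where the interpretability comes from.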

2025 ARR May

Phonology-Aware Cross-Lingual Mnemonics

Developed PhoniTale, a cross-lingual mnemonic generation system that retrieves L1 keyword sequences via phonological similarity and leverages LLMs to produce effective L2 vocabulary mnemonics for learners of typologically distant languages.
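
To illustrate just the retrieval step, here is a minimal sketch, assuming words are already transcribed to IPA; `SequenceMatcher` stands in for a proper weighted phoneme edit distance, and the lexicon is hypothetical (PhoniTale's actual alignment and syllable handling are richer):

```python
# Minimal sketch of phonology-based keyword retrieval: score L1 lexicon
# entries against an L2 word's transcription and keep the top-k.
from difflib import SequenceMatcher

def phonological_similarity(ipa_a: str, ipa_b: str) -> float:
    # Stand-in for a weighted phoneme edit distance.
    return SequenceMatcher(None, ipa_a, ipa_b).ratio()

def retrieve_keywords(l2_ipa: str, l1_lexicon: dict[str, str], k: int = 5):
    """l1_lexicon maps L1 words to their IPA transcriptions (hypothetical)."""
    scored = sorted(l1_lexicon.items(),
                    key=lambda kv: phonological_similarity(l2_ipa, kv[1]),
                    reverse=True)
    return [word for word, _ in scored[:k]]
```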

AIED 2025

Pedagogy-Aware AI with Student Modeling

Developed a DPO-trained tutoring LLM that optimizes utterances for both pedagogical quality and likelihood of correct student responses by combining LLM-based student modeling with rubric-guided scoring.
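
In outline, the training loop pairs the best- and worst-scoring candidate responses and applies the standard DPO objective. A condensed sketch, where `score_fn` is a placeholder for the rubric-plus-student-model scorer (not the paper's code):

```python
# Sketch: build preference pairs from scored candidates, then apply the
# standard DPO loss over sequence log-probabilities (torch tensors).
import torch.nn.functional as F

def build_preference_pair(candidates, score_fn):
    """score_fn is a placeholder for rubric + student-model scoring."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[0], ranked[-1]          # (chosen, rejected)

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -F.logsigmoid(beta * margin).mean()
```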

AIED 2025

LLM-Driven SVGs for Math Learning

Explored LLM-driven generation of scalable SVG math diagrams for educational hints, defining the task, experimenting with prompting strategies, and evaluating feasibility through Visual Question Answering and ablation studies.

EMNLP 2024

Psycholinguistic-Based AI Evaluation

Developed an overgenerate-and-rank framework using LLMs to produce and score verbal cues for vocabulary learning based on psycholinguistic measures and user-study insights, achieving human-comparable quality in imageability, coherence, and usefulness while revealing learner preference diversity.
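
The overall loop is simple to state. A minimal sketch, where `generate` stands in for an LLM sampling call and the scorer names are illustrative:

```python
# Overgenerate-and-rank in miniature: sample many candidate cues, score
# each with psycholinguistic measures, keep the top-k.
def overgenerate_and_rank(word, generate, scorers, n=20, k=3):
    candidates = [generate(word) for _ in range(n)]
    def combined(cue):
        return sum(score(cue) for score in scorers.values())
    return sorted(candidates, key=combined, reverse=True)[:k]

# Hypothetical usage:
# best = overgenerate_and_rank("ephemeral", llm_sample,
#                              {"imageability": imageability_score,
#                               "coherence": coherence_score})
```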

EDM 2024

LLMs for Feedback in Open-Ended Math

Investigated LLM-based automated feedback generation for open-ended math questions, fine-tuning open-source and proprietary models on intelligent tutoring system (ITS) feedback data.

EDM 2024

Collaborative AI for Educational Content

Developed a prototype human–AI collaborative tool for generating high-quality math MCQs, leveraging LLMs to produce accurate question stems while enabling educators to refine distractors based on common student misconceptions, streamlining assessment creation.

NAACL 2024

Automating Math MCQ Distractors

Explored LLM-based approaches from in-context learning to fine-tuning for automated distractor generation in math MCQs, evaluating performance on real-world data.
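
For the in-context-learning end of that spectrum, a prompt might be assembled roughly like this; the instruction wording and example fields are illustrative, not the exact prompts from the paper:

```python
# Illustrative few-shot prompt assembly for math MCQ distractor generation.
def build_distractor_prompt(question, solution, examples, n_distractors=3):
    # Format worked examples as few-shot demonstrations.
    shots = "\n\n".join(
        f"Question: {ex['question']}\n"
        f"Solution: {ex['solution']}\n"
        f"Distractors: {'; '.join(ex['distractors'])}"
        for ex in examples
    )
    return (
        "Generate plausible but incorrect answer options that reflect "
        "common student misconceptions.\n\n"
        f"{shots}\n\n"
        f"Question: {question}\nSolution: {solution}\n"
        f"Distractors ({n_distractors}):"
    )
```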

AIED 2023

Automated Cue Generation with LLMs

Developed an end-to-end LLM-powered pipeline that automatically generates verbal and visual cues for keyword-based vocabulary learning, eliminating the manual bottleneck and producing highly memorable content that matches human-created cues in effectiveness.

ACM TODAES

Optimal Model Partitioning for PIM Platforms

Designed low-overhead profiling and dynamic programming–based model partitioning algorithms for PIM-based deep learning inference, reconstructing computational graphs to capture all scheduling paths and minimizing execution time with only four profiling runs, outperforming manual and greedy baselines on BERT, RoBERTa, and GPT-2.
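
The core recurrence is easiest to see on a linear operator chain: the best time to finish op i on device d is the best time to finish op i-1 on any device p, plus the p-to-d transfer cost, plus op i's runtime on d. A toy version follows; the actual work handles full computational graphs with costs measured by profiling:

```python
# Toy DP partitioner over a linear chain of ops across devices.
# costs[i][d]: runtime of op i on device d; transfer[p][d]: hop cost.
def partition(costs, transfer):
    n, n_dev = len(costs), len(costs[0])
    dp = [list(costs[0])]            # dp[i][d]: best time ending op i on d
    choice = [[None] * n_dev]        # choice[i][d]: best device for op i-1
    for i in range(1, n):
        row, back = [], []
        for d in range(n_dev):
            p = min(range(n_dev), key=lambda q: dp[i - 1][q] + transfer[q][d])
            row.append(dp[i - 1][p] + transfer[p][d] + costs[i][d])
            back.append(p)
        dp.append(row)
        choice.append(back)
    # Backtrack the device assignment with minimal total execution time.
    d = min(range(n_dev), key=lambda q: dp[-1][q])
    total = dp[-1][d]
    plan = [d]
    for i in range(n - 1, 0, -1):
        d = choice[i][d]
        plan.append(d)
    return plan[::-1], total
```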