My research operates at the intersection of statistical methodology and educational technology, specifically developing robust psychometric frameworks for high-stakes digital assessments. I specialize in Bayesian longitudinal modeling and the integration of machine learning into measurement.

Current Research Themes

1. AI-Integrated Measurement & Hybrid Scoring

I investigate how machine learning (ML) scoring can scale constructed-response assessments when integrated with human judgment.

  • My recent work shows that systematic rater bias is a greater threat to reliability than random machine noise.
  • I have identified a “bias compensation” mechanism: pairing ML models with human raters of opposing severity profiles can cancel out directional errors and restore accuracy.
  • This research supports using the Many-Facet Rasch Model (MFRM) and anchoring strategies to stabilize scales in sparse scoring designs.
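As a toy illustration of the bias-compensation idea, the sketch below pairs a severe "human" rater with a lenient "machine" scorer whose directional errors are equal and opposite, then averages them. All numbers are invented for this sketch, not estimates from my studies.

```python
# Toy simulation: opposing severity biases cancel when a severe human
# rater (-0.5 shift) is paired with a lenient ML scorer (+0.5 shift).
# Shifts and noise levels are illustrative assumptions.
import math
import random

random.seed(42)

N = 10_000
true_scores = [random.gauss(0.0, 1.0) for _ in range(N)]

# Severe human rater: systematic -0.5 shift plus small random noise.
human = [t - 0.5 + random.gauss(0.0, 0.3) for t in true_scores]
# Lenient ML scorer: systematic +0.5 shift plus larger random noise.
machine = [t + 0.5 + random.gauss(0.0, 0.4) for t in true_scores]
# Hybrid score: averaging the opposing raters cancels directional error.
hybrid = [(h + m) / 2 for h, m in zip(human, machine)]

def bias(obs, truth):
    return sum(o - t for o, t in zip(obs, truth)) / len(truth)

def rmse(obs, truth):
    return math.sqrt(sum((o - t) ** 2 for o, t in zip(obs, truth)) / len(truth))

for name, scores in [("human", human), ("machine", machine), ("hybrid", hybrid)]:
    print(f"{name:7s} bias={bias(scores, true_scores):+.3f} "
          f"rmse={rmse(scores, true_scores):.3f}")
```

Despite each individual rater being systematically off, the hybrid score's bias is near zero and its error is well below either rater's alone, which is the mechanism the MFRM-based designs exploit.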

2. Bayesian Longitudinal & Growth Mixture Modeling (GMM)

A core focus is the stability and evaluation of latent variable models.

  • I develop Bayesian diagnostics to identify local identifiability issues and convergence failures in growth mixture models.
  • My work uses marginal likelihood (Bayes factors) to compare competing models of longitudinal behavioral change.
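To show what marginal-likelihood comparison means in miniature, here is a conjugate-Normal sketch where the integral over the parameter is available in closed form (a stand-in for the MCMC-based estimates real growth mixture models require). The models, priors, and data-generating mean are illustrative assumptions.

```python
# Compare M0: x ~ N(0, 1) against M1: x ~ N(mu, 1) with prior mu ~ N(0, 1)
# via exact marginal likelihoods. Data are drawn with true mean 0.8, so
# the Bayes factor should favor M1. All settings are illustrative.
import math
import random

random.seed(3)

sigma2, tau2 = 1.0, 1.0
data = [random.gauss(0.8, 1.0) for _ in range(50)]
n, s, ss = len(data), sum(data), sum(x * x for x in data)

# M0: fixed mean 0 -> marginal likelihood is the ordinary likelihood.
logml0 = -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

# M1: integrate mu out analytically. With A = n/sigma2 + 1/tau2:
# log p(x) = -n/2 log(2*pi*sigma2) - 1/2 log(1 + n*tau2/sigma2)
#            - ss/(2*sigma2) + (s/sigma2)^2 / (2*A)
A = n / sigma2 + 1.0 / tau2
logml1 = (-0.5 * n * math.log(2 * math.pi * sigma2)
          - 0.5 * math.log(1 + n * tau2 / sigma2)
          - ss / (2 * sigma2)
          + (s / sigma2) ** 2 / (2 * A))

log_bayes_factor = logml1 - logml0
print(f"log BF (M1 vs M0) = {log_bayes_factor:.2f}")  # positive: data favor M1
```

The closed form makes the trade-off visible: the `log(1 + n*tau2/sigma2)` term penalizes M1's extra flexibility, and the data term rewards it only when the sample mean pulls away from zero.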

3. Global Development (LEVANTE Project)

At Stanford, I work on the LEVANTE project to create internationalized measures of learning for children aged 5–12.

  • I focus on measurement invariance and statistical methods for modeling developmental change across cultural contexts.
  • We aim to validate core cognitive tasks that remain robust across languages and research sites.
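A crude version of this invariance check can be sketched in a few lines: simulate two sites answering the same Rasch items, deliberately make one item harder at the second site, and flag items whose between-site difficulty gap stands out from the average gap. The item bank, the 0.8-logit violation, and the flagging threshold are all hypothetical choices for illustration, not LEVANTE procedures.

```python
# Toy DIF / invariance screen: two "sites" answer 5 Rasch items; item
# index 2 is 0.8 logits harder at site B (a planted violation). Flag
# items whose site gap deviates from the mean gap by more than 0.4.
import math
import random

random.seed(7)

n_per_site = 4000
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
extra_b = [0.0, 0.0, 0.8, 0.0, 0.0]   # planted difficulty shift at site B

def rasch_response(theta, b):
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return 1 if random.random() < p else 0

def simulate(site):
    rows = []
    for _ in range(n_per_site):
        theta = random.gauss(0.0, 1.0)
        rows.append([rasch_response(theta, b + (extra_b[i] if site == "B" else 0.0))
                     for i, b in enumerate(difficulties)])
    return rows

def item_logits(rows):
    # crude difficulty proxy: negative logit of the proportion correct
    out = []
    for i in range(len(difficulties)):
        p = sum(r[i] for r in rows) / len(rows)
        out.append(-math.log(p / (1 - p)))
    return out

gap = [b - a for a, b in zip(item_logits(simulate("A")), item_logits(simulate("B")))]
mean_gap = sum(gap) / len(gap)
flags = [i for i, g in enumerate(gap) if abs(g - mean_gap) > 0.4]
print("flagged items:", flags)
```

Centering on the mean gap matters: a uniform shift across all items looks like a group ability difference, not an invariance violation, so only item-specific deviations are flagged.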

4. Adaptive Measurement & Digital Innovation

  • CAT Fairness: Researching mitigation of item calibration error and robust stopping rules in computerized adaptive testing (CAT).
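To make the stopping-rule idea concrete, here is a minimal CAT loop under a Rasch model: select the most informative remaining item at the current ability estimate, re-estimate ability by grid maximum likelihood, and stop once the standard error falls below a cutoff or the item budget runs out. The item bank, cutoffs, and grid are illustrative assumptions, not the calibrated designs my research evaluates.

```python
# Minimal computerized adaptive testing (CAT) sketch under a Rasch model
# with a standard-error-based stopping rule. All settings are toy values.
import math
import random

random.seed(1)

bank = [i / 10.0 - 2.0 for i in range(41)]    # item difficulties -2.0 .. 2.0
GRID = [g / 20.0 - 3.0 for g in range(121)]   # theta grid -3.0 .. 3.0

def p_correct(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def info(theta, b):
    p = p_correct(theta, b)
    return p * (1.0 - p)   # Rasch item information

def estimate(responses):
    # grid maximum-likelihood estimate of theta and its asymptotic SE
    def loglik(t):
        return sum(math.log(p_correct(t, b)) if y else math.log(1.0 - p_correct(t, b))
                   for b, y in responses)
    theta = max(GRID, key=loglik)
    total_info = sum(info(theta, b) for b, _ in responses)
    return theta, 1.0 / math.sqrt(total_info)

def run_cat(true_theta, se_cutoff=0.5, max_items=30):
    remaining, responses, theta, se = list(bank), [], 0.0, float("inf")
    while remaining and len(responses) < max_items:
        b = min(remaining, key=lambda d: abs(d - theta))  # max info at theta
        remaining.remove(b)
        y = 1 if random.random() < p_correct(true_theta, b) else 0
        responses.append((b, y))
        theta, se = estimate(responses)
        if len(responses) >= 5 and se < se_cutoff:        # stopping rule
            break
    return theta, se, len(responses)

theta_hat, se, n_used = run_cat(true_theta=0.7)
print(f"theta_hat={theta_hat:.2f}  se={se:.2f}  items={n_used}")
```

The fairness question lives precisely in this loop: if the bank's calibrated difficulties are themselves in error, both the item selection and the SE that triggers stopping are distorted, which motivates more robust stopping criteria.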

Methods & Technical Toolbelt

I lean heavily on open-science and “GitHub-first” workflows to solve measurement challenges.

  • Models: Multidimensional IRT (mIRT), Many-Facet Rasch Models (MFRM), Bayesian GMM, and Differential Item Functioning (DIF).
  • AI/NLP: LLM evaluation (GPT-4), NLP-based scoring frameworks, and automated prompt engineering.
  • Statistical Software: Expert proficiency in R (Stan, ggplot2, Shiny), Python, SQL, and Mplus.