Research

My research sits at the intersection of statistical methodology and educational measurement, developing robust psychometric frameworks for high-stakes and increasingly AI-assisted assessments. I specialize in Bayesian longitudinal modeling and the principled integration of machine learning into measurement.

AI-Integrated Measurement & Hybrid Scoring

I investigate how machine learning (ML) scoring can scale constructed-response assessments when integrated with human judgment.

  • My recent work shows that systematic rater bias is a greater threat to reliability than random error in machine scores.
  • I have identified a “bias compensation” mechanism: pairing ML models with human raters of opposing severity profiles can cancel out directional errors and restore accuracy.
  • This research supports using the Many-Facet Rasch Model (MFRM) and anchoring strategies to stabilize scales in sparse scoring designs.
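
The bias compensation idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the procedure used in the research: the hybrid score is assumed to be a simple average of one human and one ML score, each score is assumed to equal the true score plus a constant severity bias plus Gaussian noise, and the function name `simulate_rmse` and all parameter values are hypothetical.

```python
import math
import random


def simulate_rmse(human_bias, ml_bias, n=5000, noise_sd=0.5, seed=0):
    """RMSE of an averaged human + ML score against the true score.

    Assumption: each rater's score is true score + constant bias + random noise.
    """
    rng = random.Random(seed)
    squared_error = 0.0
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)
        human = true_score + human_bias + rng.gauss(0.0, noise_sd)
        ml = true_score + ml_bias + rng.gauss(0.0, noise_sd)
        hybrid = (human + ml) / 2.0  # assumed combination rule: simple average
        squared_error += (hybrid - true_score) ** 2
    return math.sqrt(squared_error / n)


# Both raters lenient: the directional error survives averaging.
same_direction = simulate_rmse(human_bias=0.5, ml_bias=0.5)

# Lenient human paired with a severe ML model: the biases offset each other.
opposing = simulate_rmse(human_bias=0.5, ml_bias=-0.5)

print(f"same-direction bias RMSE: {same_direction:.3f}")
print(f"opposing bias RMSE:       {opposing:.3f}")
```

In this toy setup, averaging halves the random noise variance in both conditions, but only the opposing-severity pairing removes the constant shift, so its RMSE is lower.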

Bayesian Longitudinal & Growth Mixture Modeling

A core focus is the stability and evaluation of latent variable models.

  • I develop Bayesian diagnostics to identify local identifiability issues and convergence failures in growth mixture models (GMMs).
  • My work uses marginal likelihoods to evaluate competing model selection strategies for models of longitudinal behavioral change.

Cross-Cultural Measurement (LEVANTE)

At Stanford, I work on the LEVANTE Project to create internationalized measures of learning for children ages 5–12.

  • I focus on measurement invariance and statistical methods for modeling developmental change across cultural contexts.
  • We aim to validate core cognitive tasks that remain robust across languages and research sites.

Adaptive Testing

  • I study fairness and robustness in computerized adaptive testing (CAT), with a focus on mitigating item calibration error and designing robust stopping rules.
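
As a point of reference for what a stopping rule looks like, the sketch below implements a conventional standard-error rule under the Rasch model, with minimum and maximum test-length guards. It is an illustrative baseline only: it treats item difficulties as known, so it does not model calibration error, and the function names and threshold values are hypothetical.

```python
import math


def rasch_information(theta, difficulty):
    """Fisher information of one Rasch item at ability theta: p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)


def standard_error(theta, difficulties):
    """Asymptotic standard error of theta given the administered items."""
    info = sum(rasch_information(theta, b) for b in difficulties)
    return float("inf") if info == 0 else 1.0 / math.sqrt(info)


def should_stop(theta, difficulties, se_target=0.5, min_items=5, max_items=30):
    """Stop when the SE target is met, bounded by min and max test length."""
    n_items = len(difficulties)
    if n_items < min_items:
        return False  # never stop before the minimum length
    if n_items >= max_items:
        return True  # always stop at the maximum length
    return standard_error(theta, difficulties) <= se_target


# Ten well-targeted items are not yet enough for an SE of 0.5; sixteen are.
print(should_stop(0.0, [0.0] * 10))
print(should_stop(0.0, [0.0] * 16))
```

The length guards are one simple way to keep a pure precision rule from ending a test too early or running indefinitely when the item pool is poorly targeted.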