Research
My research sits at the intersection of statistical methodology and educational measurement, developing robust psychometric frameworks for high-stakes and increasingly AI-assisted assessments. I specialize in Bayesian longitudinal modeling and the principled integration of machine learning into measurement.
AI-Integrated Measurement & Hybrid Scoring
I investigate how machine learning (ML) scoring can scale constructed-response assessments when integrated with human judgment.
- My recent work shows that systematic rater bias is a greater threat to reliability than random machine noise.
- I have identified a “bias compensation” mechanism: pairing ML models with human raters of opposing severity profiles can cancel out directional errors and restore accuracy.
- This research supports using the Many-Facet Rasch Model (MFRM) and anchoring strategies to stabilize scales in sparse scoring designs.
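The bias compensation idea can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not the models used in the research itself: each scorer's observed score is taken to be the true score plus a fixed directional shift plus Gaussian noise, and the hybrid score is a plain average of the two. The function name `simulate_mean_error` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_mean_error(human_shift, machine_shift, n=5000, noise_sd=0.5, seed=0):
    """Return the mean signed error of human-only, machine-only, and averaged scores.

    Assumes observed score = true score + directional shift + random noise.
    """
    rng = random.Random(seed)
    human_err, machine_err, hybrid_err = [], [], []
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)
        human = true_score + human_shift + rng.gauss(0.0, noise_sd)
        machine = true_score + machine_shift + rng.gauss(0.0, noise_sd)
        hybrid = (human + machine) / 2.0  # simple average of the two scorers
        human_err.append(human - true_score)
        machine_err.append(machine - true_score)
        hybrid_err.append(hybrid - true_score)
    return tuple(statistics.fmean(e) for e in (human_err, machine_err, hybrid_err))


# Opposing profiles: a severe human rater paired with a lenient ML scorer.
opposing = simulate_mean_error(human_shift=-0.4, machine_shift=0.4)
# Matching profiles: both scorers are severe, so averaging cannot cancel the shift.
matching = simulate_mean_error(human_shift=-0.4, machine_shift=-0.4)
print("opposing:", opposing)
print("matching:", matching)
```

In this toy setup, the averaged score's mean error is close to zero when the shifts oppose each other and stays near the shared shift when they match, while random noise is only reduced by averaging in both cases.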
Bayesian Longitudinal & Growth Mixture Modeling
A core focus is the stability and evaluation of latent variable models.
- I develop Bayesian diagnostics to identify local identifiability issues and convergence failures in growth mixture models (GMMs).
- My work uses marginal likelihood to compare model selection strategies for longitudinal behavioral change.
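For reference, the marginal likelihood mentioned above is the standard quantity used in Bayesian model comparison. The notation here is an assumption introduced for illustration: $y$ denotes the observed longitudinal data, $M_k$ a candidate model (for example, a GMM with $k$ latent classes), and $\theta_k$ its parameters.

```latex
% Marginal likelihood of model M_k: the likelihood averaged over the prior.
p(y \mid M_k) = \int p(y \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k

% Bayes factor comparing two candidate models M_1 and M_2.
\mathrm{BF}_{12} = \frac{p(y \mid M_1)}{p(y \mid M_2)}
```

Because the integral averages over the whole prior, the marginal likelihood penalizes models whose extra flexibility is not supported by the data, which is one reason it is a natural yardstick for choosing the number of classes.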
Cross-Cultural Measurement (LEVANTE)
At Stanford, I work on the LEVANTE Project to create internationalized measures of learning for children ages 5–12.
- I focus on measurement invariance and statistical methods for modeling developmental change across cultural contexts.
- We aim to validate core cognitive tasks that remain robust across languages and research sites.
Adaptive Testing
- I study fairness and robustness in computer-adaptive testing (CAT), with a focus on mitigating item calibration error and designing robust stopping rules.
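As background for the stopping-rule work, a conventional precision-based rule can be sketched as follows. This is a minimal illustration assuming a Rasch model with known item difficulties, not the robust rules under development. The function names, the standard-error target of 0.4, and the 30-item cap are hypothetical choices.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def standard_error(theta, administered):
    """Standard error of theta from the test information of administered items."""
    # Rasch item information at theta is p * (1 - p).
    info = sum(p * (1.0 - p) for p in (rasch_prob(theta, b) for b in administered))
    return math.inf if info == 0 else 1.0 / math.sqrt(info)


def should_stop(theta, administered, se_target=0.4, max_items=30):
    """Stop once the precision target is met or the item cap is reached."""
    if len(administered) >= max_items:
        return True
    return standard_error(theta, administered) <= se_target


# Well-targeted items (difficulty equal to the ability estimate) are the most informative.
print(should_stop(0.0, [0.0] * 10))  # too few items to reach the target
print(should_stop(0.0, [0.0] * 26))  # enough information has accumulated
```

A rule like this treats the item difficulties as exact. When they carry calibration error, the computed standard error can look smaller than the true uncertainty, which is why a more robust stopping criterion matters.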