Research
My research sits at the intersection of statistical methodology and educational technology: I develop robust psychometric frameworks for high-stakes digital assessments, with a focus on Bayesian longitudinal modeling and the integration of machine learning into measurement.
Current Research Themes
1. AI-Integrated Measurement & Hybrid Scoring
I investigate how machine learning (ML) can scale the scoring of constructed-response assessments when integrated with human judgment.
- My recent work shows that systematic rater bias is a greater threat to reliability than random machine noise.
- I have identified a “bias compensation” mechanism: pairing ML models with human raters of opposing severity profiles can cancel out directional errors and restore accuracy.
- This research supports using the Many-Facet Rasch Model (MFRM) and anchoring strategies to stabilize scales in sparse scoring designs.
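To make the "bias compensation" idea concrete, here is a minimal simulation sketch. All scales, bias magnitudes, and noise levels below are hypothetical illustrations, not estimates from my studies: an ML scorer with a systematic severity bias is paired with a human rater of opposing (lenient) severity, and averaging the two cancels the directional error that neither scorer removes alone.

```python
import random

random.seed(1)

# Hypothetical true essay scores on a 0-10 scale.
true_scores = [random.uniform(2, 8) for _ in range(1000)]

# An ML scorer with a systematic severity bias (+0.5) and small noise,
# paired with a human rater of opposing severity (-0.5) and larger noise.
ml = [t + 0.5 + random.gauss(0, 0.2) for t in true_scores]
human = [t - 0.5 + random.gauss(0, 0.6) for t in true_scores]

def mean_error(scores):
    """Average signed error (directional bias) relative to the true scores."""
    return sum(s - t for s, t in zip(scores, true_scores)) / len(scores)

# Pairing with opposing severity: the directional errors cancel on average.
paired = [(m + h) / 2 for m, h in zip(ml, human)]

print(f"ML-alone bias:    {mean_error(ml):+.3f}")
print(f"Human-alone bias: {mean_error(human):+.3f}")
print(f"Paired bias:      {mean_error(paired):+.3f}")
```

The random noise shrinks with averaging as usual, but the point of the mechanism is the signed biases: each scorer alone is off by roughly half a point in a consistent direction, while the pair is approximately unbiased.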
2. Bayesian Longitudinal & Growth Mixture Modeling (GMM)
A core focus is the stability and evaluation of latent variable models.
- I develop Bayesian diagnostics to identify local identifiability issues and convergence failures in growth mixture models.
- My work uses marginal likelihood to compare model selection strategies for longitudinal models of behavioral change.
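One widely used convergence diagnostic in this setting is the Gelman-Rubin potential scale reduction factor (R-hat), which flags chains trapped in different local modes, a common symptom of label switching or weak identifiability in mixture models. The sketch below is self-contained and uses simulated chains with invented class means; it is an illustration of the diagnostic, not of my specific diagnostics.

```python
import math
import random

random.seed(7)

def r_hat(chains):
    """Gelman-Rubin potential scale reduction factor for a list of chains."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)

# Four chains sampling the same class-mean posterior: converged.
good = [[random.gauss(1.0, 0.3) for _ in range(500)] for _ in range(4)]

# One chain trapped at a second local mode (e.g., a label-switched class).
bad = [[random.gauss(1.0, 0.3) for _ in range(500)] for _ in range(3)]
bad.append([random.gauss(3.0, 0.3) for _ in range(500)])

print(f"converged R-hat: {r_hat(good):.3f}")  # close to 1.0
print(f"trapped   R-hat: {r_hat(bad):.3f}")   # well above the usual 1.1 cutoff
```

In practice one would compute split R-hat per parameter (as Stan does) rather than this plain version, but the mechanism is the same: between-chain disagreement inflates the ratio.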
3. Global Development (LEVANTE Project)
At Stanford, I work on the LEVANTE project to create internationalized measures of learning for children aged 5–12.
- I focus on measurement invariance and statistical methods for modeling developmental change across cultural contexts.
- We aim to validate core cognitive tasks that remain robust across languages and research sites.
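As a stylized example of the invariance problem (item parameters, sample sizes, and the flagging threshold are all invented for illustration), the sketch below simulates two groups with the same ability distribution answering five Rasch items, shifts one item's difficulty for the second group, and screens for non-invariant items by comparing item easiness on the logit scale:

```python
import math
import random

random.seed(3)

def logit(p):
    return math.log(p / (1 - p))

def simulate_group(item_difficulties, n=2000):
    """Proportion correct per item for a group with N(0,1) abilities (Rasch)."""
    counts = [0] * len(item_difficulties)
    for _ in range(n):
        theta = random.gauss(0, 1)
        for j, b in enumerate(item_difficulties):
            if random.random() < 1 / (1 + math.exp(-(theta - b))):
                counts[j] += 1
    return [c / n for c in counts]

# Five items; item 2 is 0.8 logits harder for group B (simulated non-invariance).
base = [-1.0, -0.5, 0.0, 0.5, 1.0]
shifted = list(base)
shifted[2] += 0.8

p_a = simulate_group(base)
p_b = simulate_group(shifted)

# Crude invariance screen: compare item easiness on the logit scale.
# (Both groups share the same ability distribution here, so no matching is needed.)
gaps = [logit(pa) - logit(pb) for pa, pb in zip(p_a, p_b)]
for j, gap in enumerate(gaps):
    flag = "  <- flagged" if abs(gap) > 0.4 else ""
    print(f"item {j}: logit gap = {gap:+.2f}{flag}")
```

With real cross-site data the groups' ability distributions differ, so this naive screen would be replaced by proper DIF or multi-group invariance models; the sketch only shows what "an item that behaves differently across contexts" looks like in the data.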
4. Adaptive Measurement & Digital Innovation
- CAT Fairness: Investigating how item calibration error propagates through computer-adaptive testing (CAT) and developing robust stopping rules to mitigate its impact.
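A toy version of a standard-error stopping rule under the Rasch model helps show what is at stake. The item bank, thresholds, and update scheme below are illustrative simplifications, not the rules I study: administer the most informative remaining item, update a provisional ability estimate, and stop once its standard error falls below a target or an item cap is reached.

```python
import math
import random

random.seed(42)

def p_correct(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1 / (1 + math.exp(-(theta - b)))

def run_cat(true_theta, bank, se_stop=0.5, max_items=30):
    """Adaptive test with a standard-error stopping rule (illustrative)."""
    theta = 0.0
    used, responses = [], []
    pool = list(bank)
    se = float("inf")
    while len(used) < max_items and pool:
        # Maximum-information selection: difficulty closest to current theta.
        b = min(pool, key=lambda d: abs(d - theta))
        pool.remove(b)
        used.append(b)
        responses.append(1 if random.random() < p_correct(true_theta, b) else 0)
        # Newton-Raphson steps toward the ML estimate of theta, kept bounded
        # so all-correct or all-incorrect early runs cannot diverge.
        for _ in range(10):
            ps = [p_correct(theta, d) for d in used]
            grad = sum(x - p for x, p in zip(responses, ps))
            info = sum(p * (1 - p) for p in ps)
            theta = max(-4.0, min(4.0, theta + grad / info))
        total_info = sum(p_correct(theta, d) * (1 - p_correct(theta, d))
                         for d in used)
        se = 1 / math.sqrt(total_info)
        if se < se_stop:  # standard-error stopping rule
            break
    return theta, se, len(used)

bank = [i / 10 - 3 for i in range(61)]  # difficulties from -3.0 to 3.0
theta_hat, se, n_items = run_cat(true_theta=0.8, bank=bank)
print(f"theta_hat={theta_hat:.2f}  SE={se:.2f}  items={n_items}")
```

If the bank's difficulties are themselves estimated with error, the reported SE understates the true uncertainty, which is exactly why naive stopping rules can terminate tests too early for some examinees.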
Methods & Technical Toolbelt
I lean heavily on open-science and “GitHub-first” workflows to solve measurement challenges.
- Models & analyses: Multidimensional IRT (mIRT), Many-Facet Rasch Models (MFRM), Bayesian GMM, and Differential Item Functioning (DIF) analysis.
- AI/NLP: LLM evaluation (GPT-4), NLP-based scoring frameworks, and automated prompt engineering.
- Statistical Software: Expert proficiency in R (Stan, ggplot2, Shiny), Python, SQL, and Mplus.