Doria Xiao - Blog

New Publication: Reliability of Hybrid Human-ML Scoring Systems

Sun, 01 Feb 2026 08:00:00 GMT

I am pleased to share that our new paper, “Revisiting reliability with human and machine learning raters under scoring design and rater configuration in the many-facet Rasch model,” has been published in the British Journal of Mathematical and Statistical Psychology.

This research, conducted with co-authors Richard J. Patz and Mark R. Wilson, investigates the psychometric impact of integrating machine learning (ML) scoring into high-stakes assessment frameworks.

Key Insights

Systematic Bias vs. Noise: We found that systematic rater bias, rather than random machine inconsistency, is the primary driver of estimation error in hybrid scoring systems.
Design Density: Increasing scoring matrix density (moving from isolated to complete designs) significantly stabilizes latent proficiency recovery.
Strategic Hybridization: Hybrid scoring yields the greatest reliability gains when human and ML raters possess opposing biases, allowing directional errors to cancel out.
Robust Modeling: For sparse scoring designs, the Partial Credit Model (PCM) with fixed thresholds often outperforms more complex many-facet variants because it avoids over-parameterization.
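
To make the "Strategic Hybridization" point concrete, here is a toy simulation. It is a minimal sketch and not the many-facet Rasch analysis from the paper: it assumes a simple additive model (observed score = true score + rater bias + random noise), and the function name, bias values, and noise level are all hypothetical choices made for illustration.

```python
import random
import statistics


def mean_abs_error(bias_a, bias_b, noise_sd=0.5, n=5000, seed=0):
    """Mean absolute error of the average of two raters' scores.

    Assumed toy model (not the paper's): score = true + bias + noise.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)
        score_a = true_score + bias_a + rng.gauss(0.0, noise_sd)
        score_b = true_score + bias_b + rng.gauss(0.0, noise_sd)
        combined = (score_a + score_b) / 2
        errors.append(abs(combined - true_score))
    return statistics.fmean(errors)


# Two raters who are both lenient: the shared bias survives averaging.
same_direction = mean_abs_error(bias_a=0.5, bias_b=0.5)

# One lenient and one severe rater: the biases cancel in the average.
opposing = mean_abs_error(bias_a=0.5, bias_b=-0.5)

print(f"same-direction biases: {same_direction:.3f}")
print(f"opposing biases:       {opposing:.3f}")
```

In this sketch, averaging shrinks the random noise in both cases, but only the opposing-bias pairing removes the systematic component, which is why its error comes out smaller.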

Practical Application

We applied these findings to real-world data from a “Problem Solving with Math” (PSM) assessment. Results confirmed that anchoring constructed-response items to selected-response metrics can effectively stabilize scales in sparse scoring environments.

Full Citation: Xiao, X., Patz, R. J., & Wilson, M. R. (2026). Revisiting reliability with human and machine learning raters under scoring design and rater configuration in the many-facet Rasch model. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.70034

New Publication: Trajectories of Depressive Symptoms During COVID-19

Mon, 15 Sep 2025 07:00:00 GMT

I’m very excited to share that our article,
“Trajectories of Depressive Symptom Among College Students in China During the COVID-19 Pandemic: Association With Suicidal Ideation and Insomnia Symptoms,”
has just been published in Suicide and Life-Threatening Behavior.

This three-year, five-wave longitudinal study followed 1,387 Chinese college students to better understand how depression evolved during the pandemic. Using growth mixture modeling (GMM), we identified five distinct trajectories of depressive symptoms:

Resilient (24%) — consistently low symptoms
Moderate-remission (46%) — moderate but improving over time
Low-increasing (20%) — gradual worsening
High-recovery (6%) — severe early, later recovered
Moderate-increasing (4%) — moderate but worsening

🔎 Importantly, these trajectories were not just statistical patterns — they predicted later risks of suicidal ideation and insomnia. Students in the increasing-symptom groups were at particularly high risk, underscoring the importance of early identification and targeted interventions.

This study highlights the heterogeneity of mental health responses during COVID-19 and calls for more nuanced, trajectory-based approaches to supporting young people.

📄 Full text available here: https://onlinelibrary.wiley.com/doi/10.1111/sltb.70051

BEAR Seminar Talk: When Growth Mixture Models Break

Wed, 10 Sep 2025 07:00:00 GMT

Yesterday I had the honor of presenting my dissertation,
“When Growth Mixture Models Break,” as a talk in the BEAR Seminar at the School of Education, University of California, Berkeley.

I am deeply grateful to the Social Research Methodologies (SRM) cluster—my mentors, colleagues, and friends—who have shaped and supported me throughout this journey.

The surprise celebration, complete with an SRM-themed cake 🎂, flowers 🌹, and cap 🎓, made the day even more special.

Proud to be an SRM Bear 🐻, and excited to continue this work as I begin the next chapter at Stanford!

[BEAR Seminar] When Growth Mixture Models Break

Tue, 09 Sep 2025 07:00:00 GMT

I’m excited to be giving a BEAR Seminar talk at UC Berkeley:

When: Tuesday, September 9, 2025, at 2:00 p.m.
Where: Berkeley Way West 4310 and via Zoom
Event page: Berkeley Events Calendar

Talk overview

Growth Mixture Models (GMMs) are widely used to capture unobserved heterogeneity in longitudinal data. But they are fragile: nonidentifiability can cause classes to collapse or merge, and common information criteria (like AIC or traditional DIC) often fail under skewed or multimodal likelihoods.

In this talk, I’ll discuss posterior pathologies such as minuscule-class behavior, twinlike-class degeneracy, and stuck chains, and show why plug-in deviance penalties sometimes become negative. I’ll introduce diagnostics (moving-SD checks, Distinguishability Index) and argue that variance-based penalties (DIC_pV), WAIC, and LOO-CV align more closely with the marginal likelihood and provide more reliable guidance for class enumeration.
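
For readers who have not seen a negative deviance penalty, here is a small sketch of how it can arise. It assumes the commonly cited definitions (plug-in penalty p_D = mean deviance minus deviance at the posterior mean; variance-based penalty p_V = half the variance of the deviance), which may differ in detail from those used in the talk. The one-observation model and the hand-picked "posterior draws" are hypothetical and stand in for a bimodal posterior of the kind that label switching produces.

```python
import math
import statistics

Y_OBS = 1.0  # hypothetical single observation


def deviance(theta):
    """-2 * log-likelihood for the toy model y ~ Normal(theta**2, 1)."""
    return math.log(2 * math.pi) + (Y_OBS - theta ** 2) ** 2


# Hand-picked draws clustered around the two symmetric modes, +1 and -1.
# These are illustrative values, not output from a real sampler.
offsets = [-0.10, -0.05, 0.0, 0.05, 0.10]
draws = [1 + d for d in offsets] + [-1 - d for d in offsets]

deviances = [deviance(t) for t in draws]
mean_deviance = statistics.fmean(deviances)

# The posterior mean lands between the modes, where the fit is poor.
theta_bar = statistics.fmean(draws)

p_d = mean_deviance - deviance(theta_bar)  # plug-in penalty
p_v = statistics.variance(deviances) / 2   # variance-based penalty

print(f"posterior mean: {theta_bar:.3f}")
print(f"p_D (plug-in):  {p_d:.3f}")
print(f"p_V (variance): {p_v:.5f}")
```

Because the posterior mean falls in the low-likelihood valley between the two modes, the deviance evaluated there exceeds the average deviance and p_D goes negative. p_V is built from a variance, so it cannot be negative.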

If you’re nearby—or joining remotely—I’d love to see you there!

Welcome

Mon, 08 Sep 2025 07:00:00 GMT

Hi there, and thanks for visiting my site!

I’m Xingyao (Doria) Xiao, a postdoctoral scholar at Stanford. My research brings together psychometrics, Bayesian modeling, and AI to understand learning and improve assessment.

I built this space to share what I’m working on, from new papers and projects to talks and teaching. I’ll also post occasional reflections on methods and the joys (and headaches!) of modeling complex data.

Feel free to explore the pages above, and if anything catches your interest, I’d love to connect.