Model Version History

v1

This is the model described in the original publication.

The model for two-group analysis is described by the following sampling statements:

\[\begin{split}\mu_1 &\sim \text{Normal}(\hat\mu, 1000\, \hat\sigma) \\ \mu_2 &\sim \text{Normal}(\hat\mu, 1000\, \hat\sigma) \\ \sigma_1 &\sim \text{Uniform}(\hat\sigma \,/\, 1000, 1000\, \hat\sigma) \\ \sigma_2 &\sim \text{Uniform}(\hat\sigma \,/\, 1000, 1000\, \hat\sigma) \\ \nu &\sim \text{Exponential}(1\,/\,29) + 1 \\ y_1 &\sim t_\nu(\mu_1, \sigma_1) \\ y_2 &\sim t_\nu(\mu_2, \sigma_2)\end{split}\]

Here \(\hat\mu\) and \(\hat\sigma\) are the sample mean and sample standard deviation of the pooled data from both groups. The effect size is calculated as \((\mu_1 - \mu_2) \big/ \sqrt{(\sigma_1^2 + \sigma_2^2) \,/\, 2}\).
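The v1 effect size formula can be sketched in Python as follows. This is a minimal illustration, not the package's own code; the function name `effect_size_v1` is hypothetical, and in practice the formula would be applied to each posterior draw.

```python
import math


def effect_size_v1(mu1, mu2, sigma1, sigma2):
    """v1 effect size: difference of the means divided by the
    root mean square of the two scale parameters."""
    return (mu1 - mu2) / math.sqrt((sigma1 ** 2 + sigma2 ** 2) / 2)


# With equal scales the denominator reduces to the common sigma.
print(effect_size_v1(3.0, 1.0, 2.0, 2.0))
```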

v2

Version 2 of the model fixes issues with the standard deviation and the normality parameter \(\nu\).

The standard deviation of a t distribution \(t_\nu(\mu, \sigma)\) is not \(\sigma\), but \(\sigma \sqrt{\nu / (\nu - 2)}\) if \(2 < \nu\), and infinite if \(1 < \nu \le 2\). Distributions with infinite standard deviation (SD) rarely occur in reality (and never when it comes to humans), so the lower bound of \(\nu\) is changed from 1 to 2.5. The plots now display SD instead of \(\sigma\), and the formula for effect size also uses \(\mathrm{sd}_i\) instead of \(\sigma_i\).
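As a small sketch of the relationship between \(\sigma\) and the SD, the conversion could be written like this. The helper name `t_sd` is an assumption made for illustration and is not taken from the package.

```python
import math


def t_sd(sigma, nu):
    """Standard deviation of t_nu(mu, sigma).

    Finite only for nu > 2; infinite for 1 < nu <= 2.
    """
    if nu <= 1:
        raise ValueError("nu must be greater than 1")
    if nu <= 2:
        return math.inf
    return sigma * math.sqrt(nu / (nu - 2))


# At the new lower bound nu = 2.5 the SD is sqrt(5) times sigma.
print(t_sd(1.0, 2.5))
```

For large \(\nu\) the factor approaches 1, so the SD and \(\sigma\) coincide in the normal limit.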

Why is the lower bound of \(\nu\) 2.5 and not 2?

The probability density function of \(t_2\) is quite close to that of \(t_{2.5}\) in the \(\mu \pm 5 \sigma\) region, but for \(\nu\) close to 2 the SD becomes arbitrarily large because of strong outliers in the tails. Setting a lower bound of 2.5 prevents such outliers and the extremely large standard deviations they cause.
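As a rough numerical illustration (values rounded), the ratio \(\mathrm{sd} \,/\, \sigma = \sqrt{\nu / (\nu - 2)}\) grows quickly as \(\nu\) approaches 2:

\[\begin{split}\nu = 30 &: \quad \sqrt{30 \,/\, 28} \approx 1.04 \\ \nu = 2.5 &: \quad \sqrt{2.5 \,/\, 0.5} \approx 2.24 \\ \nu = 2.1 &: \quad \sqrt{2.1 \,/\, 0.1} \approx 4.58 \\ \nu = 2.01 &: \quad \sqrt{2.01 \,/\, 0.01} \approx 14.18\end{split}\]

With the bound at 2.5, the SD is therefore at most about 2.24 times \(\sigma\).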

Another change concerns the sampling of \(\sigma_i\). In the original model \(\sigma_i \,/\, \hat\sigma\) was uniformly distributed between \(1 \, / \,1000\) and \(1000\). Because the interval above 1 is 1000 times longer than the interval below 1, the prior probability of \(\sigma_i > \hat\sigma\) was 1000 times that of \(\sigma_i < \hat\sigma\), which caused \(\sigma_i\) to be overestimated at small sample sizes (around \(N = 5\)). To make these probabilities equal, \(\log(\sigma_i \,/\, \hat\sigma)\) is now uniformly distributed between \(\log(1\, / \,1000)\) and \(\log(1000)\). At \(N=25\) this change in the prior does not cause a perceptible change in the posterior.
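The difference between the two priors can be checked with a short calculation. The sketch below uses only the Python standard library; all function names are hypothetical and chosen for illustration.

```python
import math
import random

LOW, HIGH = 1 / 1000, 1000  # bounds on sigma_i / sigma_hat


def uniform_mass_ratio(low=LOW, high=HIGH):
    """P(ratio > 1) / P(ratio < 1) when ratio ~ Uniform(low, high) (v1)."""
    return (high - 1) / (1 - low)


def log_uniform_mass_ratio(low=LOW, high=HIGH):
    """Same quantity when log(ratio) ~ Uniform(log(low), log(high)) (v2)."""
    return math.log(high) / -math.log(low)


def sample_ratio_v2(rng, low=LOW, high=HIGH):
    """One draw of sigma_i / sigma_hat under the v2 prior."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


print(uniform_mass_ratio())      # roughly 1000: v1 favours large sigma
print(log_uniform_mass_ratio())  # roughly 1: v2 is balanced around sigma_hat

# Empirical check: about half of the v2 draws fall above sigma_hat.
rng = random.Random(0)
draws = [sample_ratio_v2(rng) for _ in range(20000)]
print(sum(d > 1 for d in draws) / len(draws))
```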

Summary of changes:
  • Lower bound of \(\nu\) is 2.5.
  • SD is calculated as \(\sigma \sqrt{ \nu / (\nu - 2)}\).
  • Effect size is calculated as \((\mu_1 - \mu_2) \big/ \sqrt{(\mathrm{sd}_1^2 + \mathrm{sd}_2^2) \,/\, 2}\).
  • \(\log(\sigma_i \,/\, \hat\sigma)\) is uniformly distributed.
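Putting the SD and effect size items together, the v2 effect size might be computed as in the sketch below. The name `effect_size_v2` is hypothetical, and the code assumes \(\nu > 2\), which the new lower bound guarantees.

```python
import math


def effect_size_v2(mu1, mu2, sigma1, sigma2, nu):
    """v2 effect size: uses the SDs of the t distributions
    rather than their scale parameters (requires nu > 2)."""
    if nu <= 2:
        raise ValueError("nu must be greater than 2 for a finite SD")
    scale = math.sqrt(nu / (nu - 2))  # sd_i = sigma_i * scale
    sd1, sd2 = sigma1 * scale, sigma2 * scale
    return (mu1 - mu2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)


# Heavier tails (smaller nu) give a larger SD and a smaller effect size.
print(effect_size_v2(1.0, 0.0, 1.0, 1.0, nu=4.0))
print(effect_size_v2(1.0, 0.0, 1.0, 1.0, nu=2.5))
```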

The model for two-group analysis is described by the following sampling statements:

\[\begin{split}\mu_1 &\sim \text{Normal}(\hat\mu, 1000 \, \hat\sigma) \\ \mu_2 &\sim \text{Normal}(\hat\mu, 1000 \, \hat\sigma) \\ \log(\sigma_1 \,/\, \hat\sigma) &\sim \text{Uniform}(\log(1 \, / \, 1000), \log(1000)) \\ \log(\sigma_2 \,/\, \hat\sigma) &\sim \text{Uniform}(\log(1 \, / \, 1000), \log(1000)) \\ \nu &\sim \text{Exponential}(1\, / \, 27.5) + 2.5 \\ y_1 &\sim t_\nu(\mu_1, \sigma_1) \\ y_2 &\sim t_\nu(\mu_2, \sigma_2)\end{split}\]
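To make the sampling statements concrete, the sketch below draws one set of parameters from the v2 prior and simulates one observation per group, using only the Python standard library. It is not an inference routine and not the package's implementation. It assumes that the second argument of Normal is a standard deviation, that Exponential is parameterised by its rate, and that \(\hat\sigma\) uses the \(n - 1\) denominator; the function name is hypothetical.

```python
import math
import random
import statistics


def draw_from_v2_prior(y1, y2, rng):
    """Draw (mu, sigma, nu) from the v2 prior and simulate one value per group."""
    pooled = list(y1) + list(y2)
    mu_hat = statistics.mean(pooled)
    sigma_hat = statistics.stdev(pooled)  # assumption: n - 1 denominator

    # mu_i ~ Normal(mu_hat, 1000 * sigma_hat), second argument taken as an SD
    mu = [rng.gauss(mu_hat, 1000 * sigma_hat) for _ in range(2)]

    # log(sigma_i / sigma_hat) ~ Uniform(log(1/1000), log(1000))
    lo, hi = math.log(1 / 1000), math.log(1000)
    sigma = [sigma_hat * math.exp(rng.uniform(lo, hi)) for _ in range(2)]

    # nu ~ Exponential(rate = 1/27.5) + 2.5
    nu = rng.expovariate(1 / 27.5) + 2.5

    def draw_t(m, s):
        # Student t draw: standard normal over sqrt(chi-square(nu) / nu)
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(nu / 2, 2.0)
        return m + s * z / math.sqrt(chi2 / nu)

    y_sim = [draw_t(mu[0], sigma[0]), draw_t(mu[1], sigma[1])]
    return {"mu": mu, "sigma": sigma, "nu": nu, "y_sim": y_sim}


rng = random.Random(42)
print(draw_from_v2_prior([1.0, 2.0, 3.0], [2.0, 3.0, 5.0], rng))
```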