Developer Interface

Functions for analysis

best.analyze_one(group_data, ref_val: float = 0, n_samples: int = 2000, **kwargs) → best.model.BestResultsOne[source]

Analyze the distribution of a single group

This method is typically used to compare some observations against a reference value, such as zero. It can be used to analyze paired data, or data from an experiment without a control group.

This analysis takes around a minute, depending on the amount of data.

This function creates a model with the given parameters, and updates the distributions of the parameters as dictated by the model and the data.

Parameters:
  • group_data (list of numbers, NumPy array, or Pandas Series.) – Data of the group to be analyzed.
  • ref_val (float) – The reference value to be compared against. This affects the plots and the effect size calculations. Default: 0.
  • n_samples (int) – Number of samples per chain to be drawn for the analysis. (The number of chains depends on the number of CPU cores, but is at least 2.) Default: 2000.
  • **kwargs

    Keyword arguments are passed to pymc3.sample. For example, number of tuning samples can be increased to 2000 (from the default 1000) by:

    best.analyze_one(group_data, tune=2000)
    
Returns:

An object that contains all the posterior samples from the model.

Return type:

BestResultsOne

Notes

The first call of this function takes about 2 minutes extra, in order to compile the model and speed up later calls.

Afterwards, performing a two-group analysis takes about 20 seconds on a 2015 MacBook, both with 20 and 1000 data points.

best.analyze_two(group1_data, group2_data, n_samples: int = 2000, **kwargs) → best.model.BestResultsTwo[source]

Analyze the difference between two groups

This analysis takes about a minute, depending on the amount of data. (See the Notes section below.)

This function creates a model with the given parameters, and updates the distributions of the parameters as dictated by the model and the data.

Parameters:
  • group1_data (list of numbers, NumPy array, or Pandas Series.) – Data of the first group analyzed and to be plotted.
  • group2_data (list of numbers, NumPy array, or Pandas Series.) – Data of the second group analyzed and to be plotted.
  • n_samples (int) – Number of samples per chain to be drawn for the analysis. (The number of chains depends on the number of CPU cores, but is at least 2.) Default: 2000.
  • **kwargs

    Keyword arguments are passed to pymc3.sample(). For example, number of tuning samples can be increased to 2000 (from the default 1000) by:

    best.analyze_two(group1_data, group2_data, tune=2000)
    
Returns:

An object that contains all the posterior samples from the model.

Return type:

BestResultsTwo

Notes

The first call of this function takes about 2 minutes extra, in order to compile the model and speed up later calls.

Afterwards, performing a two-group analysis takes:
  • 20 seconds with 45 data points per group, or
  • 90 seconds with 10,000 data points per group.

Don’t be taken aback by the time estimates in the beginning – the sampling process speeds up after the initial few hundred iterations.

(These times were measured on a 2015 MacBook.)

These functions return a BestResultsOne or BestResultsTwo, respectively.

Plotting functions

best.plot_posterior(best_results: best.model.BestResults, var_name: str, ax: Optional[matplotlib.axes._axes.Axes] = None, bins: Union[int, list, numpy.ndarray] = 30, stat: str = 'mode', title: Optional[str] = None, label: Optional[str] = None, ref_val: Optional[float] = None, **kwargs) → matplotlib.axes._axes.Axes[source]

Plot a histogram of posterior samples of a variable

Parameters:
  • best_results (BestResults) – The result of the analysis.
  • var_name (string) – The name of the variable to be plotted. Available variable names are described in the Variables section.
  • ax (Matplotlib Axes, optional) – If not None, the Matplotlib Axes instance to be used. Default: create a new axes.
  • bins (int or list or NumPy array) – The number or edges of the bins used for the histogram of the data. If an integer, the number of bins to use. If a sequence, then the edges of the bins, including left edge of the first bin and right edge of the last bin. Default: 30 bins.
  • stat ({'mean', 'mode'}) – Whether to print the mean or the mode of the variable on the plot. Default: ‘mode’.
  • title (string, optional) – Title of the plot. Default: don’t print a title.
  • label (string, optional) – Label of the x axis. Default: don’t print a label.
  • ref_val (float, optional) – If not None, print a vertical line at this reference value (typically zero). Default: None (don’t print a reference value)
  • **kwargs (dict) – All other keyword arguments are passed to plt.hist.
Returns:

The Axes object containing the plot. Using this return value, the plot can be customized afterwards – for details, see the documentation of the Matplotlib Axes API.

Return type:

Matplotlib Axes

Examples

To plot a histogram of the samples of the mean of the first group in avocado green:

>>> import matplotlib as plt
>>> ax = best.plot_posterior(best_out,
...                          'Group 1 mean',
...                          color='avocado')
>>> plt.show()
best.plot_data_and_prediction(best_results: best.model.BestResults, group_id: int = 1, ax: matplotlib.axes._axes.Axes = None, bins: Union[int, list, numpy.ndarray] = 30, title: Optional[str] = None, hist_kwargs: dict = {}, prediction_kwargs: dict = {}) → matplotlib.axes._axes.Axes[source]

Plot samples of predictive distributions and a histogram of the data.

This plot can be used as a posterior predictive check, to examine how well the model predictions fit the observed data.

Parameters:
  • best_results – The result of the analysis.
  • group_id ({1, 2}) – Which group to plot (1 or 2).
  • ax (Matplotlib Axes, optional) – If not None, the Matplotlib Axes instance to be used. Default: create a new plot.
  • title (string, optional.) – Title of the plot. Default: no plot title.
  • bins (int or list or NumPy array.) – The number or edges of the bins used for the histogram of the data. If an integer, the number of bins to use. If a sequence, then the edges of the bins, including left edge of the first bin and right edge of the last bin. Default: 30 bins.
  • hist_kwargs (dict) – The keyword arguments to be passed to plt.hist for the group data.
  • prediction_kwargs (dict) – The keyword arguments to be passed to plt.plot for the posterior predictive curves.
Returns:

The Axes object containing the plot. Using this return value, the plot can be customized afterwards – for details, see the documentation of the Matplotlib Axes API.

Return type:

Matplotlib Axes

Examples

To print the data of the second group, add a hatch to the histogram, and set the limits of the x axis to 85 and 115:

>>> import matplotlib as plt
>>> ax = best.plot_data_and_prediction(
...         best_out,
...         2,
...         hist_kwargs={'hatch':'...'}
... )
>>> ax.set_xlim(85, 115)
>>> plt.show()

Notes

You can move the histogram in front of the predictive curves by passing hist_kwargs={'zorder': 10} as an argument, or completely behind the curves with hist_kwargs={'zorder': 0}.

If the plot is large enough, it is suggested to put a legend on it, by calling ax.legend() afterwards.

best.plot_all_one(best_results: best.model.BestResultsOne, bins: int = 30, group_name: Optional[str] = None) → matplotlib.figure.Figure[source]

Plot posteriors of every parameter and observation of a two-group analysis.

Parameters:
  • best_results (BestResultsOne) – The result of the analysis.
  • bins (int) – The number of bins to be used for the histograms. Default: 30.
  • group_name (string, optional) – If not None, group name to be used in the title, e.g. if group_name is "eTRF day 5" then the plot for the mean is titled “eTRF day 5 mean”. If None, then group name is omitted from the titles, resulting in e.g. “Mean”. Default: None.
Returns:

The created figure. (The separate plots can be accessed via fig.axes, where fig is the return value of this function.)

Return type:

plt.Figure

Notes

This section explains when is the mean or the mode printed.

best.plot_all_two(best_results: best.model.BestResultsTwo, bins: int = 30, group1_name: str = 'Group 1', group2_name: str = 'Group 2') → matplotlib.figure.Figure[source]

Plot posteriors of every parameter and observation of a two-group analysis.

Parameters:
  • best_results (BestResultsTwo) – The result of the analysis.
  • bins (int) – The number of bins to be used for the histograms. Default: 30.
  • group1_name (string) – Name of the first group, to be used in the titles. Default: “Group 1”.
  • group2_name (string) – Name of the second group, to be used in the titles. Default: “Group 2”.
Returns:

The created figure. (The separate plots can be accessed via fig.axes, where fig is the return value of this function.)

Return type:

plt.Figure

Notes

This section explains when is the mean or the mode printed.

best.plot_all(best_results: best.model.BestResults, *args, **kwargs) → matplotlib.figure.Figure[source]

Plot posteriors of every parameter and observation of an analysis.

Depending on the type of best_results, this call is equivalent to calling plot_all_one() or plot_all_two().

Classes for analysis results

class best.BestResultsOne(model: best.model.BestModelOne, trace: <sphinx.ext.autodoc.importer._MockObject object at 0x7fdbef796bb0>)[source]

Results of a two-group analysis; subclass of BestResults

class best.BestResultsTwo(model: best.model.BestModelTwo, trace: <sphinx.ext.autodoc.importer._MockObject object at 0x7fdbef796bb0>)[source]

Results of a two-group analysis; subclass of BestResults

observed_data(group_id)[source]

Return the observed data as a NumPy array

(This method is accessible primarily for internal purposes.)

Both are subclasses of BestResults, meaning the following methods are available for BestResultsOne or BestResultsTwo objects.

class best.BestResults(model: best.model.BestModel, trace: <sphinx.ext.autodoc.importer._MockObject object at 0x7fdbef796bb0>)[source]

Results of an analysis

hdi(var_name: str, credible_mass: float = 0.95)[source]

Calculate the highest posterior density interval (HDI)

This function calculates a credible interval which contains the credible_mass most likely values of the parameter, given the data. Also known as an HPD interval.

Parameters:
  • var_name (str) – Name of variable.
  • credible_mass (float) – The HDI will cover credible_mass * 100% of the probability mass. Default: 0.95, i.e. a 95% HDI.
Returns:

The endpoints of the HPD

Return type:

(float, float)

observed_data(group_id: int)[source]

Return the observed data as a NumPy array

(This method is accessible primarily for internal purposes.)

posterior_mode(var_name: str)[source]

Calculate the posterior mode of a variable

Parameters:var_name (string) – The name of the variable whose posterior mode is to be calculated.
Returns:The posterior mode.
Return type:float
posterior_prob(var_name: str, low: float = -inf, high: float = inf)[source]

Calculate the posterior probability that a variable is in a given interval

The return value approximates the following probability:

\[\text{Pr}(\textit{low} < \theta_{\textit{var_name}} < \textit{high} | y_1, y_2)\]

One-sided intervals can be specified by using only the low or high argument, for example, to calculate the probability that the the mean of the first group is larger than that of the second one:

best_result.posterior_prob('Difference of means', low=0)
Parameters:
  • var_name (str) – Name of variable.
  • low (float, optional) – Lower limit of the interval. Default: \(-\infty\) (no lower limit)
  • high (float, optional) – Upper limit of the interval. Default: \(\infty\) (no upper limit)
Returns:

Posterior probability that the variable is in the given interval.

Return type:

float

Notes

If p is the result and S is the total number of samples, then the standard deviation of the result is \(\sqrt{p(1-p)/S}\) (see BDA3, p. 267). For example, with 2000 samples, the errors for some returned probabilities are

  • 0.01 ± 0.002,
  • 0.1 ± 0.007,
  • 0.2 ± 0.009,
  • 0.5 ± 0.011,

meaning the answer is accurate for most practical purposes.

summary(credible_mass: float = 0.95)[source]

Return summary statistics of the results

Parameters:credible_mass (float) – The highest posterior density intervals in the summary will cover credible_mass * 100% of the probability mass. For example, credible_mass=0.95 results in 95% credible intervals. Default: 0.95.
trace

The collection of posterior samples

See the relevant PyMC3 documentation for details.

Lower-level classes

class best.BestModel[source]

Base class for BEST models

model

The underlying PyMC3 Model object

(This property is accessible primarily for internal purposes.)

observed_data(group_id: int)[source]

Return the observed data as a NumPy array

(This method is accessible primarily for internal purposes.)

sample(n_samples: int, **kwargs) → <sphinx.ext.autodoc.importer._MockObject object at 0x7fdbef796bb0>[source]

Draw posterior samples from the model

(This method is accessible primarily for internal purposes.)

version

Version of the model specification

For details, see the page Model Version History.

class best.BestModelOne(y, ref_val)[source]

Model for a single-group analysis; subclass of BestModel

class best.BestModelTwo(y1, y2)[source]

Model for a two-group analysis; subclass of BestModel