API reference

Sample size calculation

cca_sample_size(X, Y[, ax, ay, rs, ...])

Suggest sample size for CCA.

pls_sample_size(X, Y[, ax, ay, rs, ...])

Suggest sample size for PLS.

pearson_sample_size([rs, criterion, ...])

Calculate required sample sizes for accurate estimation of Pearson correlation.

sample_size.linear_model.cca_req_corr(X, Y, ...)

Determines the minimum required true correlation to achieve power and error levels.

sample_size.linear_model.pls_req_corr(X, Y, ...)

Determines the minimum required true correlation to achieve power and error levels.


estimators.SVDPLS([n_components, ...])

Partial Least Squares estimator based on singular value decomposition.

estimators.SVDCCA([n_components, ...])

Canonical Correlation Analysis estimator based on singular value decomposition.

estimators.NIPALSPLS([n_components, scale, ...])

Identical to sklearn.cross_decomposition.PLSCanonical, except that fit creates additional attributes for compatibility with SVDPLS and SVDCCA.

estimators.NIPALSCCA([n_components, scale, ...])

Identical to sklearn.cross_decomposition.CCA, except that fit creates additional attributes for compatibility with SVDPLS and SVDCCA.

Synthetic data generation

generative_model.GEMMR(model, *args, **kwargs)

Generate a joint covariance matrix for X and Y.

generative_model.generate_data(Sigma, px, n)

Generate synthetic data for a given model.

Analysis of CCA/PLS results


Analyze a given dataset with a given estimator.


Analyze a given dataset and resampled versions of it with a given estimator.


Analyze subsampled versions of a dataset with a given estimator.

sample_analysis.analyzers.analyze_model(gm, ...)


Parameter-dependent models are set up and resulting synthetic datasets are analyzed.

Analysis add-ons

The functions in sample_analysis.analyzers only fit an estimator and return association strengths, weights and loadings. Additional analyses can be specified in the form of add-on functions. The following functions are provided, and arbitrary custom ones can be used as long as they have the same function signature.


Removes weights and loadings from results dataset to save storage space.


Removes x_weights_cv, y_weights_cv, x_loadings_cv and


Calculates cosine-distance between estimated and true weights.
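As an illustration of the quantity involved, the following is a minimal standalone sketch of a cosine distance between an estimated and a true weight vector. It is not the library's add-on: the function name `cosine_distance` and the `sign_invariant` option are assumptions made for this example. The option reflects that CCA/PLS weights are only defined up to sign, and whether the add-on handles sign this way is not stated here.

```python
import math


def cosine_distance(w_est, w_true, sign_invariant=True):
    """Return 1 - cosine similarity between two weight vectors.

    Hypothetical helper for illustration only, not part of the library.
    """
    dot = sum(a * b for a, b in zip(w_est, w_true))
    norm_est = math.sqrt(sum(a * a for a in w_est))
    norm_true = math.sqrt(sum(b * b for b in w_true))
    cos_sim = dot / (norm_est * norm_true)
    if sign_invariant:
        # Assumption: weight vectors are only identified up to sign,
        # so w and -w are treated as the same direction.
        cos_sim = abs(cos_sim)
    return 1.0 - cos_sim


# Parallel vectors give distance close to 0, orthogonal vectors give 1.
d_parallel = cosine_distance([1.0, 2.0], [2.0, 4.0])
d_orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
```
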


Calculates Spearman correlations between estimated and true test scores.


Calculates Pearson correlations between estimated and true test scores.


Calculates Pearson correlations between estimated and true loadings (loadings with respect to possibly transformed variables, i.e. those in the columns of X, Y, not Xorig, Yorig).

sample_analysis.addon.test_scores(estr, X, ...)

Calculates test scores.


Removes x_test_scores and y_test_scores from results.

sample_analysis.addon.assoc_test(estr, X, Y, ...)

Calculates Pearson correlations between test scores.


Calculates cosine-similarities of principal component axes of X and Y with corresponding weights.


Store penalties of a fitted SparseCCA estimator.

sample_analysis.addon.cv(estr, X, Y, Xorg, ...)

Calculates cross-validated outcome metrics.

Some of these add-ons need helper functions to set them up:


Create scorers to use with cv().


Calculate scores for test subjects.

Analyses that look into relations across datasets, and therefore need the outcomes of more than the current dataset, can be specified as postprocessors:

sample_analysis.postproc.power(res[, alpha])

Calculate power.
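Conceptually, power can be estimated as the fraction of repeated analyses whose p-value falls below the significance level alpha. The sketch below shows that idea on a plain list of p-values. It is not the postprocessor itself, which operates on a results dataset `res`. The name `estimate_power` and the default `alpha=0.05` are assumptions for this example.

```python
def estimate_power(p_values, alpha=0.05):
    """Fraction of repetitions with p-value below alpha.

    Hypothetical helper for illustration; assumes one p-value per
    repetition and a default alpha of 0.05.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    n_significant = sum(1 for p in p_values if p < alpha)
    return n_significant / len(p_values)


# Two of the four repetitions are significant at alpha = 0.05.
power = estimate_power([0.01, 0.20, 0.03, 0.50])
```
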


Removes between_assocs_perm from results dataset.


Calculate cosine similarity between weights for all pairs of repetitions.


Calculate cosine similarity between weights for all pairs of repetitions.


Removes weights and loadings from result dataset.


Removes test scores from result dataset.

Finally, there are a number of analysis building blocks that we found useful:

sample_analysis.macros.calc_p_value(estr, X, Y)

Calculate permutation-based p-value.
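The general idea of a permutation-based p-value can be sketched as follows: shuffle one dataset relative to the other, recompute the association strength, and count how often the permuted value reaches the observed one. This is a simplified illustration, not the library's implementation. It uses the absolute Pearson correlation of two one-dimensional variables as a stand-in for the association strength of a fitted estimator, and the names `pearson`, `permutation_p_value`, `n_perm` and `seed` are assumptions for this example.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Permutation p-value for the absolute Pearson correlation.

    Hypothetical stand-in: a CCA/PLS analysis would refit the
    estimator on each permuted dataset instead.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the pairing between x and y
        if abs(pearson(x, y_perm)) >= observed:
            n_extreme += 1
    # Counting the observed statistic itself keeps the p-value above zero.
    return (n_extreme + 1) / (n_perm + 1)


x = list(range(20))
y = [2 * v + 1 for v in x]
p = permutation_p_value(x, y, n_perm=199)
```

With `n_perm` permutations, the smallest attainable p-value is `1 / (n_perm + 1)`, so the number of permutations bounds the resolution of the test.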


Analyzes the given data with the given estimator.


Calculates statistics of the weight-similarities from pairs of synthetic datasets.

Model selection

model_selection.max_min_detector(X, Y, p_max)

Hypothesis-test-based method to jointly determine the number of PCA and between-set components.


Given a covariance matrix, find the number of components necessary to explain at least variance_threshold variance.
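The selection rule described above can be sketched on the eigenvalues of the covariance matrix: sort them in decreasing order and take the smallest number whose cumulative share of the total variance reaches the threshold. The sketch assumes the eigenvalues have already been computed (the eigendecomposition is left out to keep the example dependency-free), and the function name `n_components_for_variance` and the default threshold are assumptions, not the library's API.

```python
def n_components_for_variance(eigenvalues, variance_threshold=0.9):
    """Smallest number of components explaining at least the threshold.

    Hypothetical helper: takes the eigenvalues of a covariance matrix
    rather than the matrix itself.
    """
    if not 0 < variance_threshold <= 1:
        raise ValueError("variance_threshold must be in (0, 1]")
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    cumulative = 0.0
    for k, value in enumerate(lam, start=1):
        cumulative += value
        if cumulative / total >= variance_threshold:
            return k
    return len(lam)  # guards against rounding when the threshold is 1


# Cumulative shares are 0.4, 0.7, 0.9, 1.0, so three components reach 0.8.
k = n_components_for_variance([4.0, 3.0, 2.0, 1.0], variance_threshold=0.8)
```
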


plot.mean_metric_curve(metric[, rs, ...])

Plots mean curves for given rs as a function of n_per_ftr.

plot.heatmap_n_req(n_req[, clabel])

Plots a heatmap of required number of samples as a function of number of features and true correlation.

plot.polar_hist(angles[, bins, mark_mean])

Plot a polar histogram.



data.preprocessing.preproc_smith(fc, sm[, ...])

Data preprocessing pipeline from Smith et al. (2015).

Handling of included data files


Set directory in which outcome data is stored.

data.load_outcomes(dsid[, model, data_home, ...])

Load previously generated outcome data.

data.generate_example_dataset(model[, px, ...])

Convenience function returning an example dataset for use with CCA or PLS.

data.print_ds_stats(ds[, prefix])

Print outcome dataset statistics.

Utility functions

util.rank_based_inverse_normal_trafo(x[, c])

Rank-based inverse normal transformation.
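A rank-based inverse normal transformation replaces each value by the standard normal quantile of its rescaled rank, commonly `(rank - c) / (n - 2c + 1)`. The sketch below implements that common form with the standard library only. It is an illustration under assumptions: the name `rank_inverse_normal`, the default `c = 3/8` (the Blom offset) and the average-rank handling of ties are choices made for this example and may differ from the library function.

```python
from statistics import NormalDist


def rank_inverse_normal(x, c=3.0 / 8):
    """Map values to normal quantiles of their rescaled ranks.

    Hypothetical helper; ties receive their average rank.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# The middle value of three maps to the median of the normal distribution.
z = rank_inverse_normal([3.0, 1.0, 2.0])
```
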

util.pc_spectrum_decay_constant([X, ...])

Estimate powerlaw decay constant.
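One simple way to estimate a power-law decay constant is to fit a straight line to the principal component spectrum in log-log coordinates: if the i-th eigenvalue behaves like `i ** a`, the slope of log-eigenvalue against log-index estimates `a`. The sketch below does this with ordinary least squares on a given list of eigenvalues. How the library function estimates the constant is not stated here, so the method, the name `powerlaw_decay_constant` and the eigenvalue-list input are all assumptions.

```python
import math


def powerlaw_decay_constant(eigenvalues):
    """Least-squares slope of log(eigenvalue) against log(index).

    Hypothetical helper; expects strictly positive eigenvalues.
    """
    lam = sorted(eigenvalues, reverse=True)
    if len(lam) < 2 or lam[-1] <= 0:
        raise ValueError("need at least two strictly positive eigenvalues")
    log_idx = [math.log(i) for i in range(1, len(lam) + 1)]
    log_lam = [math.log(v) for v in lam]
    mean_idx = sum(log_idx) / len(log_idx)
    mean_lam = sum(log_lam) / len(log_lam)
    cov = sum((a - mean_idx) * (b - mean_lam) for a, b in zip(log_idx, log_lam))
    var = sum((a - mean_idx) ** 2 for a in log_idx)
    return cov / var


# An exact power-law spectrum with exponent -1.5 is recovered by the fit.
spectrum = [i ** -1.5 for i in range(1, 21)]
a = powerlaw_decay_constant(spectrum)
```
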