API reference

Sample size calculation

`cca_sample_size`(X, Y[, ax, ay, rs, ...])	Suggest sample size for CCA.
`pls_sample_size`(X, Y[, ax, ay, rs, ...])	Suggest sample size for PLS.
`pearson_sample_size`([rs, criterion, ...])	Calculate required sample sizes for accurate estimation of Pearson correlation.
`sample_size.linear_model.cca_req_corr`(X, Y, ...)	Determines the minimum required true correlation to achieve power and error levels.
`sample_size.linear_model.pls_req_corr`(X, Y, ...)	Determines the minimum required true correlation to achieve power and error levels.

Estimators

`estimators.SVDPLS`([n_components, ...])	Partial Least Squares estimator based on singular value decomposition.
`estimators.SVDCCA`([n_components, ...])	Canonical Correlation Analysis estimator based on singular value decomposition.
`estimators.NIPALSPLS`([n_components, scale, ...])	Identical to sklearn.cross_decomposition.PLSCanonical, except that fit creates additional attributes for compatibility with SVDPLS and SVDCCA.
`estimators.NIPALSCCA`([n_components, scale, ...])	Identical to sklearn.cross_decomposition.CCA, except that fit creates additional attributes for compatibility with SVDPLS and SVDCCA.

Synthetic data generation

`generative_model.GEMMR`(model, *args, **kwargs)	Generate a joint covariance matrix for X and Y.
`generative_model.generate_data`(Sigma, px, n)	Generate synthetic data for a given model.
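
As a rough illustration of what drawing data from a joint covariance matrix involves, the sketch below samples `n` rows from a zero-mean normal distribution with covariance `Sigma` and splits the columns into X (the first `px`) and Y (the rest). It uses only the Python standard library and is not the library's implementation: the helpers `cholesky` and `draw_xy` are hypothetical names, and the column layout (X features first) is an assumption inferred from the `(Sigma, px, n)` signature.

```python
import math
import random


def cholesky(sigma):
    """Return lower-triangular L with L @ L.T == sigma (sigma positive definite)."""
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def draw_xy(sigma, px, n, seed=0):
    """Hypothetical helper: draw n samples from N(0, sigma), split into X and Y."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    d = len(sigma)
    X, Y = [], []
    for _ in range(n):
        # Independent standard-normal draws, then correlate them via L
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        row = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        X.append(row[:px])  # assumption: first px columns belong to X
        Y.append(row[px:])
    return X, Y


# Joint covariance for one X and one Y feature with true correlation 0.5
Sigma = [[1.0, 0.5], [0.5, 1.0]]
X, Y = draw_xy(Sigma, px=1, n=1000)
```

With a larger `Sigma`, the same split applies: `px` columns go to X and the remaining columns to Y.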

Analysis of CCA/PLS results

`sample_analysis.analyzers.analyze_dataset`(...)	Analyze a given dataset with a given estimator.
`sample_analysis.analyzers.analyze_resampled`(...)	Analyze a given dataset and resampled versions of it with a given estimator.
`sample_analysis.analyzers.analyze_subsampled`(...)	Analyze subsampled versions of a dataset with a given estimator.
`sample_analysis.analyzers.analyze_model`(gm, ...)
`sample_analysis.analyzers.analyze_model_parameters`(model)	Parameter-dependent models are set up and resulting synthetic datasets are analyzed.

Analysis add-ons

The functions in sample_analysis.analyzers only fit an estimator and return association strengths, weights and loadings. Additional analyses can be specified in the form of add-on functions. The following functions are provided, and arbitrary custom ones can be used as long as they have the same function signature.

`sample_analysis.addon.remove_weights_loadings`(...)	Removes weights and loadings from `results` dataset to save storage space.
`sample_analysis.addon.remove_cv_weights_loadings`(...)	Removes `x_weights_cv`, `y_weights_cv`, `x_loadings_cv` and
`sample_analysis.addon.weights_true_cossim`(...)	Calculates cosine similarity between estimated and true weights.
`sample_analysis.addon.test_scores_true_spearman`(...)	Calculates Spearman correlations between estimated and true test scores.
`sample_analysis.addon.test_scores_true_pearson`(...)	Calculates Pearson correlations between estimated and true test scores.
`sample_analysis.addon.loadings_true_pearson`(...)	Calculates Pearson correlations between estimated and true loadings (loadings with respect to possibly transformed variables, i.e. those in the columns of X, Y, not Xorig, Yorig).
`sample_analysis.addon.test_scores`(estr, X, ...)	Calculates test scores.
`sample_analysis.addon.remove_test_scores`(...)	Removes `x_test_scores` and `y_test_scores` from `results`.
`sample_analysis.addon.assoc_test`(estr, X, Y, ...)	Calculates Pearson correlations between test scores.
`sample_analysis.addon.weights_pc_cossim`(...)	Calculates cosine-similarities of principal component axes of X and Y with corresponding weights.
`sample_analysis.addon.sparseCCA_penalties`(...)	Store penalties of a fitted SparseCCA estimator.
`sample_analysis.addon.cv`(estr, X, Y, Xorg, ...)	Calculates cross-validated outcome metrics.
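
Several of the add-ons above compare weight vectors by their cosine. As a minimal sketch of that quantity (plain Python, not the library's code; `cosine_similarity`, `w_true` and `w_est` are hypothetical names chosen for illustration):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Illustrative vectors: a "true" weight axis and an estimate that is tilted away from it
w_true = [1.0, 0.0, 0.0]
w_est = [0.8, 0.6, 0.0]
sim = cosine_similarity(w_est, w_true)
```

A value near 1 means the estimated weights point in nearly the same direction as the true ones, and a value near 0 means they are nearly orthogonal.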

Some of these add-ons need helper functions to set them up:

`sample_analysis.addon.mk_scorers_for_cv`([...])	Create scorers to use with `cv()`.
`sample_analysis.addon.mk_test_statistics_scores`(...)	Calculate scores for test subjects.

Analyses that look into relations across datasets, and therefore need the outcomes of more than one dataset, can be specified as postprocessors:

`sample_analysis.postproc.power`(res[, alpha])	Calculate power.
`sample_analysis.postproc.remove_between_assocs_perm`(res)	Removes between_assocs_perm from results dataset.
`sample_analysis.postproc.weights_pairwise_cossim_stats`(res)	Calculate cosine similarity between weights for all pairs of repetitions.
`sample_analysis.postproc.scores_pairwise_spearmansim_stats`(res)	Calculate Spearman correlations between scores for all pairs of repetitions.
`sample_analysis.postproc.remove_weights_loadings`(res)	Removes weights and loadings from result dataset.
`sample_analysis.postproc.remove_test_scores`(res)	Removes test scores from result dataset.
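
To illustrate why power is a postprocessor rather than an add-on, the sketch below estimates it from the p-values of many repetitions at once. It assumes the usual Monte Carlo definition (the fraction of repetitions with p < alpha); the function name `empirical_power`, the default `alpha=0.05` and the example p-values are all hypothetical, and the library's `power` may differ in detail.

```python
def empirical_power(p_values, alpha=0.05):
    """Fraction of repetitions whose p-value falls below alpha."""
    if not p_values:
        raise ValueError("need at least one p-value")
    return sum(p < alpha for p in p_values) / len(p_values)


# Made-up p-values from five repetitions of the same analysis
p_values = [0.001, 0.03, 0.2, 0.04, 0.6]
power = empirical_power(p_values)
```

No single dataset determines this number, which is why it can only be computed once the results of all repetitions are available.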

Finally, there are a number of analysis building blocks that we found useful:

`sample_analysis.macros.calc_p_value`(estr, X, Y)	Calculate permutation-based p-value.
`sample_analysis.macros.analyze_subsampled_and_resampled`(...)	Analyzes the given data with the given estimator.
`sample_analysis.macros.pairwise_weight_cosine_similarity`(ds)	Calculates statistics of the weight-similarities from pairs of synthetic datasets.
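
The idea behind a permutation-based p-value can be sketched as follows. This is a simplified stand-in, not the library's `calc_p_value`: it uses the absolute Pearson correlation of two one-dimensional variables as the test statistic instead of a fitted estimator, and the names `pearson` and `permutation_p_value` as well as the defaults `n_perm=999` and `seed=0` are assumptions made for the example.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Share of permutations whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the pairing between x and y
        if abs(pearson(x, y_perm)) >= observed:
            n_extreme += 1
    # Count the observed statistic itself so the p-value is never exactly 0
    return (n_extreme + 1) / (n_perm + 1)


# Strongly associated toy data
x = [float(i) for i in range(20)]
y = [2.0 * xi + (-1) ** i for i, xi in enumerate(x)]
p = permutation_p_value(x, y)
```

Shuffling only one of the two variables destroys the association between them while leaving each variable's own distribution intact, which is what makes the permuted statistics a valid null reference.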

Model selection

`model_selection.max_min_detector`(X, Y, p_max)	Hypothesis-test based method to jointly determine number of PCA and between-set components.
`model_selection.n_components_to_explain_variance`(...)	Given a covariance matrix, find the number of components necessary to explain at least variance_threshold variance.
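
A minimal sketch of the variance-threshold rule, assuming explained variance is measured by the eigenvalues of the covariance matrix. To stay within the standard library, the hypothetical helper `n_components_for_variance` takes the eigenvalue spectrum directly rather than the covariance matrix itself; the default threshold of 0.9 is likewise an assumption.

```python
def n_components_for_variance(eigenvalues, variance_threshold=0.9):
    """Smallest k such that the k largest eigenvalues explain >= threshold of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, ev in enumerate(ordered, start=1):
        cumulative += ev
        if cumulative / total >= variance_threshold:
            return k
    return len(ordered)


# Toy spectrum: total variance 8, the first two components explain 6 / 8 = 75 %
spectrum = [4.0, 2.0, 1.0, 1.0]
k = n_components_for_variance(spectrum, 0.75)
```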

Plotting

`plot.mean_metric_curve`(metric[, rs, ...])	Plots mean curves for given `rs` as a function of `n_per_ftr`.
`plot.heatmap_n_req`(n_req[, clabel])	Plots a heatmap of required number of samples as a function of number of features and true correlation.
`plot.polar_hist`(angles[, bins, mark_mean])	Plot a polar histogram.

Data

Preprocessing

`data.preprocessing.preproc_smith`(fc, sm[, ...])	Data preprocessing pipeline from Smith et al. (2015).

Handling of included data files

`data.loaders.set_data_home`(data_home)	Set directory in which outcome data is stored.
`data.load_outcomes`(dsid[, model, data_home, ...])	Load previously generated outcome data.
`data.generate_example_dataset`(model[, px, ...])	Convenience function returning an example dataset for use with CCA or PLS.
`data.print_ds_stats`(ds[, prefix])	Print outcome dataset statistics.

Utility functions

`util.rank_based_inverse_normal_trafo`(x[, c])	Rank-based inverse normal transformation.
`util.pc_spectrum_decay_constant`([X, ...])	Estimate powerlaw decay constant.
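
For orientation, a rank-based inverse normal transformation replaces each value by the standard-normal quantile of its rank, commonly via the formula (rank - c) / (n - 2c + 1). The sketch below is one standard-library way to write it and is not the library's code: the name `rank_inverse_normal` is hypothetical, and the default `c = 3/8` (Blom's offset) is an assumption, since the reference above does not state the library's default.

```python
from statistics import NormalDist


def rank_inverse_normal(x, c=3.0 / 8):
    """Map values to normal quantiles via their ranks (ties get averaged ranks)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values
        while j + 1 < n and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


z = rank_inverse_normal([10.0, 3.0, 7.0])
```

The output keeps the ordering of the input while discarding its original scale, so the middle value of an odd-length sample without ties maps to 0.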