API reference

Sample size calculation

cca_sample_size(X, Y[, rs, criterion, …]) Suggest sample size for CCA.
pls_sample_size(X, Y[, ax, ay, rs, …]) Suggest sample size for PLS.
pearson_sample_size([rs, criterion, …]) Calculate required sample sizes for accurate estimation of Pearson correlation.
sample_size.linear_model.cca_req_corr(X, Y, …) Determines the minimum required true correlation to achieve power and error levels.
sample_size.linear_model.pls_req_corr(X, Y, …) Determines the minimum required true correlation to achieve power and error levels.

Estimators

estimators.SVDPLS([n_components, …]) Partial Least Squares estimator based on singular value decomposition.
estimators.SVDCCA([n_components, …]) Canonical Correlation Analysis estimator based on singular value decomposition.
estimators.NIPALSPLS([n_components, scale, …]) Identical to sklearn.cross_decomposition.PLSCanonical, except that fit creates additional attributes for compatibility with SVDPLS and SVDCCA.
estimators.NIPALSCCA([n_components, scale, …]) Identical to sklearn.cross_decomposition.CCA, except that fit creates additional attributes for compatibility with SVDPLS and SVDCCA.
estimators.SparseCCA

Synthetic data generation

generative_model.setup_model(model[, …]) Generate a joint covariance matrix for X and Y.
generative_model.generate_data(Sigma, px, n) Generate synthetic data for a given model.
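As an illustration of what drawing data from a joint covariance matrix involves, here is a minimal sketch in plain NumPy. It assumes zero-mean, multivariate-normal data in which the first px columns belong to X and the remaining ones to Y. The function name sample_xy and the seed parameter are hypothetical and not part of this package; the sketch is not the implementation behind generate_data.

```python
import numpy as np

def sample_xy(Sigma, px, n, seed=0):
    """Draw n samples from N(0, Sigma) and split the columns into X and Y.

    Hypothetical helper for illustration only: the first px columns are
    treated as X, the remaining columns as Y.
    """
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=n)
    return Z[:, :px], Z[:, px:]

# Joint covariance for 2 X-features and 1 Y-feature.
# The first X-feature has a true correlation of 0.5 with the Y-feature.
Sigma = np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.0],
                  [0.5, 0.0, 1.0]])
X, Y = sample_xy(Sigma, px=2, n=1000)
```

With enough samples, the sample correlation between X[:, 0] and Y[:, 0] should land close to the value encoded in Sigma.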

Analysis of CCA/PLS results

sample_analysis.analyzers.analyze_dataset(…) Analyze a given dataset with a given estimator.
sample_analysis.analyzers.analyze_resampled(…) Analyze a given dataset and resampled versions of it with a given estimator.
sample_analysis.analyzers.analyze_subsampled(…) Analyze subsampled versions of a dataset with a given estimator.
sample_analysis.analyzers.analyze_model(…) Analyze synthetic datasets drawn from a model with a given estimator.
sample_analysis.analyzers.analyze_model_parameters(model) Set up parameter-dependent models and analyze the resulting synthetic datasets.

Analysis add-ons

The functions in sample_analysis.analyzers only fit an estimator and return association strengths, weights and loadings. Additional analyses can be specified in the form of add-on functions. The following functions are provided, and arbitrary custom ones can be used as long as they have the same function signature.

sample_analysis.addon.remove_weights_loadings(…) Removes weights and loadings from results dataset to save storage space.
sample_analysis.addon.remove_cv_weights(…) Removes x_weights_cv and y_weights_cv from results dataset to save storage space.
sample_analysis.addon.weights_true_cossim(…) Calculates cosine similarity between estimated and true weights.
sample_analysis.addon.scores_true_spearman(…) Calculates Spearman correlations between estimated and true test scores.
sample_analysis.addon.loadings_true_pearson(…) Calculates Pearson correlations between estimated and true test loadings.
sample_analysis.addon.test_scores(estr, X, …) Calculates test scores.
sample_analysis.addon.remove_test_scores(…) Removes x_test_scores and y_test_scores from results.
sample_analysis.addon.assoc_test(estr, X, Y, …) Calculates Pearson correlations between test scores.
sample_analysis.addon.weights_pc_cossim(…) Calculates cosine-similarities of principal component axes of X and Y with corresponding weights.
sample_analysis.addon.sparseCCA_penalties(…) Store penalties of a fitted SparseCCA estimator.
sample_analysis.addon.cv(estr, X, Y, Xorg, …) Calculates cross-validated outcome metrics.
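Several of the add-ons above compare weight vectors by cosine similarity. The following minimal sketch shows that quantity in plain NumPy. The function name cosine_similarity is hypothetical and does not belong to this package. Because CCA and PLS weight vectors are only defined up to sign, the absolute value is often the more informative number; whether and how the add-ons align signs is not stated in this listing.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

true_weights = np.array([1.0, 0.0])
estimated_weights = np.array([-2.0, 0.0])   # same axis, opposite sign, different scale

similarity = cosine_similarity(true_weights, estimated_weights)
# The sign flip yields a negative similarity, the absolute value ignores it.
abs_similarity = abs(similarity)
```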

Some of these add-ons need helper functions to set them up:

sample_analysis.addon.mk_scorers_for_cv([…]) Create scorers to use with cv().
sample_analysis.addon.mk_test_statistics_scores(…) Calculate scores for test subjects.

Analyses that examine relations across datasets, and therefore require the outcomes of more than the current dataset, can be specified as postprocessors:

sample_analysis.postproc.power(res[, alpha]) Calculate power.
sample_analysis.postproc.remove_between_assocs_perm(res) Removes between_assocs_perm from results dataset.
sample_analysis.postproc.weights_pairwise_cossim_stats(res) Calculate cosine similarity between weights for all pairs of repetitions.
sample_analysis.postproc.scores_pairwise_spearmansim_stats(res) Calculate Spearman correlations between scores for all pairs of repetitions.
sample_analysis.postproc.remove_weights_loadings(res) Removes weights and loadings from result dataset.
sample_analysis.postproc.remove_test_scores(res) Removes test scores from result dataset.
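Power is a typical example of a quantity that needs the outcomes of many repetitions. The sketch below estimates it as the fraction of repetitions whose p-value falls below alpha. This definition, the strict inequality, the default alpha of 0.05, and the function name empirical_power are assumptions made for illustration; the listing above only states that postproc.power takes a results dataset and an optional alpha.

```python
def empirical_power(p_values, alpha=0.05):
    """Fraction of repetitions with p < alpha (hypothetical helper)."""
    p_values = list(p_values)
    if not p_values:
        raise ValueError("need at least one p-value")
    return sum(p < alpha for p in p_values) / len(p_values)

# p-values from four hypothetical repetitions of the same analysis
p_values = [0.01, 0.20, 0.03, 0.50]
power_estimate = empirical_power(p_values)   # two of four fall below 0.05
```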

Finally, there are a number of analysis building blocks that we found useful:

sample_analysis.macros.calc_p_value(estr, X, Y) Calculate permutation-based p-value.
sample_analysis.macros.analyze_subsampled_and_resampled(…) Analyzes the given data with the given estimator.
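To make the idea of a permutation-based p-value concrete, here is a minimal sketch in plain NumPy. It uses the largest singular value of the sample cross-covariance matrix as the test statistic and shuffles the rows of Y to build the null distribution. The statistic, the number of permutations, and the function names are assumptions chosen for illustration; calc_p_value works with a fitted estimator, and its exact procedure is not described in this listing.

```python
import numpy as np

def first_singular_value(X, Y):
    """Largest singular value of the sample cross-covariance of X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (len(X) - 1)
    return np.linalg.svd(C, compute_uv=False)[0]

def permutation_p_value(X, Y, n_perm=199, seed=0):
    """Permutation p-value for the association between X and Y.

    Rows of Y are shuffled to break the pairing with X. The observed
    statistic counts as one draw from the null, hence the +1 terms.
    """
    rng = np.random.default_rng(seed)
    observed = first_singular_value(X, Y)
    null = np.array([
        first_singular_value(X, Y[rng.permutation(len(Y))])
        for _ in range(n_perm)
    ])
    return float((1 + np.sum(null >= observed)) / (1 + n_perm))

# Toy data: the first Y-feature depends strongly on the first X-feature.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
Y = np.column_stack([X[:, 0] + 0.5 * rng.normal(size=100),
                     rng.normal(size=100)])
p = permutation_p_value(X, Y)
```

The smallest attainable p-value is 1 / (n_perm + 1), so n_perm limits the resolution of the test.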

Model selection

model_selection.max_min_detector(X, Y, p_max) Hypothesis-test based method to jointly determine number of PCA and between-set components.
model_selection.n_components_to_explain_variance(…) Given a covariance matrix, find the number of components necessary to explain at least variance_threshold variance.
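The variance criterion can be sketched as follows: take the eigenvalues of the covariance matrix in descending order, accumulate them, and return the first count whose cumulative share reaches the threshold. The function name n_components_for_variance and the default threshold of 0.9 are hypothetical; this is an illustration of the idea, not the package's implementation.

```python
import numpy as np

def n_components_for_variance(Sigma, variance_threshold=0.9):
    """Smallest number of principal components whose eigenvalues sum to
    at least variance_threshold of the total variance (hypothetical helper)."""
    eigvals = np.linalg.eigvalsh(Sigma)[::-1]          # descending order
    explained = np.cumsum(eigvals) / eigvals.sum()     # cumulative share
    # First index at which the cumulative share reaches the threshold.
    idx = int(np.searchsorted(explained, variance_threshold))
    return min(idx + 1, len(eigvals))

# Eigenvalues 4, 3, 2, 1 give cumulative shares of 0.4, 0.7, 0.9, 1.0.
Sigma = np.diag([4.0, 3.0, 2.0, 1.0])
k = n_components_for_variance(Sigma, variance_threshold=0.8)
```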

Plotting

plot.mean_metric_curve(metric[, rs, …]) Plots mean curves for given rs as a function of n_per_ftr.
plot.heatmap_n_req(n_req[, clabel]) Plots a heatmap of required number of samples as a function of number of features and true correlation.
plot.polar_hist(angles[, bins, mark_mean]) Plot a polar histogram.

Data

Preprocessing

data.preprocessing.preproc_smith(fc, sm[, …]) Data preprocessing pipeline from Smith et al.

Handling of included data files

data.loaders.set_data_home(data_home) Set directory in which outcome data is stored.
data.load_outcomes(model[, estr, tag, …]) Load previously generated outcome data.
data.generate_example_dataset(model[, px, …]) Convenience function returning an example dataset for use with CCA or PLS.
data.print_ds_stats(ds) Print outcome dataset statistics.

Utility functions

util.rank_based_inverse_normal_trafo(x[, c]) Rank-based inverse normal transformation.
util.pc_spectrum_decay_constant([X, …]) Estimate power-law decay constant.
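A rank-based inverse normal transformation, as named in util.rank_based_inverse_normal_trafo above, replaces each value by the normal quantile of its offset rank, (rank - c) / (n - 2c + 1). The sketch below uses only the Python standard library. The function name rank_based_int, the averaging of tied ranks, and the default c = 3/8 (the Blom offset) are assumptions for illustration; the listing only shows that the package's function accepts an optional c.

```python
from statistics import NormalDist

def rank_based_int(x, c=3 / 8):
    """Rank-based inverse normal transformation (hypothetical helper).

    Each value is mapped to the standard-normal quantile of
    (rank - c) / (n - 2c + 1). Tied values share their average rank.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and x[order[j + 1]] == x[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1      # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]

transformed = rank_based_int([10, 30, 20])
```

The transformation depends only on the ordering of the input, so any strictly increasing rescaling of the data yields the same output.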