API reference

Sample size calculation

cca_sample_size(X, Y[, ax, ay, rs, ...])

Suggest sample size for CCA.

pls_sample_size(X, Y[, ax, ay, rs, ...])

Suggest sample size for PLS.

pearson_sample_size([rs, criterion, ...])

Calculate required sample sizes for accurate estimation of Pearson correlation.
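
For intuition about how the required sample size scales with the true correlation, the classical Fisher z approximation can be sketched as follows. This is not necessarily the criterion that pearson_sample_size applies; the function name fisher_z_sample_size and the alpha and power defaults are assumptions made for this illustration.

```python
import math
from statistics import NormalDist


def fisher_z_sample_size(r, alpha=0.05, power=0.9):
    """Approximate n needed to detect a true Pearson correlation r (two-sided test).

    Hypothetical helper based on the Fisher z-transform, not part of the package.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must satisfy 0 < |r| < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    # atanh(r) is the Fisher z-transform; its standard error is 1 / sqrt(n - 3)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


sizes = {r: fisher_z_sample_size(r) for r in (0.1, 0.3, 0.5)}
```

Under this approximation the required sample size grows quickly as the true correlation shrinks.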

sample_size.linear_model.cca_req_corr(X, Y, ...)

Determines the minimum required true correlation to achieve power and error levels.

sample_size.linear_model.pls_req_corr(X, Y, ...)

Determines the minimum required true correlation to achieve power and error levels.

Estimators

estimators.SVDPLS([n_components, ...])

Partial Least Squares estimator based on singular value decomposition.
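
The idea behind an SVD-based PLS can be sketched with NumPy alone: the weights are the singular vectors of the between-set cross-covariance matrix. This is a textbook formulation, not the SVDPLS implementation; pls_svd and its return values are hypothetical names chosen for this example.

```python
import numpy as np


def pls_svd(X, Y, n_components=1):
    """Textbook PLS-SVD: singular vectors of the X-Y cross-covariance matrix."""
    Xc = X - X.mean(axis=0)          # center each feature
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / (len(X) - 1)     # cross-covariance, shape (px, py)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    x_weights = U[:, :n_components]  # unit-norm weight vectors for X
    y_weights = Vt[:n_components].T  # unit-norm weight vectors for Y
    return x_weights, y_weights, s[:n_components]


# Synthetic example data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Y = 0.5 * X[:, :2] + rng.normal(size=(200, 2))
x_weights, y_weights, covs = pls_svd(X, Y)
```

In this formulation each singular value equals the sample covariance between the corresponding pair of X and Y scores.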

estimators.SVDCCA([n_components, ...])

Canonical Correlation Analysis estimator based on singular value decomposition.
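
As a rough illustration of what an SVD-based CCA computes, the sketch below whitens each data set and takes the SVD of the whitened cross-covariance matrix. It is a textbook formulation under the assumption of full-rank within-set covariances, not the SVDCCA implementation; cca_svd and _inv_sqrt are hypothetical names.

```python
import numpy as np


def _inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


def cca_svd(X, Y, n_components=1):
    """Textbook CCA: SVD of the whitened cross-covariance matrix."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = len(X)
    Wx = _inv_sqrt(Xc.T @ Xc / (n - 1))  # whitening transform for X
    Wy = _inv_sqrt(Yc.T @ Yc / (n - 1))  # whitening transform for Y
    Sxy = Xc.T @ Yc / (n - 1)            # between-set covariance
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)
    x_weights = Wx @ U[:, :n_components]
    y_weights = Wy @ Vt[:n_components].T
    return x_weights, y_weights, s[:n_components]


# Synthetic example data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Y = 0.5 * X[:, :2] + rng.normal(size=(200, 2))
x_weights, y_weights, corrs = cca_svd(X, Y)
```

Here the singular values are the in-sample canonical correlations, i.e. the Pearson correlations between paired X and Y scores.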

estimators.NIPALSPLS([n_components, scale, ...])

Identical to sklearn.cross_decomposition.PLSCanonical, except that fit creates additional attributes for compatibility with SVDPLS and SVDCCA.

estimators.NIPALSCCA([n_components, scale, ...])

Identical to sklearn.cross_decomposition.CCA, except that fit creates additional attributes for compatibility with SVDPLS and SVDCCA.

Synthetic data generation

generative_model.GEMMR(model, *args, **kwargs)

Generate a joint covariance matrix for X and Y.

generative_model.generate_data(Sigma, px, n)

Generate synthetic data for a given model.
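
The signature (Sigma, px, n) suggests drawing n samples from a joint covariance matrix and splitting the columns into X and Y. A minimal sketch of that idea follows, assuming a zero-mean multivariate normal distribution and that the first px columns belong to X; both are assumptions, and draw_xy is a hypothetical stand-in rather than the library function.

```python
import numpy as np


def draw_xy(Sigma, px, n, seed=0):
    """Draw n samples from N(0, Sigma) and split them into X and Y."""
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(len(Sigma)), Sigma, size=n)
    return Z[:, :px], Z[:, px:]  # assumed layout: X columns first, then Y


# Joint covariance for px=2, py=1 with a correlation of 0.5
# between the first X feature and the single Y feature
Sigma = np.array([
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 1.0],
])
X, Y = draw_xy(Sigma, px=2, n=1000)
```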

Analysis of CCA/PLS results

sample_analysis.analyzers.analyze_dataset(...)

Analyze a given dataset with a given estimator.

sample_analysis.analyzers.analyze_resampled(...)

Analyze a given dataset and resampled versions of it with a given estimator.

sample_analysis.analyzers.analyze_subsampled(...)

Analyze subsampled versions of a dataset with a given estimator.

sample_analysis.analyzers.analyze_model(gm, ...)

sample_analysis.analyzers.analyze_model_parameters(model)

Parameter-dependent models are set up and resulting synthetic datasets are analyzed.

Analysis add-ons

The functions in sample_analysis.analyzers only fit an estimator and return association strengths, weights and loadings. Additional analyses can be specified in the form of add-on functions. The following functions are provided, and arbitrary custom ones can be used as long as they have the same function signature.

sample_analysis.addon.remove_weights_loadings(...)

Removes weights and loadings from results dataset to save storage space.

sample_analysis.addon.remove_cv_weights_loadings(...)

Removes x_weights_cv, y_weights_cv, x_loadings_cv and

sample_analysis.addon.weights_true_cossim(...)

Calculates cosine similarity between estimated and true weights.
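
For reference, the cosine similarity that the cossim add-ons are named after can be computed as below. This is a generic sketch, not the add-on itself; cosine_similarity is a hypothetical helper. Because weight vectors are only determined up to sign, taking the absolute value is often sensible when comparing them.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


same = cosine_similarity([1, 0], [2, 0])          # parallel vectors
orthogonal = cosine_similarity([1, 0], [0, 3])    # perpendicular vectors
flipped = cosine_similarity([1, 1], [-1, -1])     # same axis, opposite sign
```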

sample_analysis.addon.test_scores_true_spearman(...)

Calculates Spearman correlations between estimated and true test scores.

sample_analysis.addon.test_scores_true_pearson(...)

Calculates Pearson correlations between estimated and true test scores.

sample_analysis.addon.loadings_true_pearson(...)

Calculates Pearson correlations between estimated and true loadings (loadings with respect to possibly transformed variables, i.e. those in columns of X, Y, not Xorig, Yorig).

sample_analysis.addon.test_scores(estr, X, ...)

Calculates test scores.

sample_analysis.addon.remove_test_scores(...)

Removes x_test_scores and y_test_scores from results.

sample_analysis.addon.assoc_test(estr, X, Y, ...)

Calculates Pearson correlations between test scores.

sample_analysis.addon.weights_pc_cossim(...)

Calculates cosine similarities of principal component axes of X and Y with corresponding weights.

sample_analysis.addon.sparseCCA_penalties(...)

Store penalties of a fitted SparseCCA estimator.

sample_analysis.addon.cv(estr, X, Y, Xorg, ...)

Calculates cross-validated outcome metrics.

Some of these add-ons need helper functions to set them up:

sample_analysis.addon.mk_scorers_for_cv([...])

Create scorers to use with cv().

sample_analysis.addon.mk_test_statistics_scores(...)

Calculate scores for test subjects.

Analyses that look into relations across datasets, and therefore require outcomes from more than the current dataset, can be specified as postprocessors:

sample_analysis.postproc.power(res[, alpha])

Calculate power.
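
Statistical power is commonly estimated as the fraction of repeated analyses whose p-value falls below alpha. The sketch below shows that calculation on a plain array of p-values; how the postprocessor extracts p-values from res is not described here, and empirical_power and the alpha default are assumptions for illustration.

```python
import numpy as np


def empirical_power(p_values, alpha=0.05):
    """Fraction of repetitions in which the null hypothesis is rejected."""
    p_values = np.asarray(p_values, dtype=float)
    return float(np.mean(p_values < alpha))


# Four hypothetical repetitions, two of them significant at alpha = 0.05
power = empirical_power([0.01, 0.20, 0.03, 0.60])
```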

sample_analysis.postproc.remove_between_assocs_perm(res)

Removes between_assocs_perm from results dataset.

sample_analysis.postproc.weights_pairwise_cossim_stats(res)

Calculate cosine similarity between weights for all pairs of repetitions.

sample_analysis.postproc.scores_pairwise_spearmansim_stats(res)

Calculate Spearman correlations between scores for all pairs of repetitions.

sample_analysis.postproc.remove_weights_loadings(res)

Removes weights and loadings from result dataset.

sample_analysis.postproc.remove_test_scores(res)

Removes test scores from result dataset.

Finally, there are a number of analysis building blocks that we found useful:

sample_analysis.macros.calc_p_value(estr, X, Y)

Calculate permutation-based p-value.
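
A permutation-based p-value compares an observed association with the distribution obtained after shuffling one data set. The sketch below uses the absolute Pearson correlation of two 1-D variables as the test statistic to stay self-contained; the library function instead takes an estimator and X, Y, and its exact statistic is not documented here. The name permutation_p_value and the n_perm default are assumptions.

```python
import numpy as np


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Permutation p-value for the absolute Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(x, y)[0, 1])
    # Null distribution: shuffle y to break any association with x
    null = np.array([
        abs(np.corrcoef(x, rng.permutation(y))[0, 1]) for _ in range(n_perm)
    ])
    # Count the observed statistic itself so the p-value is never exactly zero
    return float((1 + np.sum(null >= observed)) / (1 + n_perm))


# Synthetic example with a clear association (illustrative only)
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = x + rng.normal(size=100)
p = permutation_p_value(x, y)
```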

sample_analysis.macros.analyze_subsampled_and_resampled(...)

Analyzes the given data with the given estimator.

sample_analysis.macros.pairwise_weight_cosine_similarity(ds)

Calculates statistics of the weight-similarities from pairs of synthetic datasets.

Model selection

model_selection.max_min_detector(X, Y, p_max)

Hypothesis-test based method to jointly determine number of PCA and between-set components.

model_selection.n_components_to_explain_variance(...)

Given a covariance matrix, find the number of components necessary to explain at least variance_threshold variance.
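
The calculation described above can be sketched from the eigenvalue spectrum of the covariance matrix. This is a minimal illustration assuming the threshold is a fraction between 0 and 1; n_components_for_variance is a hypothetical name and the default threshold is an assumption.

```python
import numpy as np


def n_components_for_variance(Sigma, variance_threshold=0.9):
    """Smallest number of components whose cumulative variance reaches the threshold."""
    evals = np.linalg.eigvalsh(Sigma)[::-1]          # eigenvalues, largest first
    explained = np.cumsum(evals) / np.sum(evals)     # cumulative explained fraction
    k = int(np.searchsorted(explained, variance_threshold)) + 1
    return min(k, len(evals))                        # guard against rounding at 1.0


# Component variances 5, 3, 1, 1 explain 50%, 80%, 90% and 100% cumulatively
Sigma = np.diag([5.0, 3.0, 1.0, 1.0])
k = n_components_for_variance(Sigma, variance_threshold=0.85)
```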

Plotting

plot.mean_metric_curve(metric[, rs, ...])

Plots mean curves for given rs as a function of n_per_ftr.

plot.heatmap_n_req(n_req[, clabel])

Plots a heatmap of required number of samples as a function of number of features and true correlation.

plot.polar_hist(angles[, bins, mark_mean])

Plot a polar histogram.

Data

Preprocessing

data.preprocessing.preproc_smith(fc, sm[, ...])

Data preprocessing pipeline from Smith et al. (2015).

Handling of included data files

data.loaders.set_data_home(data_home)

Set directory in which outcome data is stored.

data.load_outcomes(dsid[, model, data_home, ...])

Load previously generated outcome data.

data.generate_example_dataset(model[, px, ...])

Convenience function returning an example dataset for use with CCA or PLS.

data.print_ds_stats(ds[, prefix])

Print outcome dataset statistics.

Utility functions

util.rank_based_inverse_normal_trafo(x[, c])

Rank-based inverse normal transformation.
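
A rank-based inverse normal transformation maps each value's rank to the corresponding standard normal quantile. The sketch below uses the common form with offset c, where c = 3/8 is the Blom choice; that default, the tie handling, and the name rank_int are assumptions and may differ from the library function.

```python
import numpy as np
from statistics import NormalDist


def rank_int(x, c=3.0 / 8):
    """Map values to normal quantiles via (rank - c) / (n - 2c + 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # 1-based ranks; ties are broken by position, which is a simplification
    ranks = np.argsort(np.argsort(x)) + 1
    quantiles = (ranks - c) / (n - 2 * c + 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(q) for q in quantiles])


values = np.array([10.0, 2.0, 300.0, 5.0, 7.0])
z = rank_int(values)
```

The transformation preserves the ordering of the values while removing the influence of outliers such as 300.0 in this example.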

util.pc_spectrum_decay_constant([X, ...])

Estimate power-law decay constant.
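
One simple way to estimate a power-law decay constant of a principal component spectrum is a straight-line fit in log-log space. The sketch below works on a vector of component variances; the fitting procedure and sign convention of the library function are not documented here, so fit_powerlaw_decay is a hypothetical illustration only.

```python
import numpy as np


def fit_powerlaw_decay(variances):
    """Least-squares slope of log(variance) against log(component index)."""
    variances = np.sort(np.asarray(variances, dtype=float))[::-1]
    index = np.arange(1, len(variances) + 1)
    slope, _intercept = np.polyfit(np.log(index), np.log(variances), 1)
    return float(slope)


# Spectrum that decays exactly as index ** -1.5
spectrum = np.arange(1, 21, dtype=float) ** -1.5
decay = fit_powerlaw_decay(spectrum)
```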