API reference¶
Sample size calculation¶
cca_sample_size (X, Y[, rs, criterion, …]) |
Suggest sample size for CCA. |
pls_sample_size (X, Y[, ax, ay, rs, …]) |
Suggest sample size for PLS. |
pearson_sample_size ([rs, criterion, …]) |
Calculate required sample sizes for accurate estimation of Pearson correlation. |
sample_size.linear_model.cca_req_corr (X, Y, …) |
Determines the minimum required true correlation to achieve power and error levels. |
sample_size.linear_model.pls_req_corr (X, Y, …) |
Determines the minimum required true correlation to achieve power and error levels. |
Estimators¶
estimators.SVDPLS ([n_components, …]) |
Partial Least Squares estimator based on singular value decomposition. |
estimators.SVDCCA ([n_components, …]) |
Canonical Correlation Analysis estimator based on singular value decomposition. |
estimators.NIPALSPLS ([n_components, scale, …]) |
Identical to sklearn.cross_decomposition.PLSCanonical, except that fit creates additional attributes for compatibility with SVDPLS and SVDCCA. |
estimators.NIPALSCCA ([n_components, scale, …]) |
Identical to sklearn.cross_decomposition.CCA, except that fit creates additional attributes for compatibility with SVDPLS and SVDCCA. |
estimators.SparseCCA |
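To illustrate what "based on singular value decomposition" can mean for PLS, here is a minimal NumPy sketch. It is not the SVDPLS implementation: the function name `svd_pls_sketch` and its return values are hypothetical, and it assumes the textbook formulation in which the weight vectors are the singular vectors of the sample cross-covariance matrix.

```python
import numpy as np


def svd_pls_sketch(X, Y, n_components=1):
    """Illustrative PLS via SVD of the cross-covariance (hypothetical helper)."""
    # Center both data matrices column-wise
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample cross-covariance between the features of X and Y
    Sxy = Xc.T @ Yc / (n - 1)
    # Left/right singular vectors serve as weights,
    # singular values are the covariances between the resulting scores
    U, s, Vt = np.linalg.svd(Sxy, full_matrices=False)
    x_weights = U[:, :n_components]
    y_weights = Vt[:n_components].T
    return x_weights, y_weights, s[:n_components]


# Toy data sharing one latent factor (made up for illustration)
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = latent @ np.array([[1.0, 0.5, 0.0]]) + rng.normal(size=(500, 3))
Y = latent @ np.array([[0.8, 0.0]]) + rng.normal(size=(500, 2))
x_w, y_w, covs = svd_pls_sketch(X, Y)
```

In this formulation the first singular value equals the sample covariance between the first pair of scores, `X @ x_w` and `Y @ y_w`.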
Synthetic data generation¶
generative_model.setup_model (model[, …]) |
Generate a joint covariance matrix for X and Y. |
generative_model.generate_data (Sigma, px, n) |
Generate synthetic data for a given model. |
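The argument list `(Sigma, px, n)` suggests drawing `n` samples from a joint covariance matrix and splitting the columns into X and Y. The sketch below shows that idea with plain NumPy. It assumes a zero-mean multivariate normal distribution and that the first `px` columns belong to X; the function `draw_xy` and the example `Sigma` are hypothetical and not part of the package.

```python
import numpy as np


def draw_xy(Sigma, px, n, seed=None):
    """Draw n samples from N(0, Sigma) and split them into X and Y (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    p = Sigma.shape[0]
    data = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    # Assumption: the first px columns are X features, the rest are Y features
    return data[:, :px], data[:, px:]


# Joint covariance for 2 X features and 1 Y feature;
# the first X feature has a true correlation of 0.5 with Y
Sigma = np.array([
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 1.0],
])
X, Y = draw_xy(Sigma, px=2, n=1000, seed=0)
```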
Analysis of CCA/PLS results¶
sample_analysis.analyzers.analyze_dataset (…) |
Analyze a given dataset with a given estimator. |
sample_analysis.analyzers.analyze_resampled (…) |
Analyze a given dataset and resampled versions of it with a given estimator. |
sample_analysis.analyzers.analyze_subsampled (…) |
Analyze subsampled versions of a dataset with a given estimator. |
sample_analysis.analyzers.analyze_model (…) |
Synthetic datasets drawn from a model are analyzed with a given estimator. |
sample_analysis.analyzers.analyze_model_parameters (model) |
Parameter-dependent models are set up and resulting synthetic datasets are analyzed. |
Analysis add-ons¶
The functions in sample_analysis.analyzers
only fit an estimator and return association strengths, weights and
loadings. Additional analyses can be specified in the form of add-on functions. The following functions are provided,
and arbitrary custom ones can be used as long as they have the same function signature.
sample_analysis.addon.remove_weights_loadings (…) |
Removes weights and loadings from results dataset to save storage space. |
sample_analysis.addon.remove_cv_weights (…) |
Removes x_weights_cv and y_weights_cv from results dataset to save storage space. |
sample_analysis.addon.weights_true_cossim (…) |
Calculates cosine similarity between estimated and true weights. |
sample_analysis.addon.scores_true_spearman (…) |
Calculates Spearman correlations between estimated and true test scores. |
sample_analysis.addon.loadings_true_pearson (…) |
Calculates Pearson correlations between estimated and true test loadings. |
sample_analysis.addon.test_scores (estr, X, …) |
Calculates test scores. |
sample_analysis.addon.remove_test_scores (…) |
Removes x_test_scores and y_test_scores from results. |
sample_analysis.addon.assoc_test (estr, X, Y, …) |
Calculates Pearson correlations between test scores. |
sample_analysis.addon.weights_pc_cossim (…) |
Calculates cosine-similarities of principal component axes of X and Y with corresponding weights. |
sample_analysis.addon.sparseCCA_penalties (…) |
Store penalties of a fitted SparseCCA estimator. |
sample_analysis.addon.cv (estr, X, Y, Xorg, …) |
Calculates cross-validated outcome metrics. |
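Several of the add-ons above compare weight vectors by cosine similarity. As a reminder of what that metric measures, here is a minimal sketch; the helper `cosine_similarity` and the example vectors are hypothetical and do not reproduce the add-on signature.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Made-up true and estimated weight vectors
true_w = np.array([1.0, 0.0, 0.0])
est_w = np.array([0.9, 0.1, -0.2])
sim = cosine_similarity(est_w, true_w)
```

A value near 1 means the estimated weights point in nearly the same direction as the true ones, and a value near 0 means they are close to orthogonal.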
Some of these add-ons need helper functions to set them up:
sample_analysis.addon.mk_scorers_for_cv ([…]) |
Create scorers to use with cv(). |
sample_analysis.addon.mk_test_statistics_scores (…) |
Calculate scores for test subjects. |
Analyses that examine relations across datasets, and therefore need the outcomes of more than the current dataset, can be specified as postprocessors:
sample_analysis.postproc.power (res[, alpha]) |
Calculate power. |
sample_analysis.postproc.remove_between_assocs_perm (res) |
Removes between_assocs_perm from results dataset. |
sample_analysis.postproc.weights_pairwise_cossim_stats (res) |
Calculate cosine similarity between weights for all pairs of repetitions. |
sample_analysis.postproc.scores_pairwise_spearmansim_stats (res) |
Calculate Spearman correlations between scores for all pairs of repetitions. |
sample_analysis.postproc.remove_weights_loadings (res) |
Removes weights and loadings from result dataset. |
sample_analysis.postproc.remove_test_scores (res) |
Removes test scores from result dataset. |
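Postprocessors work on outcomes collected across many datasets. One common way to estimate power from such outcomes is the fraction of repetitions whose p-value falls below `alpha`. The sketch below assumes that definition; the function `power_from_p_values` and the list of p-values are hypothetical and do not use the structure of the `res` dataset.

```python
import numpy as np


def power_from_p_values(p_values, alpha=0.05):
    """Fraction of repetitions with p < alpha (hypothetical helper)."""
    p_values = np.asarray(p_values, dtype=float)
    return float(np.mean(p_values < alpha))


# Made-up p-values from five repetitions; three of them are below 0.05
p_vals = [0.001, 0.03, 0.20, 0.04, 0.60]
est_power = power_from_p_values(p_vals)
```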
Finally, there are a number of analysis building blocks that we found useful:
sample_analysis.macros.calc_p_value (estr, X, Y) |
Calculate permutation-based p-value. |
sample_analysis.macros.analyze_subsampled_and_resampled (…) |
Analyzes the given data with the given estimator. |
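A permutation-based p-value can be sketched as follows. This is an illustration of the general technique and not of `calc_p_value` itself: it uses the absolute Pearson correlation between two one-dimensional variables as a stand-in for the association strength of a fitted estimator, and the function name and parameters (`n_perm`, `seed`) are assumptions.

```python
import numpy as np


def permutation_p_value(x, y, n_perm=999, seed=None):
    """Permutation p-value for |Pearson r| between x and y (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(np.corrcoef(x, y)[0, 1])
    count = 0
    for _ in range(n_perm):
        # Shuffling y breaks any association with x
        y_perm = rng.permutation(y)
        if abs(np.corrcoef(x, y_perm)[0, 1]) >= observed:
            count += 1
    # Count the observed statistic as one permutation so that p is never 0
    return (count + 1) / (n_perm + 1)


# Toy data with a strong linear association (made up for illustration)
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(size=100)
p = permutation_p_value(x, y, n_perm=199, seed=2)
```

For an estimator such as CCA or PLS, the same scheme would permute the rows of one data matrix and refit the estimator on each permuted dataset.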
Model selection¶
model_selection.max_min_detector (X, Y, p_max) |
Hypothesis-test based method to jointly determine number of PCA and between-set components. |
model_selection.n_components_to_explain_variance (…) |
Given a covariance matrix, find the number of components necessary to explain at least variance_threshold variance. |
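Choosing the number of components by explained variance can be sketched with an eigendecomposition of the covariance matrix. The function below is a hypothetical stand-in and not the package's implementation; it assumes that "variance explained" means the cumulative share of the eigenvalue sum, and the default threshold of 0.9 is made up.

```python
import numpy as np


def n_components_for_variance(Sigma, variance_threshold=0.9):
    """Smallest k whose top-k eigenvalues reach the threshold share (hypothetical helper)."""
    # eigvalsh returns eigenvalues in ascending order; reverse to descending
    eigvals = np.linalg.eigvalsh(Sigma)[::-1]
    explained = np.cumsum(eigvals) / eigvals.sum()
    # Index of the first cumulative share that is >= the threshold
    k = int(np.searchsorted(explained, variance_threshold)) + 1
    # Guard against rounding when the threshold is (close to) 1
    return min(k, len(eigvals))


# Diagonal covariance with variances 5, 3, 1, 1 (made up for illustration)
Sigma = np.diag([5.0, 3.0, 1.0, 1.0])
k = n_components_for_variance(Sigma, variance_threshold=0.75)
```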
Plotting¶
plot.mean_metric_curve (metric[, rs, …]) |
Plots mean curves for given rs as a function of n_per_ftr. |
plot.heatmap_n_req (n_req[, clabel]) |
Plots a heatmap of required number of samples as a function of number of features and true correlation. |
plot.polar_hist (angles[, bins, mark_mean]) |
Plot a polar histogram. |
Data¶
Preprocessing¶
data.preprocessing.preproc_smith (fc, sm[, …]) |
Data preprocessing pipeline from Smith et al. |
Handling of included data files¶
data.loaders.set_data_home (data_home) |
Set directory in which outcome data is stored. |
data.load_outcomes (model[, estr, tag, …]) |
Load previously generated outcome data. |
data.generate_example_dataset (model[, px, …]) |
Convenience function returning an example dataset for use with CCA or PLS. |
data.print_ds_stats (ds) |
Print outcome dataset statistics. |
Utility functions¶
util.rank_based_inverse_normal_trafo (x[, c]) |
Rank-based inverse normal transformation. |
util.pc_spectrum_decay_constant ([X, …]) |
Estimate power-law decay constant. |
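A rank-based inverse normal transformation replaces each value by the normal quantile of its (offset) rank. The sketch below uses the common formula `(rank - c) / (n - 2c + 1)`. It is an illustration, not the package's code: the function name, the default `c = 3/8` (the Blom offset), and the simple tie handling are assumptions.

```python
import numpy as np
from statistics import NormalDist


def rank_based_int(x, c=3.0 / 8):
    """Rank-based inverse normal transformation of a 1-d array (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Ranks 1..n; ties are broken by order of appearance (a simplification)
    ranks = np.empty(n)
    ranks[np.argsort(x, kind="stable")] = np.arange(1, n + 1)
    # Map ranks to quantiles strictly between 0 and 1
    quantiles = (ranks - c) / (n - 2 * c + 1)
    # Apply the inverse standard normal CDF to each quantile
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(q) for q in quantiles])


z = rank_based_int([3.0, 1.0, 2.0])
```

The transformed values keep the ordering of the input, and for data without ties they are symmetric around zero.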