gemmr.sample_analysis.analyzers.analyze_subsampled

gemmr.sample_analysis.analyzers.analyze_subsampled(estr, X, Y, Xorig=None, Yorig=None, x_align_ref=None, y_align_ref=None, addons=(), ns=(), n_rep=10, n_perm=100, n_test=0, postprocessors=(), n_jobs=1, show_progress=True, random_state=None, **kwargs)

Analyze subsampled versions of a dataset with a given estimator.

Parameters:	estr (sklearn-style estimator) – estimator for performing CCA or PLS. Must have a method `fit` and (after fitting) the attributes `assocs_`, `x_rotations_`, `y_rotations_`, `x_scores_`, `y_scores_`
	X (np.ndarray (n_samples, n_features)) – dataset X
	Y (np.ndarray (n_samples, n_features)) – dataset Y
	Xorig (`None` or np.ndarray (n_samples, n_orig_features)) – if `None`, set to `X`. Allows providing an alternative set of X features for calculating loadings. This implicitly assumes that the rows in `X` and `Xorig` correspond to the same samples (subjects).
	Yorig (`None` or np.ndarray (n_samples, n_orig_features)) – if `None`, set to `Y`. Allows providing an alternative set of Y features for calculating loadings. This implicitly assumes that the rows in `Y` and `Yorig` correspond to the same samples (subjects).
	x_align_ref ((n_features,)) – after fitting, the sign of the X weights is chosen such that the cosine similarity between the fitted X weights and `x_align_ref` is positive
	y_align_ref ((n_features,)) – after fitting, the sign of the Y weights is chosen such that the cosine similarity between the fitted Y weights and `y_align_ref` is positive
	addons (list-like of add-on functions) – after fitting the estimator and saving association strengths, weights and loadings in `results`, additional analyses can be performed with these functions. They are called in the given order, must have the signature `addana_fun(estr, X, Y, Xorig, Yorig, x_align_ref, y_align_ref, results, **kwargs)`, and are expected to save their respective outcome features in `results`. Various such functions are provided in module `sample_analysis_addons`
	ns (list-like of int) – subsamples of these sizes are used
	n_rep (int) – number of times a subsample of a given size is drawn
	n_perm (int) – each subsample is permuted `n_perm` times to generate a null distribution of outcome quantities
	n_test (int) – number of subjects to use as test set. `max(ns) + n_test` must be <= `n_samples`
	postprocessors (list-like of functions) – functions are called after the final dataset has been concatenated and take that xr.Dataset as their only argument
	n_jobs (int or None) – number of parallel jobs (see `joblib.Parallel`)
	show_progress (bool) – whether to show a progress bar
	random_state (`None`, int or random number generator instance) – used to generate random numbers
	kwargs (dict) – forwarded to additional analysis functions
Returns:	results – dataset containing data variables for the outcome features generated by the analyses
Return type:	xr.Dataset
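
Three of the rules stated in the parameter descriptions can be illustrated with a small, library-independent sketch: the size constraint `max(ns) + n_test <= n_samples`, the sign convention tied to `x_align_ref` / `y_align_ref`, and the call signature expected of an add-on function. This is not gemmr's implementation. The helper names `check_subsample_sizes`, `align_sign` and `addon_count_samples` are hypothetical, plain Python lists stand in for NumPy arrays, and a plain dict stands in for `results`.

```python
def check_subsample_sizes(ns, n_test, n_samples):
    """Raise ValueError if the largest subsample plus the test set does not fit.

    Mirrors the documented constraint: max(ns) + n_test <= n_samples.
    """
    if max(ns) + n_test > n_samples:
        raise ValueError(
            f"max(ns) + n_test = {max(ns) + n_test} exceeds n_samples = {n_samples}"
        )


def align_sign(weights, align_ref):
    """Return `weights`, sign-flipped if they point away from `align_ref`.

    The sign of the cosine similarity equals the sign of the dot product,
    so checking the dot product is sufficient. Weights orthogonal to the
    reference (dot product 0) are returned unchanged.
    """
    dot = sum(w * r for w, r in zip(weights, align_ref))
    return [-w for w in weights] if dot < 0 else list(weights)


def addon_count_samples(estr, X, Y, Xorig, Yorig, x_align_ref, y_align_ref,
                        results, **kwargs):
    """Hypothetical add-on with the documented signature.

    It stores its outcome in `results` (here a dict used as a stand-in)
    instead of returning it.
    """
    results["n_samples_used"] = len(X)


# 100 + 20 <= 120, so this passes silently.
check_subsample_sizes(ns=[50, 100], n_test=20, n_samples=120)

# The dot product with the reference is negative, so the weights are flipped.
aligned = align_sign([1.0, -2.0], [-1.0, 1.0])

# The add-on writes into the shared `results` container.
results = {}
addon_count_samples(None, [[0.1], [0.2]], [[1.0], [2.0]], None, None,
                    None, None, results)
```

Because add-ons write into `results` and are called in the given order, a later add-on can read what an earlier one stored.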