gemmr.sample_analysis.analyzers.analyze_subsampled
- gemmr.sample_analysis.analyzers.analyze_subsampled(estr, X, Y, Xorig=None, Yorig=None, x_align_ref=None, y_align_ref=None, addons=(), ns=(), n_rep=10, n_perm=100, n_test=0, postprocessors=(), n_jobs=1, show_progress=True, random_state=None, fit_params=None, overlapping_subjects=True, **kwargs)
Analyze subsampled versions of a dataset with a given estimator.
- Parameters:
estr (sklearn-style estimator) – estimator for performing CCA or PLS. Must have a method fit and, after fitting, the attributes assocs_, x_rotations_, y_rotations_, x_scores_ and y_scores_
X (np.ndarray (n_samples, n_features)) – dataset X
Y (np.ndarray (n_samples, n_features)) – dataset Y
Xorig (None or np.ndarray (n_samples, n_orig_features)) – can be None. Allows providing an alternative set of X features for calculating loadings. The implicit assumption is that the rows in X and Xorig correspond to the same samples (subjects).
Yorig (None or np.ndarray (n_samples, n_orig_features)) – can be None. Allows providing an alternative set of Y features for calculating loadings. The implicit assumption is that the rows in Y and Yorig correspond to the same samples (subjects).
x_align_ref ((n_features,)) – after fitting, the sign of the X weights is chosen such that the cosine similarity between the fitted X weights and x_align_ref is positive
y_align_ref ((n_features,)) – after fitting, the sign of the Y weights is chosen such that the cosine similarity between the fitted Y weights and y_align_ref is positive
addons (list-like of add-on functions) – after fitting the estimator and saving association strengths, weights and loadings in results, additional analyses can be performed with these functions. They are called in the given order, must have the signature addana_fun(estr, X, Y, Xorig, Yorig, x_align_ref, y_align_ref, results, **kwargs), and are expected to save their respective outcome features in results. Various such functions are provided in module sample_analysis_addons
ns (list-like of int) – subsamples of these sizes are used
n_rep (int) – number of times a subsample of a given size is drawn
n_perm (int) – each subsample is permuted n_perm times to generate a null distribution of outcome quantities
n_test (int or 'auto') – number of subjects to use as test set. max(ns) + n_test must be <= n_samples. If n_test == 'auto', then n_test = n_samples - max(ns) is used.
postprocessors (list-like of functions) – functions that are called after the final dataset has been concatenated; each takes that xr.Dataset as its only argument
n_jobs (int or None) – number of parallel jobs (see joblib.Parallel)
show_progress (bool) – whether to show a progress bar
random_state (None, int or random number generator instance) – used to generate random numbers
fit_params (dict) – keyword arguments for estr.fit
overlapping_subjects (bool) – if True, subjects may overlap between different repetitions. If False, this implies that max(ns) must be smaller than (len(X) - n_test) / n_rep.
kwargs (dict) – forwarded to additional analysis functions
- Returns:
results – dataset containing data variables for the outcome features generated by the analyses
- Return type:
xr.Dataset
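The estr parameter only needs a fit method plus the attributes assocs_, x_rotations_, y_rotations_, x_scores_ and y_scores_. The sketch below shows a minimal PLS-style estimator that exposes that interface. It computes the SVD of the cross-covariance matrix. The class name MinimalSVDPLS and its n_components argument are hypothetical and are not gemmr's own estimator. The assumption that assocs_ holds score covariances is also a choice made for this illustration only.

```python
import numpy as np


class MinimalSVDPLS:
    """Hypothetical PLS-style estimator exposing the attributes listed for
    estr (assocs_, x_rotations_, y_rotations_, x_scores_, y_scores_).
    Illustration only, not gemmr's implementation."""

    def __init__(self, n_components=1):
        self.n_components = n_components

    def fit(self, X, Y):
        # center both datasets column-wise
        Xc = X - X.mean(axis=0)
        Yc = Y - Y.mean(axis=0)
        # SVD of the cross-covariance matrix gives paired X/Y weight vectors
        cross_cov = Xc.T @ Yc / (len(X) - 1)
        U, s, Vt = np.linalg.svd(cross_cov, full_matrices=False)
        k = self.n_components
        self.x_rotations_ = U[:, :k]
        self.y_rotations_ = Vt[:k].T
        self.x_scores_ = Xc @ self.x_rotations_
        self.y_scores_ = Yc @ self.y_rotations_
        # each singular value equals the covariance of the paired scores
        self.assocs_ = s[:k]
        return self
```

Returning self from fit follows the usual scikit-learn convention, so calls like MinimalSVDPLS(2).fit(X, Y) can be chained.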
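The x_align_ref and y_align_ref parameters fix the arbitrary sign of the fitted weights. Weights are flipped when they point away from a reference vector. A minimal sketch of that rule for a single weight vector is shown below. The helper name align_sign is hypothetical, and the sketch does not claim to reproduce how gemmr applies the rule internally.

```python
import numpy as np


def align_sign(weights, align_ref):
    """Hypothetical helper: return weights (n_features,) with their sign
    chosen so that the cosine similarity with align_ref is not negative."""
    cos_sim = weights @ align_ref / (np.linalg.norm(weights) * np.linalg.norm(align_ref))
    # flip the whole vector if it points away from the reference
    return -weights if cos_sim < 0 else weights
```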
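Functions passed via addons must accept the documented signature addana_fun(estr, X, Y, Xorig, Yorig, x_align_ref, y_align_ref, results, **kwargs) and store their outcome in results. The sketch below is a hypothetical add-on that records the correlation between the first X and Y scores. It assumes that results supports dict-style item assignment. The function name and the key 'first_score_corr' are made up for this example and are not part of sample_analysis_addons.

```python
import numpy as np


def addon_first_score_corr(estr, X, Y, Xorig, Yorig, x_align_ref, y_align_ref,
                           results, **kwargs):
    """Hypothetical add-on: store the Pearson correlation between the first
    X and Y scores of the fitted estimator (assumes dict-like results)."""
    x0 = estr.x_scores_[:, 0]
    y0 = estr.y_scores_[:, 0]
    results['first_score_corr'] = np.corrcoef(x0, y0)[0, 1]
```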
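The constraints on ns, n_rep, n_test and overlapping_subjects can be illustrated with an index-drawing sketch. The helper draw_subsample_indices is hypothetical and is not gemmr's implementation. The following choices are all assumptions made for illustration:

- the held-out test subjects are drawn first;
- subsamples of different sizes within one repetition are nested;
- non-overlapping repetitions use disjoint blocks of max(ns) subjects.

The sketch enforces max(ns) * n_rep <= n_samples - n_test. That is the weakest condition under which disjoint blocks fit.

```python
import numpy as np


def draw_subsample_indices(n_samples, ns, n_rep, n_test=0,
                           overlapping_subjects=True, random_state=None):
    """Hypothetical sketch of drawing test and subsample indices.
    Returns (test_idx, {(n, rep): subsample_idx})."""
    rng = np.random.default_rng(random_state)
    max_n = max(ns)
    if n_test == 'auto':
        n_test = n_samples - max_n
    if max_n + n_test > n_samples:
        raise ValueError("max(ns) + n_test must be <= n_samples")
    perm = rng.permutation(n_samples)
    test_idx = perm[:n_test]
    pool = perm[n_test:]  # subjects available for subsampling
    if not overlapping_subjects and max_n * n_rep > len(pool):
        raise ValueError("need max(ns) * n_rep <= n_samples - n_test")
    subsamples = {}
    for rep in range(n_rep):
        if overlapping_subjects:
            # repetitions may share subjects
            block = rng.choice(pool, size=max_n, replace=False)
        else:
            # disjoint block of the (already shuffled) pool per repetition
            block = pool[rep * max_n:(rep + 1) * max_n]
        for n in ns:
            # assumption: smaller subsamples are nested in larger ones
            subsamples[(n, rep)] = block[:n]
    return test_idx, subsamples
```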