gemmr.sample_analysis.analyzers.analyze_subsampled

gemmr.sample_analysis.analyzers.analyze_subsampled(estr, X, Y, Xorig=None, Yorig=None, x_align_ref=None, y_align_ref=None, addons=(), ns=(), n_rep=10, n_perm=100, n_test=0, postprocessors=(), n_jobs=1, show_progress=True, random_state=None, fit_params=None, overlapping_subjects=True, **kwargs)

Analyze subsampled versions of a dataset with a given estimator.

Parameters:
  • estr (sklearn-style estimator) – estimator for performing CCA or PLS. Must have a method fit and, after fitting, the attributes assocs_, x_rotations_, y_rotations_, x_scores_ and y_scores_

  • X (np.ndarray (n_samples, n_features)) – dataset X

  • Y (np.ndarray (n_samples, n_features)) – dataset Y

  • Xorig (None or np.ndarray (n_samples, n_orig_features)) – optional alternative set of X features used for calculating loadings. The rows of Xorig are implicitly assumed to correspond to the same samples (subjects) as the rows of X.

  • Yorig (None or np.ndarray (n_samples, n_orig_features)) – optional alternative set of Y features used for calculating loadings. The rows of Yorig are implicitly assumed to correspond to the same samples (subjects) as the rows of Y.

  • x_align_ref (None or np.ndarray (n_features,)) – after fitting, the sign of the X weights is chosen such that the cosine similarity between the fitted X weights and x_align_ref is positive

  • y_align_ref (None or np.ndarray (n_features,)) – after fitting, the sign of the Y weights is chosen such that the cosine similarity between the fitted Y weights and y_align_ref is positive

  • addons (list-like of add-on functions) –

    After the estimator has been fitted and the association strengths, weights and loadings have been saved in results, additional analyses can be performed with these functions. They are called in the given order and must have the signature

    addana_fun(estr, X, Y, Xorig, Yorig, x_align_ref, y_align_ref,
        results, **kwargs)
    

    and are expected to save their respective outcome features in results. Various such functions are provided in the module sample_analysis_addons.

  • ns (list-like of int) – subsamples of these sizes are used

  • n_rep (int) – number of times a subsample of a given size is drawn

  • n_perm (int) – each subsample is permuted n_perm times to generate a null distribution of the outcome quantities

  • n_test (int or 'auto') – number of subjects to use as a test set. max(ns) + n_test must be <= n_samples. If n_test == 'auto', n_test = n_samples - max(ns) is used.

  • postprocessors (list-like of functions) – functions that are called after the final dataset has been concatenated; each takes that xr.Dataset as its only argument

  • n_jobs (int or None) – number of parallel jobs (see joblib.Parallel)

  • show_progress (bool) – whether to show a progress bar

  • random_state (None, int or random number generator instance) – used to generate random numbers

  • fit_params (dict) – keyword arguments passed to estr.fit

  • overlapping_subjects (bool) – if True, the same subject may appear in different repetitions. If False, max(ns) must be smaller than (len(X) - n_test) / n_rep.

  • kwargs (dict) – forwarded to additional analysis functions

Returns:

results – contains data variables for the outcome features generated by the analyses

Return type:

xr.Dataset
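The estr parameter only needs a fit method plus the fitted attributes assocs_, x_rotations_, y_rotations_, x_scores_ and y_scores_. The class below is a minimal sketch of an object that satisfies this interface. It uses whitened cross-covariance plus SVD, which is one standard way to compute CCA. The class name MinimalCCA and its n_components argument are hypothetical and not part of gemmr. The sketch also assumes n_samples > n_features, so that the covariance matrices are invertible.

```python
import numpy as np


class MinimalCCA:
    """Hypothetical minimal CCA estimator exposing the attributes listed for
    estr (assocs_, x_rotations_, y_rotations_, x_scores_, y_scores_).
    Illustration only; not part of gemmr."""

    def __init__(self, n_components=1):
        self.n_components = n_components

    @staticmethod
    def _inv_sqrt(C):
        # Inverse square root of a symmetric positive-definite matrix
        evals, evecs = np.linalg.eigh(C)
        return evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T

    def fit(self, X, Y):
        Xc = X - X.mean(axis=0)
        Yc = Y - Y.mean(axis=0)
        n = len(X)
        Cxx = Xc.T @ Xc / (n - 1)
        Cyy = Yc.T @ Yc / (n - 1)
        Cxy = Xc.T @ Yc / (n - 1)
        Wx, Wy = self._inv_sqrt(Cxx), self._inv_sqrt(Cyy)
        # Singular values of the whitened cross-covariance are the canonical correlations
        U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
        k = self.n_components
        self.assocs_ = s[:k]
        self.x_rotations_ = Wx @ U[:, :k]
        self.y_rotations_ = Wy @ Vt.T[:, :k]
        self.x_scores_ = Xc @ self.x_rotations_
        self.y_scores_ = Yc @ self.y_rotations_
        return self


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Y = X[:, :3] + rng.standard_normal((200, 3))
estr = MinimalCCA(n_components=2).fit(X, Y)
```

By construction, the correlation between the first pair of score vectors equals assocs_[0].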
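Xorig and Yorig only make sense if their rows are paired with the rows of X and Y, because loadings relate features to the per-sample scores. This sketch assumes a common definition of loadings: the Pearson correlation between each original feature and each score vector. The documentation above does not say whether gemmr uses exactly this definition, and the helper correlation_loadings is hypothetical.

```python
import numpy as np


def correlation_loadings(Xorig, scores):
    """Hypothetical helper: Pearson correlation of every column of Xorig with
    every column of scores; returns shape (n_orig_features, n_components)."""
    if len(Xorig) != len(scores):
        raise ValueError("Xorig and scores must have the same number of rows (samples)")
    Xz = (Xorig - Xorig.mean(axis=0)) / Xorig.std(axis=0)
    Sz = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return Xz.T @ Sz / len(Xorig)


rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
# Xorig: the original X features plus one extra, noisy copy of the first feature
Xorig = np.column_stack([X, X[:, 0] + 0.1 * rng.standard_normal(100)])
scores = X @ np.array([[1.0], [0.5], [0.0], [0.0]])
loadings = correlation_loadings(Xorig, scores)
```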
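CCA and PLS weights are only determined up to sign, which is why x_align_ref and y_align_ref exist. The sketch below shows one way to do this kind of sign alignment: flip the weight vector when its cosine similarity with the reference is negative. The helper align_sign is hypothetical and is not taken from gemmr.

```python
import numpy as np


def align_sign(weights, ref):
    """Hypothetical helper: return weights (n_features,) with its sign chosen
    so that the cosine similarity with ref is non-negative."""
    cos_sim = weights @ ref / (np.linalg.norm(weights) * np.linalg.norm(ref))
    return -weights if cos_sim < 0 else weights


ref = np.array([1.0, 0.0, 0.0])
w = np.array([-0.9, 0.3, 0.1])
w_aligned = align_sign(w, ref)  # flipped, because w points away from ref
```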
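An add-on passed via addons must accept the documented signature and store its outcome in results. The function below is a hypothetical add-on: it records the cosine similarity between the first fitted X weight vector and x_align_ref. The sketch assumes that results supports item assignment like a dict, since the exact container type is not specified above. The SimpleNamespace object only stands in for a fitted estimator.

```python
import numpy as np
from types import SimpleNamespace


def addon_x_ref_similarity(estr, X, Y, Xorig, Yorig, x_align_ref, y_align_ref,
                           results, **kwargs):
    """Hypothetical add-on: save the cosine similarity between the first
    fitted X weight vector and x_align_ref in results."""
    if x_align_ref is None:
        return
    w = estr.x_rotations_[:, 0]
    results['x_ref_cos_sim'] = float(
        w @ x_align_ref / (np.linalg.norm(w) * np.linalg.norm(x_align_ref)))


# Stand-ins for a fitted estimator and the results container (assumptions)
estr = SimpleNamespace(x_rotations_=np.array([[0.6], [0.8]]))
results = {}
addon_x_ref_similarity(estr, None, None, None, None,
                       np.array([1.0, 0.0]), None, results)
```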
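The parameters ns, n_rep and n_perm describe nested loops: subsample sizes, repetitions per size, and permutations per subsample. The sketch below illustrates that scheme with a toy statistic in place of a fitted estimator. It assumes that "permuting" means shuffling the rows of Y relative to X, which destroys the X-Y pairing. The function subsample_with_permutations is a hypothetical illustration, not gemmr's implementation.

```python
import numpy as np


def subsample_with_permutations(X, Y, ns, n_rep, n_perm, stat, seed=None):
    """Hypothetical sketch: for each size n in ns, draw n_rep subsamples;
    compute stat on each subsample and on n_perm copies in which the rows of
    Y are shuffled (assumed permutation scheme)."""
    rng = np.random.default_rng(seed)
    observed = np.empty((len(ns), n_rep))
    null = np.empty((len(ns), n_rep, n_perm))
    for i, n in enumerate(ns):
        for r in range(n_rep):
            idx = rng.choice(len(X), size=n, replace=False)
            Xs, Ys = X[idx], Y[idx]
            observed[i, r] = stat(Xs, Ys)
            for p in range(n_perm):
                null[i, r, p] = stat(Xs, Ys[rng.permutation(n)])
    return observed, null


def abs_corr_first_cols(Xs, Ys):
    # Toy association measure: |Pearson r| between the first columns
    return abs(np.corrcoef(Xs[:, 0], Ys[:, 0])[0, 1])


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
Y = X + rng.standard_normal((500, 3))
observed, null = subsample_with_permutations(
    X, Y, ns=[50, 200], n_rep=10, n_perm=100, stat=abs_corr_first_cols, seed=0)
# Fraction of permutations whose statistic is at least as large as the observed one
p_values = (null >= observed[..., None]).mean(axis=-1)
```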
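The n_test rule is simple arithmetic. With 'auto', every subject not needed for the largest subsample goes into the test set. Otherwise, the largest subsample and the test set must fit into n_samples together. The helper resolve_n_test below is a hypothetical restatement of that documented rule.

```python
def resolve_n_test(n_test, ns, n_samples):
    """Hypothetical helper restating the documented n_test rule."""
    if n_test == 'auto':
        n_test = n_samples - max(ns)
    if max(ns) + n_test > n_samples:
        raise ValueError(
            f"max(ns) + n_test = {max(ns) + n_test} exceeds n_samples = {n_samples}")
    return n_test


n_test = resolve_n_test('auto', ns=[50, 100, 200], n_samples=1000)  # 1000 - 200 = 800
```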
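With overlapping_subjects=False, no subject may appear in more than one repetition. One way to guarantee this is to set the test subjects aside first, then split a single shuffled pool of the remaining subjects into n_rep disjoint blocks. This explains why max(ns) is bounded by roughly (len(X) - n_test) / n_rep. The function disjoint_subsamples is a hypothetical sketch and not gemmr's code.

```python
import numpy as np


def disjoint_subsamples(n_samples, n, n_rep, n_test=0, seed=None):
    """Hypothetical sketch: set aside n_test subjects as a test set, then draw
    n_rep subsamples of size n that share no subjects."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    test_idx, pool = perm[:n_test], perm[n_test:]
    if n * n_rep > len(pool):
        raise ValueError("n * n_rep exceeds the number of non-test subjects")
    # Consecutive, non-overlapping blocks of the shuffled pool
    subsamples = pool[:n * n_rep].reshape(n_rep, n)
    return subsamples, test_idx


subsamples, test_idx = disjoint_subsamples(1000, n=90, n_rep=10, n_test=100, seed=0)
```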