gemmr.sample_analysis.macros.analyze_subsampled_and_resampled
gemmr.sample_analysis.macros.analyze_subsampled_and_resampled(estr, X, Y, permutations=1000, n_min_subsample=None, frac_max_subsample=0.8, n_subsample_ns=5, n_rep_subsample=100, n_perm_subsample=1000, n_test_subsample=0, n_jobs=1, random_state=0)
Analyzes the given data with the given estimator.
Specifically, it:
- calculates the permutation-based p-value
- analyzes the whole sample and its permutations
- analyzes subsamples of the data
Parameters: - estr (sklearn-style estimator) – estimator used to analyze the data; it needs to be compatible with the analyzers in gemmr.sample_analysis.analyzers
- X (np.ndarray (n_samples, n_X_features)) – dataset X
- Y (np.ndarray (n_samples, n_Y_features)) – dataset Y
- permutations (int or iterable) – used for calculating the p-value and for the whole-sample analysis. If int, the number of permutations to use; if iterable, each element gives one set of permutation indices
- n_min_subsample (None or int) – minimum number of samples to which the data are subsampled. If None, X.shape[1]+Y.shape[1]+2 is used
- frac_max_subsample (float between 0 and 1) – the maximum number of samples to which the data are subsampled is frac_max_subsample * len(X)
- n_subsample_ns (int) – the list of sample sizes to which the data are subsampled is a np.logspace with this many entries
- n_rep_subsample (int) – number of times a subsampled dataset of a given size is generated
- n_perm_subsample (int) – number of permutations for each subsampled dataset
- n_test_subsample (int) – number of subjects to use as test set in subsampled datasets
- n_jobs (int or None) – number of parallel jobs (see joblib.Parallel)
- random_state (None, int or rng-instance) – random seed
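The way n_min_subsample, frac_max_subsample and n_subsample_ns combine into a list of subsample sizes can be sketched as follows. This is a minimal illustration and not gemmr's code: the helper name subsample_sizes is hypothetical, and rounding to the nearest integer and including both endpoints are assumptions. The log spacing is computed by hand so that the sketch runs without NumPy.

```python
import math

def subsample_sizes(n_samples, n_x_features, n_y_features,
                    n_min_subsample=None, frac_max_subsample=0.8,
                    n_subsample_ns=5):
    """Hypothetical helper: log-spaced sample sizes between the documented bounds."""
    # Documented default for the lower bound: X.shape[1] + Y.shape[1] + 2
    if n_min_subsample is None:
        n_min_subsample = n_x_features + n_y_features + 2
    # Documented upper bound: frac_max_subsample * len(X)
    n_max_subsample = frac_max_subsample * n_samples
    if n_subsample_ns == 1:
        return [int(round(n_min_subsample))]
    lo = math.log10(n_min_subsample)
    hi = math.log10(n_max_subsample)
    step = (hi - lo) / (n_subsample_ns - 1)
    # Rounding to the nearest integer is an assumption of this sketch
    return [int(round(10 ** (lo + i * step))) for i in range(n_subsample_ns)]

sizes = subsample_sizes(n_samples=1000, n_x_features=5, n_y_features=5)
print(sizes)
```

With 1000 samples and 5 features in each of X and Y, the sizes run from 12 up to 800 in five logarithmically spaced steps.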
Returns: results – with items:
- p_value : float (permutation-based p-value)
- whole_sample : xr.Dataset (output of analyze_resampled)
- subsampled : xr.Dataset (output of analyze_subsampled)
Return type: dict
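To illustrate what a permutation-based p-value such as the p_value item measures, here is a minimal standalone sketch. It is not gemmr's implementation: the statistic (absolute Pearson correlation between two 1-D variables), the helper names, and the (1 + count) / (1 + n_perm) convention are all assumptions chosen for illustration.

```python
import math
import random

def abs_pearson(x, y):
    """Absolute Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Hypothetical helper: share of permuted statistics at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs_pearson(x, y)
    n_extreme = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the pairing between x and y
        if abs_pearson(x, y_perm) >= observed:
            n_extreme += 1
    # Counting the observed statistic itself keeps the p-value above zero
    return (1 + n_extreme) / (1 + n_perm)

noise = random.Random(1)
x = [float(i) for i in range(30)]
y = [2 * v + noise.gauss(0, 1) for v in x]
print(permutation_p_value(x, y, n_perm=200))
```

Because shuffling y destroys its association with x, a strongly associated pair yields a small p-value, bounded below by 1 / (1 + n_perm).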