gemmr.sample_analysis.analyzers.analyze_model_parameters

gemmr.sample_analysis.analyzers.analyze_model_parameters(model, estr=None, n_rep=100, n_bs=0, n_perm=0, n_per_ftrs=(2, 10, 50), pxs=(4, 8, 16, 32, 64), pys='px', rs=(0.1, 0.3, 0.5, 0.7, 0.9), n_between_modes=1, n_Sigmas=1, axPlusay_range=(-2, 0), rotate_XY=False, qx=0.9, qy=0.9, expl_var_ratio_thr=0.5, max_n_sigma_trials=10000, addons=(), n_test=0, mk_test_statistics=None, postprocessors=(), verbose=False, random_state=0, show_progress=True, **kwargs)

Parameter-dependent models are set up and resulting synthetic datasets are analyzed.

For each model, differing by the number of features, ground-truth correlations, within-set principal component spectra and direction of weight vectors relative to the principal component axes, synthetic datasets are generated and analyzed.
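The way the arguments span this set of models can be illustrated with a minimal sketch. It only mirrors the parameter semantics documented on this page and is not gemmr's internal code: the helper names `resolve_py` and `enumerate_models`, the `Sigma_id` key, and the loop order are assumptions made for illustration, and `n_per_ftrs='auto'` is not handled.

```python
from itertools import product


def resolve_py(pys, px):
    """Number of Y features for a given px, following the documented `pys` rules.

    Hypothetical helper, not part of gemmr.
    """
    if isinstance(pys, str):
        if pys != 'px':
            raise ValueError("the only string accepted for pys is 'px'")
        return px                 # 'px': as many Y features as X features
    if callable(pys):
        return int(pys(px))       # function: function(px) Y features
    return int(pys)               # int: fixed number of Y features


def enumerate_models(pxs=(4, 8, 16, 32, 64), pys='px',
                     rs=(0.1, 0.3, 0.5, 0.7, 0.9),
                     n_Sigmas=1, n_per_ftrs=(2, 10, 50)):
    """Yield one dict per (px, r, covariance matrix) combination.

    Sample sizes follow the documented rule n = n_per_ftr * (px + py).
    """
    for px, r, sigma_id in product(pxs, rs, range(n_Sigmas)):
        py = resolve_py(pys, px)
        yield {
            'px': px,
            'py': py,
            'r': r,
            'Sigma_id': sigma_id,
            'n_samples': [n * (px + py) for n in n_per_ftrs],
        }


models = list(enumerate_models())
print(len(models), models[0])
```

With the default arguments this yields one entry per combination of `pxs`, `rs` and covariance matrix; according to the parameter descriptions, `n_rep` datasets are then drawn and analyzed for each entry.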

Parameters:

model ('cca' or 'pls') – whether synthetic data are generated and analyzed for CCA or PLS

estr (`None` or sklearn-style estimator) – if `None`, either `estimators.SVDCCA` or `estimators.SVDPLS` is used, depending on the value of `model`. Otherwise, the given estimator should correspond to `model` and must have a method `fit` and (after fitting) attributes `assocs_`, `x_rotations_`, `y_rotations_`, `x_scores_`, `y_scores_`

n_rep (int) – for each investigated model (i.e. for each joint covariance matrix), specified by the number of features (argument `pxs`), ground-truth correlations (argument `rs`) and the principal component spectra (argument `n_Sigmas`), `n_rep` datasets are drawn from this particular model and analyzed

n_bs (int) – number of bootstrap iterations to perform on each synthetic dataset

n_perm (int) – number of permutations to perform on each synthetic dataset

n_per_ftrs ('auto' or list-like of int) – multiplied by `px+py`, specifies the size of samples generated from the model. If `'auto'`, values are chosen heuristically.

pxs (list-like of int) – number of X-features to use

pys ('px', function or int) – if `'px'`, uses `px` Y features; if function, uses `function(px)` Y features; if int, uses `int(pys)` Y features

rs (list-like of float between 0 and 1) – assumed ground-truth correlations

n_between_modes (int) – number of between-set association modes

n_Sigmas (int) – number of covariance matrices generated for each px and r. Given px and r, covariance matrices differ by their within-set principal component spectra (specified by parameter `axPlusay_range`) and the directions of the between-set mode (i.e. CCA / PLS weight) vectors relative to the principal component axes

axPlusay_range (tuple of floats <= 0) – separately for X and Y, the within-set principal component spectrum is assumed to follow a power law with exponent drawn from a uniform distribution with bounds given by `axPlusay_range`

rotate_XY (bool) – if `True`, a random rotation is applied to each generated dataset; the same rotation is applied to datasets drawn from the same model (i.e. with the same joint covariance matrix)

qx (int or float between 0 and 1) – specifies the number of dominant basis vectors from which to choose the dominant component of the latent mode vectors for X. See `generative_model._mk_Sigmaxy()` for details

qy (int or float between 0 and 1) – specifies the number of dominant basis vectors from which to choose the dominant component of the latent mode vectors for Y. See `generative_model._mk_Sigmaxy()` for details

expl_var_ratio_thr (float) – threshold for required within-modality variance along latent mode vectors

max_n_sigma_trials (int) – number of times an attempt is made to find suitable latent mode vectors. See `generative_model._mk_Sigmaxy()` for details

addons (list-like of functions) – after fitting the estimator and saving association strengths, weights and loadings in `results`, additional analyses can be performed with these functions. They are called in the given order, must have the signature `addana_fun(estr, X, Y, Xorig, Yorig, x_align_ref, y_align_ref, results, kwargs)` and are expected to save their respective outcome features in `results`. Various such functions are provided in module `sample_analysis_addons`

n_test ('auto' or int >= 0) – to analyze some consistency properties across repeated draws from the same model, a test set of size `n_test` is generated for each joint covariance matrix and provided to downstream analyses via keyword arguments `Xtest` and `Ytest`. If set to `'auto'`, a test set of size `max(n_per_ftrs) * (px + py)` is used

mk_test_statistics (`None` or function) – if not `None`, the function must have the signature `fun(Xtest, Ytest, x_weights_true, y_weights_true)`, where `Xtest` and `Ytest` are, respectively, `np.ndarray` of dimension `(n_test, n_x_features)` and `(n_test, n_y_features)`, and `x_weights_true` and `y_weights_true` are `np.ndarray` of dimension `(n_x_features, n_components)` and `(n_y_features, n_components)` containing the true weight vectors

postprocessors (list-like of functions) – functions are called after the final dataset has been concatenated and take that `xr.Dataset` as their only argument

verbose (bool) – whether some status messages are printed

random_state (`None`, int or random number generator instance) – used to generate random numbers

show_progress (bool) – if `True`, progress bars are shown; if `False`, they are not

kwargs (dict) – forwarded to additional analysis functions

Returns:	results – containing data variables for outcome features generated by analyses
Return type:	xr.Dataset
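A call might look like the following minimal sketch. The import path is taken from the fully qualified name at the top of this page, and the keyword arguments are the ones listed in the signature; the specific values, the `SMALL_RUN` dictionary and the wrapper `run_small_analysis` are illustrative assumptions, chosen only to keep the run short.

```python
# Illustrative settings; every key is a parameter from the signature above,
# but the values are assumptions chosen to keep the computation small.
SMALL_RUN = dict(
    n_rep=10,              # datasets drawn per model
    n_per_ftrs=(5, 20),    # sample sizes are these values times (px + py)
    pxs=(4, 8),            # numbers of X features
    pys='px',              # as many Y features as X features
    rs=(0.3, 0.5),         # assumed ground-truth correlations
    n_Sigmas=1,            # covariance matrices per (px, r)
    random_state=0,
    show_progress=False,
)


def run_small_analysis(model='cca'):
    """Hypothetical wrapper: run the analysis with the small settings above."""
    # Imported lazily so this block can be loaded without gemmr installed.
    from gemmr.sample_analysis.analyzers import analyze_model_parameters
    return analyze_model_parameters(model, **SMALL_RUN)
```

With gemmr installed, `results = run_small_analysis('pls')` should return the `xr.Dataset` described above, whose outcome features can then be listed with `results.data_vars`.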