gemmr.sample_analysis.analyzers.analyze_model_parameters

gemmr.sample_analysis.analyzers.analyze_model_parameters(model, estr=None, n_rep=100, n_bs=0, n_perm=0, n_per_ftrs=(2, 10, 50), pxs=(4, 8, 16, 32, 64), pys='px', rs=(0.1, 0.3, 0.5, 0.7), n_Sigmas=1, powerlaw_decay=(-1, -1), coordinate_system='pc', expl_var_ratio_thr=0.5, max_n_sigma_trials=100000, addons=(), resample_addons=None, n_test=0, mk_test_statistics=None, postprocessors=(), comparison_gms=(), random_state=0, show_progress=True, check_convergence=False, conv_thr=0.99, fit_params=None, **kwargs)

Parameter-dependent models are set up and resulting synthetic datasets are analyzed.

For each model, differing by the number of features, ground-truth correlations, within-set principal component spectra and direction of weight vectors relative to the principal component axes, synthetic datasets are generated and analyzed.
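
The parameter grid described above can be sketched as follows. This is a minimal illustration, not the library's internal loop: the helper names `resolve_py` and `enumerate_settings`, the dictionary keys and the iteration order are assumptions made for this example. Only the meaning of `pys` and the sample-size rule `n_per_ftr * (px + py)` are taken from the parameter descriptions below.

```python
from itertools import product


def resolve_py(pys, px):
    """Number of Y features for a given px, following the documented
    meaning of pys: 'px', a function of px, or an int (hypothetical helper)."""
    if isinstance(pys, str):
        if pys != "px":
            raise ValueError("the only documented string value is 'px'")
        return px
    if callable(pys):
        return int(pys(px))
    return int(pys)


def enumerate_settings(pxs, pys, rs, n_Sigmas, n_per_ftrs):
    """Yield one dict per (px, r, Sigma index, samples-per-feature) setting.

    The iteration order is an assumption made for this sketch.
    """
    for px, r, sigma_id, n_per_ftr in product(
            pxs, rs, range(n_Sigmas), n_per_ftrs):
        py = resolve_py(pys, px)
        yield {
            "px": px,
            "py": py,
            "r": r,
            "Sigma_id": sigma_id,
            "n_per_ftr": n_per_ftr,
            # sample size = samples per feature times total number of features
            "n": n_per_ftr * (px + py),
        }


# Grid spanned by the default argument values in the signature above
settings = list(enumerate_settings(
    pxs=(4, 8, 16, 32, 64), pys="px", rs=(0.1, 0.3, 0.5, 0.7),
    n_Sigmas=1, n_per_ftrs=(2, 10, 50)))
```

With the default values this enumerates 5 × 4 × 1 × 3 settings, and for each of them n_rep datasets would be drawn, which gives a rough idea of how the runtime scales with the arguments.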

Parameters:
  • model ('cca' or 'pls') – whether synthetic data is generated and analyzed for CCA or PLS

  • estr (None or sklearn-style estimator) – if None, either estimators.SVDCCA or estimators.SVDPLS is used, depending on the value of model. Otherwise, the given estimator should correspond to model and must have a method fit and (after fitting) the attributes assocs_, x_rotations_, y_rotations_, x_scores_, y_scores_

  • n_rep (int) – number of datasets drawn and analyzed for each investigated model (i.e. for each joint covariance matrix). A model is specified by the number of features (argument pxs), the ground-truth correlation (argument rs) and the principal component spectra (argument n_Sigmas)

  • n_bs (int) – number of bootstrap iterations to perform on each synthetic dataset

  • n_perm (int) – number of permutations to perform on each synthetic dataset

  • n_per_ftrs ('auto' or list-like of int) – each value, multiplied by px + py, gives the size of the samples generated from the model. If 'auto', values are chosen heuristically.

  • pxs (list-like of int) – number of X-features to use

  • pys ('px', function or int) – if ‘px’ uses px Y features, if function uses function(px) Y features, if int uses int(pys) Y features

  • rs (list-like of float between 0 and 1) – assumed ground-truth correlations

  • n_Sigmas (int) – number of covariance matrices generated for each px and r. Given px and r, covariance matrices differ by their within-set principal component spectra (specified by parameter powerlaw_decay) and by the directions of the between-set mode (i.e. CCA / PLS weight) vectors relative to the principal component axes

  • powerlaw_decay (tuple of floats <= 0) – separately for X and Y the within-set principal component spectrum is assumed to follow a power-law. powerlaw_decay can either be a tuple of 2 floats <= 0, in which case the 2 numbers represent the decay constants for X and Y, respectively. Alternatively, powerlaw_decay can be a tuple comprising the string random_sum and 2 floats <= 0, in which case the value for the sum of the decay constants for X and Y is drawn from a uniform distribution with boundaries given by the 2 floats; the decay constant for X is then a random fraction (uniform between 0 and 1) of the sum, and the decay constant for Y is such that the 2 decay constants sum up to the value for the sum

  • coordinate_system (bool) – if True, a random rotation is applied to each generated dataset; the same rotation is applied to all datasets drawn from the same model (i.e. with the same joint covariance matrix)

  • expl_var_ratio_thr (float) – threshold for required within-modality variance along latent mode vectors

  • max_n_sigma_trials (int) – number of times an attempt is made to find suitable latent mode vectors. See _mk_Sigmaxy for details.

  • addons (list-like of functions) –

    After fitting the estimator and saving association strengths, weights and loadings in results, additional analyses can be performed with these functions. They are called in the given order and must have the signature

    addana_fun(estr, X, Y, Xorig, Yorig, x_align_ref, y_align_ref,
        results, **kwargs)
    

    and are expected to save their respective outcome features in results. Various such functions are provided in module sample_analysis_addons.

  • resample_addons (None or list-like of add-on functions) – if None then addons is used

  • n_test ('auto' or int >= 0) – to analyze some consistency properties across repeated draws from the same model, a test set of size n_test is generated for each joint covariance matrix and provided to down-stream analyses via keyword-arguments Xtest and Ytest. If set to 'auto' a test set of size max(n_per_ftrs) * (px + py) is used.

  • mk_test_statistics (None or function) –

    if not None the function must have the signature

    fun(Xtest, Ytest, x_weights_true, y_weights_true)
    

    where Xtest and Ytest are, respectively, np.ndarray of dimension (n_test, n_x_features) and (n_test, n_y_features), and x_weights_true and y_weights_true are np.ndarray of dimension (n_x_features, n_components) and (n_y_features, n_components) containing the true weight vectors

  • postprocessors (list-like of functions) – functions are called after the final dataset has been concatenated and take that xr.Dataset as only argument

  • comparison_gms (list-like of tuples (label, functions)) – labels identify the alternative generative models; functions return an object that encodes the alternative generative model, based on a given one. Functions take a GEMMR instance as only positional argument and m (number of between-set modes) and random_state (a random number generator instance) as keyword arguments, and return an instance of an object that has essentially the same attributes as gm (cf. the source of analyze_model() to see which are used). The dimensionalities of the X and Y latent spaces must be identical to those of the GEMMR instance. Every generated dataset will then additionally be analyzed with the estimator estr and with respect to the ground-truth latent axes x_weights_ and y_weights_ encoded in these returned objects.

  • random_state (None, int or random number generator instance) – used to generate random numbers

  • show_progress (bool) – if True progress bars are shown, if False not

  • fit_params (dict) – keyword-arguments for estr.fit

  • kwargs (dict) – forwarded to additional analysis functions
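
The two forms of powerlaw_decay described above can be sketched with NumPy as follows. The helper names `draw_decay_constants` and `powerlaw_spectrum` are hypothetical. The parametrisation of the spectrum (variance of the i-th principal component proportional to i raised to the decay constant) is an assumption of this sketch and may differ from what the library does internally.

```python
import numpy as np


def draw_decay_constants(powerlaw_decay, rng):
    """Return (ax, ay) for the two documented forms of powerlaw_decay."""
    if len(powerlaw_decay) == 2:
        # Form 1: the decay constants for X and Y are given directly
        ax, ay = powerlaw_decay
        return float(ax), float(ay)
    # Form 2: ('random_sum', bound, bound)
    label, bound_a, bound_b = powerlaw_decay
    if label != "random_sum":
        raise ValueError("expected ('random_sum', float, float)")
    low, high = sorted((bound_a, bound_b))
    total = rng.uniform(low, high)      # sum of the two decay constants
    ax = rng.uniform(0.0, 1.0) * total  # random fraction of the sum for X
    ay = total - ax                     # remainder for Y
    return float(ax), float(ay)


def powerlaw_spectrum(p, decay):
    """Assumed parametrisation: variance of component i (1-based) is i**decay."""
    return np.arange(1, p + 1, dtype=float) ** decay


rng = np.random.default_rng(0)
ax, ay = draw_decay_constants(("random_sum", -2.0, 0.0), rng)
spectrum_x = powerlaw_spectrum(8, ax)
spectrum_y = powerlaw_spectrum(8, ay)
```

Because both boundaries are required to be <= 0, both drawn constants are <= 0 as well, so the resulting spectra are non-increasing.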

Returns:

results – containing data variables for outcome features generated by analyses

Return type:

xr.Dataset
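
A hedged usage sketch, assuming gemmr is installed and the signature is as documented above. The add-on `addon_score_corr`, the result key `first_mode_score_corr` and the wrapper `run_small_sweep` are hypothetical names chosen for this example. The sketch further assumes that x_scores_ and y_scores_ have shape (n_samples, n_components) and that results supports item assignment. The add-on is first tried on a stand-in estimator and a plain dict, so that part runs without gemmr.

```python
from types import SimpleNamespace

import numpy as np


def addon_score_corr(estr, X, Y, Xorig, Yorig, x_align_ref, y_align_ref,
                     results, **kwargs):
    """Hypothetical add-on with the documented signature: stores the
    in-sample correlation between the first pair of X and Y scores."""
    xs = np.asarray(estr.x_scores_)[:, 0]
    ys = np.asarray(estr.y_scores_)[:, 0]
    results["first_mode_score_corr"] = float(np.corrcoef(xs, ys)[0, 1])


def run_small_sweep():
    """Call analyze_model_parameters on a deliberately small grid.

    gemmr is imported lazily so that the add-on above can be tried
    without the package being installed.
    """
    from gemmr.sample_analysis.analyzers import analyze_model_parameters
    return analyze_model_parameters(
        "cca",
        n_rep=10,
        n_per_ftrs=(2, 10),
        pxs=(4, 8),
        rs=(0.3, 0.5),
        addons=(addon_score_corr,),
        random_state=0,
        show_progress=False,
    )


# Try the add-on on its own with a stand-in estimator and a plain dict
fake_estr = SimpleNamespace(
    x_scores_=np.array([[1.0], [2.0], [3.0], [4.0]]),
    y_scores_=np.array([[2.0], [4.0], [6.0], [8.0]]),
)
demo_results = {}
addon_score_corr(fake_estr, None, None, None, None, None, None, demo_results)
```

Calling `run_small_sweep()` should return the xr.Dataset described above. Which dimensions and data variables it contains depends on the chosen arguments and add-ons.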