gemmr.sample_analysis.analyzers.analyze_model_parameters
gemmr.sample_analysis.analyzers.analyze_model_parameters(model, estr=None, n_rep=100, n_bs=0, n_perm=0, n_per_ftrs=(2, 10, 50), pxs=(4, 8, 16, 32, 64), pys='px', rs=(0.1, 0.3, 0.5, 0.7, 0.9), n_between_modes=1, n_Sigmas=1, axPlusay_range=(-2, 0), rotate_XY=False, qx=0.9, qy=0.9, expl_var_ratio_thr=0.5, max_n_sigma_trials=10000, addons=(), n_test=0, mk_test_statistics=None, postprocessors=(), verbose=False, random_state=0, show_progress=True, **kwargs)
Parameter-dependent models are set up and the resulting synthetic datasets are analyzed.
The models differ in the number of features, the ground-truth correlations, the within-set principal component spectra and the direction of the weight vectors relative to the principal component axes. For each model, synthetic datasets are generated and analyzed.
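As a rough illustration of the parameter grid this implies, the sketch below enumerates the combinations of pxs, rs and n_per_ftrs, using the default values from the signature, and derives each sample size as n_per_ftr * (px + py). The helper names resolve_py and enumerate_grid are hypothetical and not part of gemmr. The library may iterate over its parameters differently, and the sketch does not handle n_per_ftrs='auto'.

```python
from itertools import product


def resolve_py(pys, px):
    """Resolve the number of Y features from `pys` ('px', function or int)."""
    if isinstance(pys, str):
        if pys != 'px':
            raise ValueError("pys must be 'px', a function or an int")
        return px
    if callable(pys):
        return int(pys(px))
    return int(pys)


def enumerate_grid(pxs=(4, 8, 16, 32, 64), pys='px',
                   rs=(0.1, 0.3, 0.5, 0.7, 0.9), n_per_ftrs=(2, 10, 50)):
    """Yield (px, py, r, n_per_ftr, n_samples) for every grid point.

    Hypothetical helper: it only mirrors the documented meaning of the
    arguments, not gemmr's internal loop structure.
    """
    for px, r, n_per_ftr in product(pxs, rs, n_per_ftrs):
        py = resolve_py(pys, px)
        # Sample size is the per-feature factor times the total feature count.
        yield px, py, r, n_per_ftr, n_per_ftr * (px + py)


grid = list(enumerate_grid())
print(len(grid))  # 5 * 5 * 3 = 75 grid points
print(grid[0])    # (4, 4, 0.1, 2, 16)
```

With the defaults, the smallest configuration (px = py = 4, n_per_ftr = 2) therefore yields datasets of 16 samples.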
Parameters:
- model ('cca' or 'pls') – whether synthetic data is generated and analyzed for CCA or PLS
- estr (None or sklearn-style estimator) – if None, either estimators.SVDCCA or estimators.SVDPLS is used, depending on the value of model. Otherwise, the given estimator should correspond to model and must have a method fit and (after fitting) attributes assocs_, x_rotations_, y_rotations_, x_scores_ and y_scores_
- n_rep (int) – for each investigated model (i.e. for each joint covariance matrix), specified by the number of features (argument pxs), the ground-truth correlations (argument rs) and the principal component spectra (argument n_Sigmas), n_rep datasets are drawn from this particular model and analyzed
- n_bs (int) – number of bootstrap iterations to perform on each synthetic dataset
- n_perm (int) – number of permutations to perform on each synthetic dataset
- n_per_ftrs ('auto' or list-like of int) – multiplied by px+py, specifies the size of samples generated from the model. If 'auto', values are chosen heuristically.
- pxs (list-like of int) – number of X-features to use
- pys ('px', function or int) – if 'px', uses px Y features; if a function, uses function(px) Y features; if an int, uses int(pys) Y features
- rs (list-like of float between 0 and 1) – assumed ground-truth correlations
- n_between_modes (int) – number of between-set association modes
- n_Sigmas (int) – number of covariance matrices generated for each px and r. For a given px and r, the covariance matrices differ in their within-set principal component spectra (specified by parameter axPlusay_range) and in the directions of the between-set mode (i.e. CCA / PLS weight) vectors relative to the principal component axes
- axPlusay_range (tuple of floats <= 0) – separately for X and Y, the within-set principal component spectrum is assumed to follow a power law with exponent drawn from a uniform distribution with bounds given by axPlusay_range
- rotate_XY (bool) – if True, a random rotation is applied to each generated dataset; the same rotation is applied to all datasets drawn from the same model (i.e. with the same joint covariance matrix)
- qx (int or float between 0 and 1) – specifies the number of dominant basis vectors from which to choose the dominant component of the latent mode vectors for X. See generative_model._mk_Sigmaxy() for details
- qy (int or float between 0 and 1) – specifies the number of dominant basis vectors from which to choose the dominant component of the latent mode vectors for Y. See generative_model._mk_Sigmaxy() for details
- expl_var_ratio_thr (float) – threshold for required within-modality variance along latent mode vectors
- max_n_sigma_trials (int) – number of times an attempt is made to find suitable latent mode vectors. See generative_model._mk_Sigmaxy() for details
- addons (list-like of functions) – after fitting the estimator and saving association strengths, weights and loadings in results, additional analyses can be performed with these functions. They are called in the given order, must have the signature addana_fun(estr, X, Y, Xorig, Yorig, x_align_ref, y_align_ref, results, **kwargs), and are expected to save their respective outcome features in results. Various such functions are provided in module sample_analysis_addons
- n_test ('auto' or int >= 0) – to analyze some consistency properties across repeated draws from the same model, a test set of size n_test is generated for each joint covariance matrix and provided to downstream analyses via keyword arguments Xtest and Ytest. If set to 'auto', a test set of size max(n_per_ftrs) * (px + py) is used.
- mk_test_statistics (None or function) – if not None, the function must have the signature fun(Xtest, Ytest, x_weights_true, y_weights_true), where Xtest and Ytest are np.ndarray of dimension (n_test, n_x_features) and (n_test, n_y_features), respectively, and x_weights_true and y_weights_true are np.ndarray of dimension (n_x_features, n_components) and (n_y_features, n_components) containing the true weight vectors
- postprocessors (list-like of functions) – functions that are called after the final dataset has been concatenated; each takes that xr.Dataset as its only argument
- verbose (bool) – whether some status messages are printed
- random_state (None, int or random number generator instance) – used to generate random numbers
- show_progress (bool) – if True, progress bars are shown; if False, they are not
- kwargs (dict) – forwarded to additional analysis functions
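The n_test rule from the parameter list can be written out as a small helper. This is a minimal sketch of the documented behaviour only. The name resolve_n_test is hypothetical and not part of gemmr, and the sketch assumes n_per_ftrs is given as a list-like of int rather than 'auto'.

```python
def resolve_n_test(n_test, n_per_ftrs, px, py):
    """Return the test-set size implied by `n_test` (hypothetical helper).

    'auto' maps to max(n_per_ftrs) * (px + py), as stated in the parameter
    description; any other value must be an int >= 0 and is used as given.
    """
    if n_test == 'auto':
        return max(n_per_ftrs) * (px + py)
    n_test = int(n_test)
    if n_test < 0:
        raise ValueError("n_test must be 'auto' or an int >= 0")
    return n_test


print(resolve_n_test('auto', (2, 10, 50), px=4, py=4))  # 50 * 8 = 400
print(resolve_n_test(0, (2, 10, 50), px=4, py=4))       # 0
```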
Returns: results – containing data variables for outcome features generated by analyses
Return type: xr.Dataset
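To illustrate the signature required of the functions passed via addons, the following sketch defines a custom add-on and calls it the way the parameter description suggests. The function count_samples_and_features and the driver loop are hypothetical. A plain dict stands in for results here, on the assumption that outcomes are stored by item assignment (results[name] = value). How gemmr indexes results internally is not covered by this page.

```python
def count_samples_and_features(estr, X, Y, Xorig, Yorig,
                               x_align_ref, y_align_ref, results, **kwargs):
    """Hypothetical add-on: record dataset dimensions in `results`.

    The positional arguments follow the documented signature; only X, Y
    and results are used in this sketch.
    """
    results['n_samples'] = len(X)
    results['n_x_features'] = len(X[0])
    results['n_y_features'] = len(Y[0])


# Stand-in inputs (assumption: any 2-d array-like indexable by row works).
X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
Y = [[1.0], [2.0], [3.0]]
results = {}  # stand-in for the results container

# Add-ons are called in the given order; extra keyword arguments are forwarded.
addons = (count_samples_and_features,)
for addon in addons:
    addon(None, X, Y, X, Y, None, None, results, Xtest=None, Ytest=None)

print(results)  # {'n_samples': 3, 'n_x_features': 2, 'n_y_features': 1}
```

Accepting **kwargs keeps the add-on compatible with keyword arguments such as Xtest and Ytest that it does not use itself.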