gemmr.generative_model.setup_model

gemmr.generative_model.setup_model(model, random_state=42, px=5, py=5, qx=0.9, qy=0.9, m=1, c1x=1, c1y=1, ax=-1, ay=-1, r_between=0.3, a_between=-1, max_n_sigma_trials=10000, expl_var_ratio_thr=0.5, cx=None, cy=None, verbose=False, return_full=False)

Generate a joint covariance matrix for X and Y.
It is assumed that both datasets live in their respective principal component coordinate system, i.e. that the within-set covariance matrices \(\Sigma_{XX}\) and \(\Sigma_{YY}\) are diagonal. The entries of the diagonal are set to follow power laws with decay constants ax and ay for X and Y, respectively, and scaled by cx and cy.
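As an illustrative sketch (not gemmr's actual code), a diagonal within-set covariance with power-law variances might be built as follows. The helper name `power_law_variances` and the exact form `c1 * k**a` for the k-th variance are assumptions made for this example; only the parameter roles (decay exponent, overall scale, number of features) come from the description above.

```python
import numpy as np


def power_law_variances(p, a=-1.0, c1=1.0):
    """Return p variances of the assumed form c1 * k**a for ranks k = 1..p."""
    ranks = np.arange(1, p + 1, dtype=float)
    return c1 * ranks ** a


# Diagonal within-set covariance for X with the documented defaults
# px=5, ax=-1, c1x=1 (hypothetical reconstruction, see lead-in).
Sigma_xx = np.diag(power_law_variances(5, a=-1.0, c1=1.0))
print(np.diag(Sigma_xx))
```

With a negative exponent the variances decrease with component rank, which matches the usual ordering of principal components by explained variance.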
For generation of the between-set covariance matrix \(\Sigma_{XY}\), _mk_Sigmaxy() is called; see there for details.

Parameters:
- model ("pls" or "cca") – whether to return a covariance matrix for CCA or PLS
- random_state (None or int or a random number generator instance) – For reproducibility, a random number generator is instantiated and all random numbers are drawn from that
- px (int) – number of features in X
- py (int) – number of features in Y
- qx (int or float between 0 and 1) – if float, gives the fraction of px to use (i.e. q_x <- int(q_x * p_x)). Specifies the number of dominant basis vectors from which to choose one component of the latent mode vectors for X. See _mk_Sigmaxy() for details
- qy (int or float between 0 and 1) – if float, gives the fraction of py to use (i.e. q_y <- int(q_y * p_y)). Specifies the number of dominant basis vectors from which to choose one component of the latent mode vectors for Y. See _mk_Sigmaxy() for details
- m (int) – number of latent cross-modality modes to encode
- c1x (float) – Should usually be 1. All X variances will be scaled by this number
- c1y (float) – Should usually be 1. All Y variances will be scaled by this number
- ax (float) – should usually be <= 0. Eigenvalues of within-modality covariance for X are assumed to follow a power-law with this exponent
- ay (float) – should usually be <= 0. Eigenvalues of within-modality covariance for Y are assumed to follow a power-law with this exponent
- r_between (float between 0 and 1) – cross-modality correlation the latent mode vectors should have
- a_between (float) – should usually be <= 0. Higher-order cross-modality correlations are scaled by a power-law with this exponent
- max_n_sigma_trials (int >= 1) – number of times an attempt is made to find suitable latent mode vectors. See _mk_Sigmaxy() for details.
- expl_var_ratio_thr (float) – threshold for required within-modality variance along latent mode vectors
- cx (np.ndarray) – within-set variances for X
- cy (np.ndarray) – within-set variances for Y
- verbose (bool) – whether to print status messages
- return_full (bool) – if False, returns only the joint covariance matrix; otherwise returns more quantities of interest

Returns:
- Sigma (np.ndarray) – joint covariance matrix for X and Y
- Sigmaxy_svals (np.ndarray (m,)) – singular values of Sigmaxy; these are the true canonical correlations or covariances (if model is ‘cca’ or ‘pls’, respectively)
- true_corrs (np.ndarray (m,)) – the encoded cross-modality correlations for each mode
- U (np.ndarray (n_X_features, n_X_features)) – basis vectors for X
- V (np.ndarray (n_Y_features, n_Y_features)) – basis vectors for Y
- latent_expl_var_ratios_x (np.ndarray (m,)) – explained variance ratio in X modality along the latent directions
- latent_expl_var_ratios_y (np.ndarray (m,)) – explained variance ratio in Y modality along the latent directions
- U_latent (np.ndarray (n_X_features, m)) – latent mode vectors for X
- V_latent (np.ndarray (n_Y_features, m)) – latent mode vectors for Y
- cosine_sim_pc1_latentMode_x ((m,)) – cosine similarities between latent mode vectors and PC1 for X
- cosine_sim_pc1_latentMode_y ((m,)) – cosine similarities between latent mode vectors and PC1 for Y
- latent_mode_vector_algo (str) – ‘qr__’ or ‘opti’, algorithm with which the latent mode vectors were found
Raises:
- ValueError –
  * if the number of requested between-set association modes m is greater than the minimum of the dimensions of the dominant subspaces (as encoded by qx and qy)
  * if the resulting joint covariance matrix is not positive definite
- NotImplementedError –
  * if model == ‘cca’ and m > 1
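
The qx/qy convention and the first ValueError condition can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the library's implementation: `resolve_q` and `check_n_modes` are hypothetical helper names, and treating any Python `float` as a fraction (and any `int` as an absolute count) is an assumed reading of the parameter description.

```python
def resolve_q(q, p):
    """Turn q into a number of dominant basis vectors for a set with p features."""
    if isinstance(q, float):
        if not 0 <= q <= 1:
            raise ValueError("a float q must lie between 0 and 1")
        return int(q * p)  # documented rule: q <- int(q * p)
    return int(q)


def check_n_modes(m, qx, qy, px, py):
    """Raise ValueError if m exceeds the smaller dominant subspace (hypothetical helper)."""
    qx_n, qy_n = resolve_q(qx, px), resolve_q(qy, py)
    if m > min(qx_n, qy_n):
        raise ValueError(
            f"m={m} exceeds min(qx, qy)={min(qx_n, qy_n)} dominant basis vectors"
        )
    return qx_n, qy_n


# Documented defaults: px=py=5 and qx=qy=0.9 give int(4.5) = 4 vectors per set.
dims = check_n_modes(m=1, qx=0.9, qy=0.9, px=5, py=5)
print(dims)
```

Under this reading, the default settings leave room for up to four modes per set, while requesting m=5 would be rejected.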