gemmr.generative_model.setup_model

gemmr.generative_model.setup_model(model, random_state=42, px=5, py=5, qx=0.9, qy=0.9, m=1, c1x=1, c1y=1, ax=-1, ay=-1, r_between=0.3, a_between=-1, max_n_sigma_trials=10000, expl_var_ratio_thr=0.5, cx=None, cy=None, verbose=False, return_full=False)

Generate a joint covariance matrix for X and Y.

It is assumed that both datasets live in their respective principal component coordinate systems, i.e. that the within-set covariance matrices \(\Sigma_{XX}\) and \(\Sigma_{YY}\) are diagonal. The diagonal entries follow power laws with exponents ax and ay for X and Y, respectively, and are scaled by c1x and c1y.
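As an illustration of the power-law assumption, the following minimal sketch builds diagonal within-set covariance matrices with NumPy. The helper name `power_law_variances` and the exact form `c1 * k**a` (with k = 1, ..., p) are assumptions made for this example; they are not taken from the gemmr source.

```python
import numpy as np

def power_law_variances(p, a=-1.0, c1=1.0):
    """Hypothetical helper: variances c1 * k**a for k = 1, ..., p."""
    k = np.arange(1, p + 1, dtype=float)
    return c1 * k ** a

# Diagonal within-set covariance matrices in the principal component basis,
# using the documented defaults px = py = 5, ax = ay = -1, c1x = c1y = 1.
Sigma_xx = np.diag(power_law_variances(5, a=-1.0, c1=1.0))
Sigma_yy = np.diag(power_law_variances(5, a=-1.0, c1=1.0))

print(np.diag(Sigma_xx))
```

With a negative exponent the variances decay along the principal components, which is why ax and ay should usually be <= 0.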

The between-set covariance matrix \(\Sigma_{XY}\) is generated by _mk_Sigmaxy(); see that function for details.

Parameters:
  • model ("pls" or "cca") – whether to return a covariance matrix for CCA or PLS
  • random_state (None, int, or random number generator instance) – for reproducibility, a random number generator is instantiated and all random numbers are drawn from it
  • px (int) – number of features in X
  • py (int) – number of features in Y
  • qx (int or float between 0 and 1) – if float, gives the fraction of px to use (i.e. qx <- int(qx * px)). Specifies the number of dominant basis vectors from which the latent mode vectors for X are composed. See _mk_Sigmaxy() for details
  • qy (int or float between 0 and 1) – if float, gives the fraction of py to use (i.e. qy <- int(qy * py)). Specifies the number of dominant basis vectors from which the latent mode vectors for Y are composed. See _mk_Sigmaxy() for details
  • m (int) – number of latent cross-modality modes to encode
  • c1x (float) – Should usually be 1. All X variances will be scaled by this number
  • c1y (float) – Should usually be 1. All Y variances will be scaled by this number
  • ax (float) – should usually be <= 0. Eigenvalues of within-modality covariance for X are assumed to follow a power-law with this exponent
  • ay (float) – should usually be <= 0. Eigenvalues of within-modality covariance for Y are assumed to follow a power-law with this exponent
  • r_between (float between 0 and 1) – cross-modality correlation the latent mode vectors should have
  • a_between (float) – should usually be <= 0. Higher-order cross-modality correlations are scaled by a power-law with this exponent
  • max_n_sigma_trials (int >= 1) – number of times an attempt is made to find suitable latent mode vectors. See _mk_Sigmaxy() for details.
  • expl_var_ratio_thr (float) – threshold for required within-modality variance along latent mode vectors
  • cx (np.ndarray) – within-set variances for X
  • cy (np.ndarray) – within-set variances for Y
  • verbose (bool) – whether to print status messages
  • return_full (bool) – if False, only the joint covariance matrix is returned; otherwise, the additional quantities listed under Returns are returned as well
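The rule for qx and qy can be sketched as follows. The helper `resolve_q` is hypothetical and only mirrors the documented conversion q <- int(q * p) for float inputs; how the library itself validates these arguments is not shown here.

```python
def resolve_q(q, p):
    """Hypothetical helper: turn q into a number of dominant basis vectors.

    A float is read as a fraction of p (documented rule: q <- int(q * p));
    an int is taken as the number of basis vectors directly.
    """
    if isinstance(q, float):
        if not 0.0 <= q <= 1.0:
            raise ValueError("a float q must lie between 0 and 1")
        return int(q * p)
    return int(q)

# Documented defaults: px = 5, qx = 0.9, so int(4.5) = 4 basis vectors.
qx = resolve_q(0.9, 5)
# An integer is used as given.
qy = resolve_q(3, 5)
print(qx, qy)
```

Note that int() truncates, so a fraction of 0.9 of 5 features yields 4 dominant basis vectors, not 5.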
Returns:

  • Sigma (np.ndarray) – joint covariance matrix for X and Y
  • Sigmaxy_svals (np.ndarray (m,)) – singular values of Sigmaxy, these are the true canonical correlations or covariances (if model is ‘cca’ or ‘pls’, respectively)
  • true_corrs (np.ndarray (m,)) – the encoded cross-modality correlations for each mode
  • U (np.ndarray (n_X_features, n_X_features)) – basis vectors for X
  • V (np.ndarray (n_Y_features, n_Y_features)) – basis vectors for Y
  • latent_expl_var_ratios_x (np.ndarray (m,)) – explained variance ratio in X modality along the latent directions
  • latent_expl_var_ratios_y (np.ndarray (m,)) – explained variance ratio in Y modality along the latent directions
  • U_latent (np.ndarray (n_X_features, m)) – latent mode vectors for X
  • V_latent (np.ndarray (n_Y_features, m)) – latent mode vectors for Y
  • cosine_sim_pc1_latentMode_x (np.ndarray (m,)) – cosine similarities between latent mode vectors and PC1 for X
  • cosine_sim_pc1_latentMode_y (np.ndarray (m,)) – cosine similarities between latent mode vectors and PC1 for Y
  • latent_mode_vector_algo (str) – ‘qr__’ or ‘opti’, algorithm with which the latent mode vectors were found
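To make the relationship between Sigma and Sigmaxy_svals concrete, here is a minimal NumPy sketch that assembles a joint covariance matrix from its blocks for a single mode (m = 1) and reads off the singular value of the between-set block. The latent directions u and v and the value s are chosen by hand for illustration; this is not how _mk_Sigmaxy() selects them.

```python
import numpy as np

px, py = 5, 5

# Diagonal within-set covariances with power-law variances (exponent -1).
Sigma_xx = np.diag(np.arange(1, px + 1, dtype=float) ** -1.0)
Sigma_yy = np.diag(np.arange(1, py + 1, dtype=float) ** -1.0)

# One hand-picked, unit-length latent direction per set (assumption).
u = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
v = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
s = 0.3  # between-set covariance along (u, v)

# Rank-1 between-set covariance block.
Sigma_xy = s * np.outer(u, v)

# Joint covariance matrix of the stacked variables (X, Y).
Sigma = np.block([[Sigma_xx, Sigma_xy],
                  [Sigma_xy.T, Sigma_yy]])

# Leading singular value of the between-set block.
svals = np.linalg.svd(Sigma_xy, compute_uv=False)[:1]
print(Sigma.shape, svals)
```

Because u and v have unit length, the single non-zero singular value of Sigma_xy equals s.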

Raises:
  • ValueError –
      • if the number of requested between-set association modes m is greater than the minimum of the dimensions of the dominant subspaces (as encoded by qx and qy)
      • if the resulting joint covariance matrix is not positive definite
  • NotImplementedError – if model == ‘cca’ and m > 1
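The two ValueError conditions can be checked independently of the library. The function `check_model` below is a hypothetical sketch, not gemmr code; it tests positive definiteness with a Cholesky factorization, which is one common approach and an assumption about how such a check might be done.

```python
import numpy as np

def check_model(Sigma, m, qx, qy):
    """Hypothetical validation mirroring the documented error conditions."""
    if m > min(qx, qy):
        raise ValueError("m must not exceed min(qx, qy)")
    try:
        # Cholesky succeeds only for (symmetric) positive definite matrices.
        np.linalg.cholesky(Sigma)
    except np.linalg.LinAlgError as err:
        raise ValueError("joint covariance matrix is not positive definite") from err

# A valid 2 x 2 joint covariance matrix passes silently.
ok = np.array([[1.0, 0.3],
               [0.3, 1.0]])
check_model(ok, m=1, qx=1, qy=1)

# A between-set covariance larger than the variances allow is rejected.
bad = np.array([[1.0, 1.5],
                [1.5, 1.0]])
try:
    check_model(bad, m=1, qx=1, qy=1)
except ValueError as err:
    print("rejected:", err)
```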