gemmr.generative_model._find_latent_mode_vectors_pc1

gemmr.generative_model._find_latent_mode_vectors_pc1(Sigmaxx, Sigmayy, U, V, assemble_Sigmaxy, expl_var_ratio_thr, m, max_n_sigma_trials, qx, qy, rng, true_corrs, verbose)

Selects the first principal component axes as weight vectors.

If only \(m=1\) mode is sought and the variances in \(X\) and \(Y\) are standardized to 1 along PC1, then, for PLS, the covariance along the weight vectors is identical to the correlation. Thus, for both PLS and CCA, the between-set covariance matrix is given by

\[\Sigma_{XY} = r_\mathrm{true} \vec{u}_1 \vec{v}_1^T\]

where \(\vec{u}_1\) and \(\vec{v}_1\) are the first principal component axes for \(X\) and \(Y\), respectively. If the overall coordinate system is the principal component coordinate system, \(\Sigma_{XY}\) is a \(p_x \times p_y\) matrix with \(r_\mathrm{true}\) in the top-left corner and 0 everywhere else.
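As a minimal illustration of this rank-one structure, the sketch below builds \(\Sigma_{XY}\) directly in the principal component coordinate system. The helper name `rank_one_sigmaxy` and the example dimensions are hypothetical and not part of gemmr; the function documented here may assemble the matrix differently.

```python
import numpy as np


def rank_one_sigmaxy(px, py, r_true):
    """Hypothetical helper: Sigma_XY = r_true * u1 v1^T in PC coordinates."""
    # In PC coordinates the first principal component axes are unit vectors
    # along the first coordinate.
    u1 = np.zeros(px)
    u1[0] = 1.0
    v1 = np.zeros(py)
    v1[0] = 1.0
    return r_true * np.outer(u1, v1)


# Example with assumed sizes px=3, py=2 and an assumed true correlation of 0.5
Sigmaxy = rank_one_sigmaxy(3, 2, 0.5)
```

The result is a \(p_x \times p_y\) array whose only non-zero entry is \(r_\mathrm{true}\) at index `[0, 0]`.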

The block matrix \(\Sigma\) is positive definite if and only if \(\Sigma_{YY}\) and its Schur complement \(\Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^T\) are positive definite. In the principal component coordinate system, \(\Sigma_{XX}\) and \(\Sigma_{YY}\) are diagonal and, by assumption, their top-left elements are 1, so \(\Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^T\) simplifies to \(r_\mathrm{true}^2 \vec{u}_1 \vec{u}_1^T\). The Schur complement is therefore diagonal: its top-left element is \(1 - r_\mathrm{true}^2\), which is greater than 0 because \(r_\mathrm{true}^2 < 1\), and its remaining diagonal entries are the unchanged variances on the diagonal of \(\Sigma_{XX}\), which are greater than 0. Thus, \(\Sigma\) is positive definite when the weight vectors are chosen as the first principal component axes.
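The positive-definiteness argument can be checked numerically. The sketch below is only an illustration under assumed inputs: the helper `schur_min_eval` and the example variances are hypothetical and not taken from gemmr. It computes the smallest eigenvalue of the Schur complement \(\Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^T\) and compares it with the eigenvalues of the assembled joint matrix.

```python
import numpy as np


def schur_min_eval(Sigmaxx, Sigmayy, Sigmaxy):
    """Hypothetical helper: smallest eigenvalue of the Schur complement."""
    # solve(Sigmayy, Sigmaxy.T) computes Sigmayy^{-1} Sigmaxy^T without
    # forming the inverse explicitly.
    schur = Sigmaxx - Sigmaxy @ np.linalg.solve(Sigmayy, Sigmaxy.T)
    return np.linalg.eigvalsh(schur).min()


# Assumed example in PC coordinates: diagonal within-set covariances with
# unit variance along PC1, and an assumed true correlation of 0.8.
Sigmaxx = np.diag([1.0, 0.5, 0.25])
Sigmayy = np.diag([1.0, 0.4])
r_true = 0.8
Sigmaxy = np.zeros((3, 2))
Sigmaxy[0, 0] = r_true

min_eval = schur_min_eval(Sigmaxx, Sigmayy, Sigmaxy)

# Joint covariance matrix; positive definite iff all eigenvalues are > 0
Sigma = np.block([[Sigmaxx, Sigmaxy], [Sigmaxy.T, Sigmayy]])
joint_min_eval = np.linalg.eigvalsh(Sigma).min()
```

In this example the Schur complement is `diag(1 - 0.8**2, 0.5, 0.25)`, so both `min_eval` and `joint_min_eval` are positive, consistent with the argument above.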

Parameters:
  • Sigmaxx (np.ndarray (px, px)) – within-set covariance matrix for X
  • Sigmayy (np.ndarray (py, py)) – within-set covariance matrix for Y
  • U (np.ndarray (px, m)) – weight vectors for X
  • V (np.ndarray (py, m)) – weight vectors for Y
  • assemble_Sigmaxy (function) – either _assemble_Sigmaxy_pls or _assemble_Sigmaxy_cca
  • expl_var_ratio_thr (float) – the ratio of the amount of variance along the first mode vectors in X and Y to the mean variance along a mode in X and Y needs to surpass this number.
  • m (int >= 1) – number of cross-modality modes to be encoded
  • max_n_sigma_trials (int) – maximum number of attempts made to find a linear combination of dominant and low-variance subspace components for the weight vectors such that both enough variance is explained and the resulting joint covariance matrix \(\Sigma\) is positive definite
  • qx (int) – dimensionality of dominant subspace for X
  • qy (int) – dimensionality of dominant subspace for Y
  • rng (random number generator instance) – for reproducibility, all random numbers will be drawn from this generator
  • true_corrs (np.ndarray (m,)) – true correlation of between-set association modes
  • verbose (bool) – whether to print status messages
Returns:

  • Sigmaxy (np.ndarray (px, py)) – between-set covariance matrix
  • Sigmaxy_svals (np.ndarray (m,)) – singular values of Sigmaxy; these are the true canonical correlations (for CCA) or covariances (for PLS)
  • U_ (np.ndarray (px, m)) – between-set weight vectors for X
  • V_ (np.ndarray (py, m)) – between-set weight vectors for Y
  • latent_expl_var_ratios_x (np.ndarray (m,)) – explained variance ratios for between-set weight vectors in set X
  • latent_expl_var_ratios_y (np.ndarray (m,)) – explained variance ratios for between-set weight vectors in set Y
  • min_eval (float) – smallest eigenvalue of Schur complement of joint covariance matrix \(\Sigma\). \(\Sigma\) is positive definite if and only if min_eval > 0
  • true_corrs (np.ndarray (m,)) – true correlations of between-set association modes
  • latent_mode_vector_algo (str) – identifies the algorithm: is set to 'pc1_'