gemmr.generative_model._mk_Sigmaxy

gemmr.generative_model._mk_Sigmaxy(assemble_Sigmaxy, Sigmaxx, Sigmayy, U, V, m, max_n_sigma_trials, qx, qy, rng, true_corrs, expl_var_ratio_thr=0.5, verbose=True)

Generate the between-set covariance matrix \(\Sigma_{XY}\) (i.e. the upper right block of the joint covariance matrix).

Random directions are chosen for the X and Y latent mode vectors with the constraints that

the within-modality variance along these directions is at least expl_var_ratio_thr times the average variance along any dimension in this modality, and
the resulting joint cross-modality covariance matrix is positive definite.
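The two constraints can be illustrated with a minimal NumPy sketch. The helper names explained_variance_ratio and is_positive_definite are hypothetical and not part of gemmr, and taking the "average variance along any dimension" to be trace(Sigma) / n_features is an assumption made for this illustration.

```python
import numpy as np

def explained_variance_ratio(Sigma, w):
    """Variance along direction w, relative to the mean per-dimension variance.

    Assumption: the mean per-dimension variance is trace(Sigma) / n_features.
    """
    w = w / np.linalg.norm(w)
    var_along_w = w @ Sigma @ w
    mean_var = np.trace(Sigma) / Sigma.shape[0]
    return var_along_w / mean_var

def is_positive_definite(Sigma):
    """True if the symmetric matrix Sigma has only positive eigenvalues."""
    return bool(np.linalg.eigvalsh(Sigma).min() > 0)

# Toy within-set covariance matrices and a candidate between-set block
Sigmaxx = np.diag([3.0, 1.0])
Sigmayy = np.diag([2.0, 1.0])
Sigmaxy = np.array([[0.5, 0.0],
                    [0.0, 0.0]])

# Joint covariance matrix with Sigmaxy as the upper right block
Sigma = np.block([[Sigmaxx, Sigmaxy],
                  [Sigmaxy.T, Sigmayy]])

w = np.array([1.0, 0.0])
ratio = explained_variance_ratio(Sigmaxx, w)  # 3 / ((3 + 1) / 2) = 1.5

constraint_1_ok = ratio >= 0.5                # expl_var_ratio_thr = 0.5
constraint_2_ok = is_positive_definite(Sigma)
```

In this toy case both constraints hold. Replacing the 0.5 entry of Sigmaxy with 3.0 would make the joint matrix indefinite, so that candidate would be rejected.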

To increase the chance of a large within-modality variance along the randomly chosen latent mode vectors, each vector is calculated as a random linear combination of a random vector from the first qx modes (for modality X; qy for modality Y) and a random vector from the remaining modes.
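The construction described above might look roughly like the following sketch. The function random_latent_direction is hypothetical, and the choice of normal and uniform distributions for the random coefficients is an assumption; the source does not state which distributions gemmr uses.

```python
import numpy as np

def random_latent_direction(U, q, rng):
    """Random unit vector mixing the leading q modes with the remaining modes.

    U   : (n_features, n_features) array whose columns are basis vectors
    q   : number of leading modes, assumed to satisfy 0 < q < n_features
    rng : numpy.random.Generator
    """
    n = U.shape[1]
    if not 0 < q < n:
        raise ValueError("q must satisfy 0 < q < n_features")

    # Random unit vector in the span of the first q columns of U
    a = U[:, :q] @ rng.normal(size=q)
    a /= np.linalg.norm(a)

    # Random unit vector in the span of the remaining columns of U
    b = U[:, q:] @ rng.normal(size=n - q)
    b /= np.linalg.norm(b)

    # Random mixing weights (assumed uniform here), then renormalize
    c = rng.uniform(size=2)
    w = c[0] * a + c[1] * b
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
# Orthonormal toy basis obtained from a QR decomposition
U, _ = np.linalg.qr(rng.normal(size=(5, 5)))
w = random_latent_direction(U, q=2, rng=rng)
```

Because part of the vector always lies in the span of the leading modes, which typically carry most of the variance, such a candidate is more likely to pass the variance threshold than a vector drawn uniformly from the whole space.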

If this is not successful, i.e. if no between-set weight vectors are found that explain enough variance and result in a positive definite \(\Sigma_{XY}\), an optimization procedure (using a differential evolution algorithm) is used to maximize the minimum eigenvalue of \(\Sigma_{XY}\). If that does not succeed either, a ValueError is raised.
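As a rough illustration of the fallback, the sketch below uses scipy.optimize.differential_evolution to search for directions that maximize the minimum eigenvalue of a joint covariance matrix. Everything about the setup is an assumption made for this toy example: the two-dimensional modalities, the parametrization of directions by angles, the single mode, and the way joint_cov assembles the between-set block. It is not gemmr's actual procedure.

```python
import numpy as np
from scipy.optimize import differential_evolution

Sigmaxx = np.diag([3.0, 1.0])
Sigmayy = np.diag([2.0, 1.0])
true_corr = 0.9

def unit(theta):
    """Unit vector in the plane at angle theta."""
    return np.array([np.cos(theta), np.sin(theta)])

def joint_cov(params):
    """Joint covariance matrix for directions given by two angles.

    Hypothetical assembly: the covariance between the two latent directions
    is the target correlation times the product of their standard deviations.
    """
    u, v = unit(params[0]), unit(params[1])
    cov_uv = true_corr * np.sqrt((u @ Sigmaxx @ u) * (v @ Sigmayy @ v))
    Sigmaxy = cov_uv * np.outer(u, v)
    return np.block([[Sigmaxx, Sigmaxy],
                     [Sigmaxy.T, Sigmayy]])

def neg_min_eig(params):
    """Objective: minimizing this maximizes the minimum eigenvalue."""
    return -np.linalg.eigvalsh(joint_cov(params)).min()

result = differential_evolution(
    neg_min_eig,
    bounds=[(0.0, np.pi), (0.0, np.pi)],
    seed=0,
)
min_eig = -result.fun

if min_eig <= 0:
    # Mirrors the documented behaviour of giving up with a ValueError
    raise ValueError("no positive definite joint covariance matrix found")
```

A positive min_eig means the optimizer found directions for which the joint matrix is positive definite. In this toy setup some angle pairs give an indefinite matrix, which is why a search is needed at all.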

Parameters:

Sigmaxx (np.ndarray (n_X_features, n_X_features)) – covariance matrix for modality X, i.e. the upper left block of the joint covariance matrix
Sigmayy (np.ndarray (n_Y_features, n_Y_features)) – covariance matrix for modality Y, i.e. the lower right block of the joint covariance matrix
U (np.ndarray (n_X_features, n_X_features)) – columns of U contain basis vectors for X data
V (np.ndarray (n_Y_features, n_Y_features)) – columns of V contain basis vectors for Y data
m (int >= 1) – number of cross-modality modes to be encoded
max_n_sigma_trials (int) – number of times an attempt is made to find latent mode vectors satisfying constraints
qx (int) – latent mode vectors for modality X are calculated as a random linear combination of (a) a random linear combination of the first qx columns of U and (b) a random linear combination of the remaining columns of U
qy (int) – latent mode vectors for modality Y are calculated as a random linear combination of (a) a random linear combination of the first qy columns of V and (b) a random linear combination of the remaining columns of V
rng (random number generator instance) –
true_corrs (np.ndarray (m,)) – cross-modality correlations that each latent mode should have
expl_var_ratio_thr (float) – threshold for required within-modality variance along latent mode vectors
verbose (bool) – whether to print status messages

Returns:

Sigmaxy (np.ndarray (n_X_features, n_Y_features)) – cross-modality covariance matrix
Sigmaxy_svals (np.ndarray (m,)) – singular values of Sigmaxy; these are the true canonical correlations or covariances (for CCA or PLS, respectively)
true_corrs (np.ndarray (m,)) – the cross-modality covariances are calculated as the true correlations (given by the input argument true_corrs) times the variances along these directions. Should the resulting cross-modality covariances not be in descending order, they are reordered, and the returned true_corrs is reordered accordingly
latent_expl_var_ratios_x (np.ndarray (m,)) – explained variance ratio in X modality along the latent directions
latent_expl_var_ratios_y (np.ndarray (m,)) – explained variance ratio in Y modality along the latent directions
U_ (np.ndarray (n_X_features, m)) – latent mode vectors for X
V_ (np.ndarray (n_Y_features, m)) – latent mode vectors for Y
cosine_sim_pc1_latentMode_x (np.ndarray (m,)) – cosine similarities between latent mode vectors and PC1 for X
cosine_sim_pc1_latentMode_y (np.ndarray (m,)) – cosine similarities between latent mode vectors and PC1 for Y
latent_mode_vector_algo (str) – ‘qr__’ or ‘opti’, algorithm with which the latent mode vectors were found
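
To make the cosine-similarity outputs concrete, here is a minimal sketch of how such values could be computed from a within-set covariance matrix and a matrix of latent mode vectors. The helper cosine_sim_with_pc1 is hypothetical. Because the sign of a principal axis is arbitrary, this sketch reports absolute values; whether gemmr does the same is not stated in the source.

```python
import numpy as np

def cosine_sim_with_pc1(Sigma, W):
    """Absolute cosine similarity between each column of W and PC1 of Sigma.

    Sigma : (n_features, n_features) covariance matrix
    W     : (n_features, m) matrix whose columns are latent mode vectors
    """
    # eigh returns eigenvalues in ascending order, so PC1 is the last column
    _, evecs = np.linalg.eigh(Sigma)
    pc1 = evecs[:, -1]
    W_unit = W / np.linalg.norm(W, axis=0)
    return np.abs(pc1 @ W_unit)

Sigmaxx = np.diag([3.0, 1.0])
U_ = np.array([[1.0, 1.0],
               [0.0, 1.0]])   # two latent mode vectors as columns
sims = cosine_sim_with_pc1(Sigmaxx, U_)
```

Here the first latent mode vector coincides with PC1 and the second lies at 45 degrees to it, so the similarities are 1 and 1/sqrt(2).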

Raises:

ValueError – if no between-set weight vectors could be found that explain enough variance and result in a positive definite \(\Sigma_{XY}\)