gemmr.model_selection.max_min_detector

gemmr.model_selection.max_min_detector(X, Y, p_max, alpha=0.01)

Hypothesis-test-based method to jointly determine the number of PCA components and the number of between-set components.

Parameters:
  • X ((n_samples, n_X_features)) – data matrix for X

  • Y ((n_samples, n_Y_features)) – data matrix for Y

  • p_max (int < min(n_X_features, n_Y_features)) – maximum number of components to try for both X and Y

  • alpha (float, default 0.01) – significance level used for the hypothesis tests
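
The constraints stated in the parameter list can be checked before calling the function. The following is a minimal sketch, not part of gemmr: the helper name check_max_min_detector_args is hypothetical, and it only encodes what the parameter descriptions state (X and Y share n_samples, and p_max is an int strictly below the smaller feature count).

```python
def check_max_min_detector_args(x_shape, y_shape, p_max):
    """Hypothetical pre-flight check (not a gemmr function).

    x_shape and y_shape are (n_samples, n_features) tuples, e.g. X.shape
    and Y.shape. Raises ValueError if a documented constraint is violated.
    """
    n_samples_x, n_x_features = x_shape
    n_samples_y, n_y_features = y_shape

    # X and Y are both documented with n_samples rows.
    if n_samples_x != n_samples_y:
        raise ValueError("X and Y must have the same number of samples")

    # p_max is documented as an int < min(n_X_features, n_Y_features).
    if not isinstance(p_max, int):
        raise ValueError("p_max must be an int")
    limit = min(n_x_features, n_y_features)
    if p_max >= limit:
        raise ValueError(f"p_max must be smaller than {limit}, got {p_max}")

    return True


# 50 samples, 5 features in X, 4 features in Y: p_max may be at most 3.
check_max_min_detector_args((50, 5), (50, 4), 3)
```

Passing X.shape and Y.shape rather than the arrays themselves keeps the check independent of the array library in use.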

Returns:

  • pXs (list) – best number of PCA components for X; if several options are equally good, the list contains all of them

  • pYs (list) – best number of PCA components for Y; if several options are equally good, the list contains all of them. The i-th elements of pXs and pYs belong together.

  • d (int) – best number of between-set components

  • best_s (np.ndarray of shape (p_max, p_max)) – inferred number of between-set modes for each combination of numbers of within-set principal components (indexed along the axes of the matrix)
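
A call might look like the sketch below. It assumes that the four return values arrive as a tuple in the order listed above and that NumPy is installed. The random data, the sizes, and the helper pair_component_choices are illustrative and not part of gemmr. The call itself is skipped when gemmr cannot be imported.

```python
import numpy as np

# Illustrative random data; sizes are arbitrary assumptions.
rng = np.random.default_rng(0)
n_samples, n_X_features, n_Y_features = 50, 6, 5
X = rng.normal(size=(n_samples, n_X_features))
Y = rng.normal(size=(n_samples, n_Y_features))
p_max = 3  # must be < min(n_X_features, n_Y_features)


def pair_component_choices(pXs, pYs):
    """Hypothetical helper: pair the i-th entries of pXs and pYs."""
    if len(pXs) != len(pYs):
        raise ValueError("pXs and pYs must have the same length")
    return list(zip(pXs, pYs))


try:
    from gemmr.model_selection import max_min_detector
except ImportError:
    max_min_detector = None  # gemmr not installed; skip the call

if max_min_detector is not None:
    # Assumption: return values come back in the documented order.
    pXs, pYs, d, best_s = max_min_detector(X, Y, p_max, alpha=0.01)
    for pX, pY in pair_component_choices(pXs, pYs):
        print(f"PCA components: X={pX}, Y={pY}; between-set components: {d}")
    print("best_s shape:", best_s.shape)
```

Zipping pXs and pYs keeps each equally good pair together, which matches the note that the i-th elements belong together.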

References

Song Y et al., Canonical correlation analysis of high-dimensional data with very small sample support, Signal Processing (2016)