gemmr.model_selection.max_min_detector

gemmr.model_selection.max_min_detector(X, Y, p_max, alpha=0.01)

Hypothesis-test-based method to jointly determine the number of PCA components (for X and for Y) and the number of between-set components.

Parameters:
  • X (array, shape (n_samples, n_X_features)) – data matrix for X
  • Y (array, shape (n_samples, n_Y_features)) – data matrix for Y
  • p_max (int) – maximum number of components to try for both X and Y; must be smaller than min(n_X_features, n_Y_features)
  • alpha (float, default 0.01) – significance level of the hypothesis tests
Returns:
  • pXs (list) – best number(s) of PCA components for X; if several options are equally good, the list contains all of them
  • pYs (list) – best number(s) of PCA components for Y; if several options are equally good, the list contains all of them (the i-th elements of pXs and pYs belong together)
  • d (int) – best number of between-set components
  • best_s (np.ndarray, shape (p_max, p_max)) – inferred number of between-set modes for each combination of numbers of within-set principal components for X and Y (the two numbers of components run along the two axes of the matrix)
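Assuming, as the name "max-min" suggests, that d is the maximum entry of best_s and that pXs and pYs collect every cell attaining it, the final selection step can be sketched as follows. The function `select_from_best_s` and the index convention (row i corresponds to i + 1 X components, column j to j + 1 Y components) are hypothetical; gemmr's own indexing may differ.

```python
import numpy as np


def select_from_best_s(best_s):
    """Hypothetical helper: d is the largest inferred number of between-set
    modes; return every (pX, pY) pair that attains it."""
    best_s = np.asarray(best_s)
    d = int(best_s.max())
    # Assumed convention: row i <-> i + 1 X components, column j <-> j + 1 Y components
    rows, cols = np.nonzero(best_s == d)
    pXs = [int(i) + 1 for i in rows]
    pYs = [int(j) + 1 for j in cols]
    return pXs, pYs, d


best_s = np.array([
    [1, 1, 1],
    [1, 2, 1],
    [1, 2, 2],
])
pXs, pYs, d = select_from_best_s(best_s)
print(pXs, pYs, d)  # [2, 3, 3] [2, 2, 3] 2
```

Ties are kept rather than broken, which mirrors the documented behaviour of returning all best options as paired lists.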

References

Song Y et al., Canonical correlation analysis of high-dimensional data with very small sample support, Signal Processing (2016)