gemmr.sample_size.linear_model.cca_req_corr

gemmr.sample_size.linear_model.cca_req_corr(X, Y, ax, ay, n_req, criterion='combined', algorithm='linear_model', target_power=0.9, target_error=0.1, expl_var_ratio=0.3, data_home=None)

Determines the minimum true correlation required to achieve the target power and error levels with the available sample size.

Parameters:
  • X (np.ndarray (n_samples, n_X_features) or int >= 2) – either a data matrix or directly the number of features for data matrix \(X\)

  • Y (np.ndarray (n_samples, n_Y_features) or int >= 2) – either a data matrix or directly the number of features for data matrix \(Y\)

  • ax (float < 0 or None) – principal component spectrum decay constant, if X is not a data matrix, None otherwise

  • ay (float < 0 or None) – principal component spectrum decay constant, if Y is not a data matrix, None otherwise

  • n_req (sample_size) – available sample size

  • criterion (str) –

    criterion according to which sample sizes are estimated. Can be:

    • 'combined'

    • 'power'

    • 'association_strength'

    • 'weight'

    • 'score'

    • 'loading'

    • 'crossloading'

  • algorithm (str) –

    algorithm used to calculate sample sizes. Can be:

    • 'linear_model'

  • target_power (float between 0 and 1) – if criterion is 'combined' or 'power', sample size is chosen to obtain at least target_power power

  • target_error (float between 0 and 1) – if criterion is not 'power', sample size is chosen to obtain at most target_error error in the error metric(s)

  • expl_var_ratio (float) – if X or Y is a data matrix, ax or ay, respectively, will be estimated directly from the data using the number of principal components that explain this fraction of the variance

  • data_home (None or str) – path where outcome data are stored, None indicates default path
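
To make the roles of ax, ay and expl_var_ratio more concrete, here is a minimal sketch of one way a decay constant could be estimated from a principal component spectrum. It is not gemmr's implementation: it assumes the component variances follow a power law (variance of component i proportional to i**a) and fits a by least squares in log-log space over the leading components that explain expl_var_ratio of the total variance. The helper names power_law_spectrum and fit_decay_constant are hypothetical.

```python
import math


def power_law_spectrum(n_components, decay):
    """Hypothetical helper: variances proportional to i**decay, i = 1..n_components."""
    return [i ** decay for i in range(1, n_components + 1)]


def fit_decay_constant(variances, expl_var_ratio=0.3):
    """Illustrative estimate of a power-law decay constant (an assumption,
    not gemmr's actual procedure).

    `variances` are positive principal component variances in descending order.
    """
    if len(variances) < 2:
        raise ValueError("need at least two component variances")

    # Count the leading components needed to explain expl_var_ratio
    # of the total variance.
    total = sum(variances)
    cumulative = 0.0
    k = 0
    for v in variances:
        cumulative += v
        k += 1
        if cumulative >= expl_var_ratio * total:
            break
    k = max(k, 2)  # a slope needs at least two points

    # Least-squares slope of log(variance) against log(component index).
    xs = [math.log(i) for i in range(1, k + 1)]
    ys = [math.log(v) for v in variances[:k]]
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


spectrum = power_law_spectrum(50, -1.0)
print(fit_decay_constant(spectrum, expl_var_ratio=0.3))
```

For an exact power-law spectrum the fitted slope recovers the decay constant used to generate it; a negative value matches the documented constraint float < 0.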

Returns:

req_corr – minimum required true correlation

Return type:

float
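
A usage sketch, based only on the signature documented above, for the case where X and Y are given as feature counts rather than data matrices. The helper check_cca_req_corr_args and the wrapper required_correlation are hypothetical names introduced for illustration; the argument values (10 and 5 features, decay constants of -1.0, 500 samples) are arbitrary. The import is deferred so that the checks can be run without gemmr installed.

```python
def check_cca_req_corr_args(px, py, ax, ay, n_req,
                            target_power=0.9, target_error=0.1):
    """Hypothetical helper: check arguments against the constraints listed
    in the parameter documentation and return keyword arguments."""
    for name, p in (("X", px), ("Y", py)):
        if not (isinstance(p, int) and p >= 2):
            raise ValueError(f"{name} must be an int >= 2 when given as a feature count")
    for name, a in (("ax", ax), ("ay", ay)):
        if not a < 0:
            raise ValueError(f"{name} must be < 0 when no data matrix is given")
    if n_req <= 0:
        raise ValueError("n_req must be a positive sample size")
    for name, t in (("target_power", target_power), ("target_error", target_error)):
        if not 0 <= t <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    return dict(X=px, Y=py, ax=ax, ay=ay, n_req=n_req,
                criterion="combined", algorithm="linear_model",
                target_power=target_power, target_error=target_error)


def required_correlation(**kwargs):
    """Hypothetical wrapper: call cca_req_corr with checked arguments.

    Requires gemmr to be installed; data_home is left at its default.
    """
    from gemmr.sample_size.linear_model import cca_req_corr
    return cca_req_corr(**check_cca_req_corr_args(**kwargs))


kwargs = check_cca_req_corr_args(10, 5, -1.0, -1.0, 500)
print(kwargs)
```

With gemmr available, required_correlation(px=10, py=5, ax=-1.0, ay=-1.0, n_req=500) would return the req_corr float described under Returns.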