gemmr.sample_size.linear_model.prep_data_for_lm

gemmr.sample_size.linear_model.prep_data_for_lm(ds, n_reqs, include_latent_explained_vars, include_pc_var_decay_constants, include_pdiff, prefix='')

Prepare outcome data for use with linear model.

Constructs a predictor data matrix whose columns are the linear model predictors and whose rows are the individual synthetic datasets, obtained by stacking the dimensions ‘px’, ‘r’ and ‘Sigma_id’.
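The stacking step can be pictured with a small NumPy sketch. This is only an illustration of the idea, not gemmr's implementation: the grid sizes and the `outcome` array are hypothetical, and plain arrays stand in for the labeled xarray objects the function actually receives.

```python
import numpy as np

# Hypothetical grid sizes for the three stacked dimensions (not taken from gemmr).
n_px, n_r, n_sigma = 2, 3, 4

# One outcome value per (px, r, Sigma_id) combination.
outcome = np.arange(n_px * n_r * n_sigma, dtype=float).reshape(n_px, n_r, n_sigma)

# Stack the three dimensions into a single "synthetic dataset" axis:
# each entry of `stacked` corresponds to one row of the predictor matrix.
stacked = outcome.reshape(-1)

# Recover which (px, r, Sigma_id) index each stacked row came from.
px_idx, r_idx, sigma_idx = np.unravel_index(np.arange(stacked.size), outcome.shape)
```

With labeled data, `xarray`'s `stack` method performs the analogous operation while keeping the coordinate labels attached to each row.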

Parameters:
  • ds (xr.Dataset) – outcome dataset

  • n_reqs (xr.DataArray) – required sample sizes

  • include_latent_explained_vars (bool) – whether to include a predictor for the latent explained variance in the linear model

  • include_pc_var_decay_constants (bool) – whether to include a predictor for the principal component spectrum decay constant in the linear model

  • include_pdiff (bool) – whether to include a predictor for \(|p_X - p_Y|\) in the linear model

  • prefix (str) – prefix for outcome variables in ds

Returns:

  • X (array of shape (n_synth_datasets, n_predictors)) – predictor data matrix

  • y (array of shape (n_synth_datasets,)) – dependent variable

  • coef_names (list) – labels for included linear model coefficients (first one is “const”)
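
To illustrate how the three return values fit together, the following sketch builds a design matrix with a leading “const” column and optional predictors, then fits it by ordinary least squares. It is a minimal example under stated assumptions: `build_design_matrix`, the predictor names `"r"` and `"pdiff"`, and the synthetic data are all hypothetical and are not part of gemmr's API.

```python
import numpy as np


def build_design_matrix(predictors, include):
    """Hypothetical helper (not part of gemmr).

    predictors: dict mapping predictor name -> 1-D array of values
    include:    dict mapping predictor name -> bool (missing names default to True)
    """
    n = len(next(iter(predictors.values())))
    columns = [np.ones(n)]   # intercept column comes first
    coef_names = ["const"]   # and is labeled "const"
    for name, values in predictors.items():
        if include.get(name, True):
            columns.append(np.asarray(values, dtype=float))
            coef_names.append(name)
    return np.column_stack(columns), coef_names


rng = np.random.default_rng(0)
n = 50
predictors = {
    "r": rng.uniform(0.1, 0.9, n),
    "pdiff": rng.integers(0, 10, n).astype(float),
}

# Switch the "pdiff" predictor off, analogous to passing include_pdiff=False.
X, coef_names = build_design_matrix(predictors, {"pdiff": False})

# Synthetic dependent variable with known coefficients (intercept 2, slope -3).
y = 2.0 - 3.0 * predictors["r"]

# Ordinary least squares; coef[i] is the coefficient labeled coef_names[i].
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the intercept column is always first, the entries of `coef` line up one-to-one with `coef_names`, which is the role the returned `coef_names` list plays for `X`.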