Largest eigenvalues in multivariate statistical analysis


Multivariate statistical analysis aims to discover and test for structure in sample data in which each unit is a possibly high-dimensional vector with correlated components. In the classical techniques, the eigenvalues of (Wishart-distributed) covariance matrices play a central role: we introduce these techniques and illustrate them with some examples from genetics and finance. The distribution theory for the eigenvalues is complicated, but in recent years a new impetus toward simpler approximate results has come from random matrix theory, in which the number of variables is imagined to be large. We focus on the largest eigenvalue in particular, and review approximations to its null hypothesis distribution based on the celebrated Tracy-Widom laws, aiming to show that they are accurate enough for routine applied use even in quite low dimensions. We also briefly discuss behavior under some non-null alternatives, along with a few further remarks about applications.