Department of Computer Science

Data Science Ensemble: Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA) with Robert Frost

November 4, 2021
4:00 PM to 5:00 PM
Online (https://maine.zoom.us/my/usm.datascience)
Free

Robert Frost, PhD, will give a talk in the USM Data Science Ensemble, a seminar series focused on the intersection of data science and real-world applications. Join us for this in-depth look at a practical application of data science by registering for the talk via this form.

In his talk, "Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA)," Dr. Frost will present a novel technique for sparse principal component analysis (PCA). The method, EESPCA, is based on the recently rediscovered formula for computing the normed, squared eigenvector loadings of a Hermitian matrix from the eigenvalues of the full matrix and its associated sub-matrices. Relative to the state-of-the-art sparse PCA methods of Witten et al., Yuan & Zhang, and Tan et al., EESPCA offers a two-orders-of-magnitude improvement in computational speed, does not require estimation of tuning parameters via cross-validation, and can more accurately identify true zero principal component loadings across a range of data matrix sizes and covariance structures. Importantly, EESPCA achieves these performance benefits while maintaining a reconstruction error close to that of the other approaches. EESPCA is a practical and effective technique for sparse PCA, with particular relevance to computationally demanding statistical problems such as the analysis of high-dimensional data sets or the application of statistical techniques like resampling that involve the repeated calculation of sparse principal components (PCs).
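For readers unfamiliar with the eigenvector-from-eigenvalues identity mentioned above, the following is a minimal Python sketch of the identity alone. It is not Dr. Frost's EESPCA method. For a symmetric matrix A with distinct eigenvalues, the squared j-th entry of the i-th unit eigenvector equals the product of (lambda_i(A) - lambda_k(M_j)) over all eigenvalues of the sub-matrix M_j (A with row and column j removed), divided by the product of (lambda_i(A) - lambda_k(A)) over k != i. The function name, the random test data, and the choice of NumPy are assumptions made for illustration.

```python
import numpy as np


def squared_loadings_from_eigenvalues(A, i):
    """Squared entries of the i-th unit eigenvector of a symmetric matrix A,
    computed from eigenvalues only (hypothetical helper, illustration only).
    Assumes the eigenvalues of A are distinct."""
    n = A.shape[0]
    lam = np.linalg.eigvalsh(A)  # eigenvalues of A in ascending order
    # Denominator: product over k != i of (lam_i - lam_k).
    denom = np.prod(np.delete(lam[i] - lam, i))
    out = np.empty(n)
    for j in range(n):
        # Sub-matrix M_j: A with row j and column j removed.
        minor = np.delete(np.delete(A, j, axis=0), j, axis=1)
        mu = np.linalg.eigvalsh(minor)
        # Numerator: product over all eigenvalues of M_j of (lam_i - mu_k).
        out[j] = np.prod(lam[i] - mu) / denom
    return out


# Illustrative data: sample covariance of random 50 x 5 data (an assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
S = np.cov(X, rowvar=False)

# Index 4 is the largest eigenvalue, i.e. the first principal component.
approx = squared_loadings_from_eigenvalues(S, i=4)

# Compare with the squared loadings from a direct eigendecomposition.
_, vecs = np.linalg.eigh(S)
direct = vecs[:, 4] ** 2
max_abs_diff = float(np.max(np.abs(approx - direct)))
```

In this sketch the two sets of squared loadings should agree up to floating-point error. Note that the identity yields only squared magnitudes, so the signs of the loadings are not recovered by this calculation.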


Dr. Frost is an Assistant Professor of Biomedical Data Science at the Geisel School of Medicine at Dartmouth. His research focuses on the development of bioinformatics and biostatistics methods for analyzing high-dimensional genomic data, with a specific emphasis on methods for analyzing the output of genome-wide single-cell assays, e.g., single-cell RNA-sequencing (scRNA-seq). Other bioinformatics research interests include cancer genomics, cancer immunology, tissue-specific gene function, and detection of gene-environment and gene-gene interactions. Statistical research interests include hypothesis aggregation and weighting, penalized regression, principal component analysis, and random matrix theory.

Zoom Link: https://maine.zoom.us/my/usm.datascience

Contact Information

Sharon Watterson