Variable selection and classification with large-scale presence only data
- 👤 Speaker: Garvesh Raskutti (University of Wisconsin-Madison)
- 📅 Date & Time: Friday 19 January 2018, 11:45 - 12:30
- 📍 Venue: Seminar Room 1, Newton Institute
Abstract
Co-author: Hyebin Song (University of Wisconsin-Madison)
In various real-world problems, we are presented with positive and unlabelled data, referred to as presence-only responses, where the number of covariates $p$ is large. The combination of presence-only responses and high dimensionality presents both statistical and computational challenges. In this paper, we develop the PUlasso algorithm for variable selection and classification with positive and unlabelled responses. Our algorithm uses the majorization-minimization (MM) framework, which is a generalization of the well-known expectation-maximization (EM) algorithm. In particular, to make our algorithm scalable, we provide two computational speed-ups to the standard EM algorithm. We provide a theoretical guarantee in two parts: we first show that our algorithm is guaranteed to converge to a stationary point, and then prove that any stationary point achieves the minimax optimal mean-squared error of $\frac{s \log p}{n}$, where $s$ is the sparsity of the true parameter. We also demonstrate through simulations that our algorithm outperforms state-of-the-art algorithms in moderate-$p$ settings in terms of classification performance. Finally, we demonstrate that our PUlasso algorithm performs well on a biochemistry example.
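As a rough illustration of the MM idea mentioned in the abstract, the sketch below fits an ordinary $\ell_1$-penalized logistic regression on fully labelled synthetic data. It is not the PUlasso algorithm and does not handle presence-only responses. It only shows the generic pattern of repeatedly minimizing a quadratic surrogate that upper-bounds the objective, which guarantees that the objective never increases. All function names, the synthetic data, and the penalty level `lam=0.05` are assumptions made for this example.

```python
import math
import random


def sigmoid(m):
    """Numerically stable logistic function."""
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    e = math.exp(m)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Minimizer over u of 0.5 * (u - z)**2 + t * |u|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def objective(theta, X, y, lam):
    """Average logistic loss plus lam * ||theta||_1, labels y in {0, 1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        m = sum(a * b for a, b in zip(xi, theta))
        # log(1 + exp(m)) written so that exp never overflows
        total += max(m, 0.0) + math.log1p(math.exp(-abs(m))) - yi * m
    return total / len(X) + lam * sum(abs(t) for t in theta)


def mm_step(theta, X, y, lam, L):
    """Minimize the quadratic surrogate built at the current theta."""
    n, p = len(X), len(theta)
    grad = [0.0] * p
    for xi, yi in zip(X, y):
        r = sigmoid(sum(a * b for a, b in zip(xi, theta))) - yi
        for j in range(p):
            grad[j] += r * xi[j] / n
    # The surrogate separates by coordinate, so soft-thresholding solves it exactly
    return [soft_threshold(theta[j] - grad[j] / L, lam / L) for j in range(p)]


def run_mm(X, y, lam, iters=100):
    """Run MM iterations from theta = 0 and record the objective at each step."""
    n, p = len(X), len(X[0])
    # The logistic Hessian is bounded by X^T X / (4n); the trace of that
    # matrix bounds its largest eigenvalue, so L gives a valid majorizer.
    L = sum(v * v for xi in X for v in xi) / (4.0 * n)
    theta = [0.0] * p
    history = [objective(theta, X, y, lam)]
    for _ in range(iters):
        theta = mm_step(theta, X, y, lam, L)
        history.append(objective(theta, X, y, lam))
    return theta, history


def make_data(n=200, seed=0):
    """Synthetic fully labelled data with a sparse true parameter (illustrative)."""
    rng = random.Random(seed)
    true_theta = [2.0, -1.5, 0.0, 0.0, 0.0]
    X = [[rng.gauss(0.0, 1.0) for _ in true_theta] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(sum(a * b for a, b in zip(xi, true_theta))) else 0
         for xi in X]
    return X, y


X, y = make_data()
theta_hat, history = run_mm(X, y, lam=0.05)
print("estimate:", [round(t, 3) for t in theta_hat])
print("objective: %.4f -> %.4f" % (history[0], history[-1]))
```

The recorded `history` should be non-increasing, which is the descent property that makes MM (and EM as a special case) attractive. The curvature bound used here is deliberately crude, so this sketch converges more slowly than a tuned solver would.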
Related Links
- https://arxiv.org/abs/1711.08129 – Link to arXiv paper
Series: This talk is part of the Isaac Newton Institute Seminar Series.