We study MCMC algorithms for Bayesian hierarchical models whose computational complexity scales linearly with the number of observations and of parameters in the model. We focus on crossed random effects and nested multilevel models, which are used ubiquitously in applied statistics, and consider methodologies built around Gibbs sampling, mean-field variational Bayes, sparse linear algebra and belief propagation. For certain combinations of algorithm and model we establish theoretical guarantees for scalability, and for others the lack thereof, leveraging connections to random graph theory and statistical asymptotics. We illustrate the computational methodology on real data analyses predicting electoral results and real estate prices, comparing with off-the-shelf variational approximations and Hamiltonian Monte Carlo.

UID:20210924T163000Z-237632@calendar.tamu.edu
DTSTAMP:20210409T131613Z
URL:https://calendar.tamu.edu/statistics/event/237632-departmental-colloquium-giacomo-zanella
LAST-MODIFIED:20210901T155609Z
RRULE:FREQ=WEEKLY;UNTIL=20211126T173000Z;INTERVAL=1;BYDAY=FR
EXDATE:20210910T163000Z,20211015T163000Z,20211022T163000Z,20211029T163000Z,20211105T163000Z,20211112T173000Z,20211119T173000Z,20211126T173000Z
ATTACH:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0,0,300,299/7222_Zanella_Giacomo.rev.1627418114.jpg
IMAGE;VALUE=URI;DISPLAY=BADGE,THUMBNAIL;FMTTYPE=image/jpeg:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0,0,300,299/7222_Zanella_Giacomo.rev.1627418114.jpg
IMAGE;VALUE=URI;DISPLAY=BADGE,THUMBNAIL;FMTTYPE=image/webp:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0,0,300,299/7222_Zanella_Giacomo.rev.1627418114.webp
X-LIVEWHALE-TYPE:events
X-LIVEWHALE-ID:237632
X-LIVEWHALE-TIMEZONE:America/Chicago
X-LIVEWHALE-IMAGE:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0\,0\,300\,299/7222_Zanella_Giacomo.rev.1627418114.jpg
X-LIVEWHALE-CONTACT-INFO:Judith Moreno\nAdministrative/Events Coordinator\n(979) 845-2171\njmoreno@stat.tamu.edu
X-LIVEWHALE-SUMMARY:

Learning probability measures based on an i.i.d. sample is a fundamental inference task, but is challenging when the sample space is high-dimensional. Inspired by the success of tree boosting in high-dimensional classification and regression, we propose a tree boosting method for learning high-dimensional probability distributions. We formulate concepts of "addition" and "residuals" on probability distributions in terms of compositions of a new notion of multivariate cumulative distribution functions (CDFs) that is more general than classical CDFs. This then gives rise to a simple boosting algorithm based on forward-stagewise (FS) fitting of an additive ensemble of measures, which sequentially minimizes the entropy loss. The output of the FS algorithm allows analytic computation of the probability density function for the fitted distribution. It also provides an exact simulator for drawing independent Monte Carlo samples from the fitted measure. Typical considerations in applying boosting (namely choosing the number of trees, setting the appropriate level of shrinkage/regularization in the weak learner, and evaluating variable importance) can all be handled analogously to traditional boosting in supervised learning. Numerical experiments confirm that boosting can substantially improve the fit to multivariate distributions compared to the state-of-the-art single-tree learner and is computationally efficient. This work is joint with my PhD student Naoki Awaya.

UID:20210924T163000Z-237632@calendar.tamu.edu
DTSTAMP:20210409T131613Z
URL:https://calendar.tamu.edu/statistics/event/237633-departmental-colloquium-li-ma
LAST-MODIFIED:20210901T155609Z
ATTACH:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0,0,233,300/6879_Ma_Li.rev.1617992709.jpg
IMAGE;VALUE=URI;DISPLAY=BADGE,THUMBNAIL;FMTTYPE=image/jpeg:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0,0,233,300/6879_Ma_Li.rev.1617992709.jpg
IMAGE;VALUE=URI;DISPLAY=BADGE,THUMBNAIL;FMTTYPE=image/webp:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0,0,233,300/6879_Ma_Li.rev.1617992709.webp
X-LIVEWHALE-TYPE:events
X-LIVEWHALE-ID:237633
X-LIVEWHALE-TIMEZONE:America/Chicago
X-LIVEWHALE-IMAGE:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0\,0\,233\,300/6879_Ma_Li.rev.1617992709.jpg
X-LIVEWHALE-IMAGE-CAPTION:Li Ma\, Faculty\, Statistical Science\, Duke University
X-LIVEWHALE-CONTACT-INFO:Judith Moreno\nAdministrative/Events Coordinator\n(979) 845-2171\njmoreno@stat.tamu.edu
X-LIVEWHALE-SUMMARY:

The goal of this talk is to describe probabilistic approaches to two major problems for dynamic networks, both of which are intricately connected to long range dependence in the evolution of such models:

1. Detecting the initial seed which resulted in the current state of the network: Imagine observing a static time slice of the network after some large time $n$, started from an initial seed. Suppose all one gets to see is the current topology of the network (without any label or age information). Developing provably efficient algorithms for estimating the initial seed has inspired intense activity over the last few years in the probability community. We will describe recent developments in addressing such questions, including robustness results such as the fixation of so-called hub vertices as time evolves.

2. Change point detection: Consider models of growing networks which evolve via new vertices attaching to the pre-existing network according to one attachment function $f$ till the system grows to size $\tau(n) < n$, when new vertices switch their behavior to a different function $g$ till the system reaches size $n$. The goal is to estimate the change point given the observation of the networks over time, with no knowledge of the functions driving the dynamics. We will describe non-parametric estimators for such problems.

UID:20210924T163000Z-237632@calendar.tamu.edu
DTSTAMP:20210409T131613Z
URL:https://calendar.tamu.edu/statistics/event/237634-departmental-colloquium-shankar-bhamidi
LAST-MODIFIED:20210901T155609Z
ATTACH:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0,0,973,1200/7247_Bhamidi_Shankar.rev.1628112920.jpg
IMAGE;VALUE=URI;DISPLAY=BADGE,THUMBNAIL;FMTTYPE=image/jpeg:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0,0,973,1200/7247_Bhamidi_Shankar.rev.1628112920.jpg
IMAGE;VALUE=URI;DISPLAY=BADGE,THUMBNAIL;FMTTYPE=image/webp:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0,0,973,1200/7247_Bhamidi_Shankar.rev.1628112920.webp
X-LIVEWHALE-TYPE:events
X-LIVEWHALE-ID:237634
X-LIVEWHALE-TIMEZONE:America/Chicago
X-LIVEWHALE-IMAGE:https://calendar.tamu.edu/live/image/gid/223/width/200/height/200/crop/1/src_region/0\,0\,973\,1200/7247_Bhamidi_Shankar.rev.1628112920.jpg
X-LIVEWHALE-CONTACT-INFO:Judith Moreno\nAdministrative/Events Coordinator\n(979) 845-2171\njmoreno@stat.tamu.edu
X-LIVEWHALE-SUMMARY: