Stat·Eco·ML Seminar

Statistics·Econometrics·Machine Learning Seminar at ENSAE Paris

If you wish to present in the seminar, please register here and contact the organizers.

Upcoming Talks

Evgenii Chzhen
(Orsay)
February 5, 2020
Algorithmic Fairness in Classification and Regression
The goal of this talk is to introduce the audience to the problem of algorithmic fairness. I will provide a general overview of the topic, describe the various available frameworks of fairness in classification and regression, and present the main approaches to tackling this problem. If time permits, I will present some very recent theoretical results in both classification and regression.
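As a concrete illustration of one standard fairness criterion (my own sketch, not material from the talk), the following snippet computes the demographic parity gap of a binary classifier, i.e. the difference in positive-prediction rates across two groups:

```python
# A minimal sketch: the demographic parity gap of a binary classifier.
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates across two groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_a = y_pred[group == 0].mean()  # P(Y_hat = 1 | A = 0)
    rate_b = y_pred[group == 1].mean()  # P(Y_hat = 1 | A = 1)
    return abs(rate_a - rate_b)

# Toy usage: predictions and a binary sensitive attribute.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(y_pred, group))  # 0.5 for this toy data
```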
Thomas Berrett
(CREST)
February 19, 2020
Local Differential Privacy
In recent years, it has become clear that certain studies must preserve the privacy of the individuals whose data are collected. The framework of differential privacy has prevailed as a natural way to formalise this problem. The privacy of the individuals is protected by randomising their original data before any statistical analysis is carried out, hiding the original data from the statistician. Indeed, under local differential privacy, each original data point is only ever seen by the individual it belongs to.
Research in the area focuses on constructing privatisation mechanisms that strike the optimal balance between protecting the privacy of the individuals in the study and allowing the best statistical performance. In many cases it is possible to find minimax rates of convergence under this constraint, and thus to quantify the statistical cost of privacy. In this talk I will provide an introduction to the field before presenting some new results.
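To fix ideas, here is a minimal sketch of randomized response, the canonical epsilon-locally-differentially-private mechanism for a single bit (an illustration of the general idea, not the talk's new results):

```python
# Randomized response: each individual privatises their own bit before
# releasing it, so the statistician never sees the original data.
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(bits, eps):
    """Report the true bit with probability p = e^eps / (1 + e^eps),
    and the flipped bit otherwise (this satisfies eps-LDP)."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    keep = rng.random(len(bits)) < p
    return np.where(keep, bits, 1 - bits)

def debiased_mean(reports, eps):
    """Unbiased estimate of the true proportion from privatised reports."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    return (reports.mean() - (1 - p)) / (2 * p - 1)

true_bits = (rng.random(100_000) < 0.3).astype(int)  # 30% hold the trait
reports = randomized_response(true_bits, eps=1.0)
print(debiased_mean(reports, eps=1.0))  # close to 0.3
```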
Victor-Emmanuel Brunel
(CREST)
March 4, 2020
Stein's method and Berry-Esseen bounds
TBA.
Yannick Guyonvarch
(CREST)
March 18, 2020
On the use of self-normalized sums to build non-asymptotic confidence sets
TBA.

Past Talks

Jules Depersin
(CREST)
January 22, 2020
Robust and Fast Estimation for Heavy-Tailed Distributions
When it comes to estimating the mean of a heavy-tailed distribution (or in the presence of outliers), the empirical mean does not give satisfying results. This issue has been dealt with using tools such as Median-of-Means (MOM) estimators. Such estimators are very simple to compute and give optimal rates of convergence when the dimension of the random variable is small, but fail to do so in high-dimensional set-ups. We will try to explain why, giving simple examples and intuitions, and we will introduce the tools needed to study the high-dimensional case.
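As a concrete illustration (a sketch of the standard construction, not code from the talk), the Median-of-Means estimator splits the sample into blocks, averages each block, and takes the median of the block means:

```python
# Median-of-Means (MOM) estimation for a heavy-tailed sample.
import numpy as np

rng = np.random.default_rng(0)

def median_of_means(x, n_blocks):
    """Split the sample into n_blocks random blocks, average each block,
    and return the median of the block means."""
    x = rng.permutation(x)
    blocks = np.array_split(x, n_blocks)
    return np.median([b.mean() for b in blocks])

# Heavy-tailed sample: Student's t with 2.5 degrees of freedom (true mean 0).
x = rng.standard_t(df=2.5, size=10_000)
print(np.mean(x), median_of_means(x, n_blocks=50))
```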
Julien Chhor
(CREST)
January 8, 2020
Minimax Testing in Random Graphs
In many recent statistical applications, the growing use of networks has made large random graphs a central object of interest. To name a few topics, we can mention community detection (in the stochastic block model or in social networks), network modelling, and modelling of the brain. On the other hand, the existing literature on hypothesis testing is profuse; yet, quite surprisingly, little of it concerns hypothesis testing in random graphs. In this talk, we fill this gap by studying two different testing problems in inhomogeneous Erdős-Rényi random graphs. After introducing general tools for minimax testing, we first study a two-sample testing problem in random graphs under sparsity constraints, and second the goodness-of-fit problem (also called the identity testing problem), for which we identify minimax-optimal adaptive tests.
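To fix ideas, the following snippet samples from the model studied in the talk, an inhomogeneous Erdős-Rényi graph in which each edge is drawn independently with its own probability (the two-block example is my own illustration, not the talk's tests):

```python
# Sampling an inhomogeneous Erdős-Rényi graph: A_ij ~ Bernoulli(P_ij),
# independently for each pair i < j.
import numpy as np

rng = np.random.default_rng(0)

def sample_inhomogeneous_er(P):
    """Symmetric adjacency matrix with independent Bernoulli(P_ij) edges."""
    n = P.shape[0]
    upper = rng.random((n, n)) < P
    A = np.triu(upper, k=1)        # keep the strict upper triangle
    return (A + A.T).astype(int)   # symmetrise

n = 200
P = np.full((n, n), 0.05)
P[:50, :50] = 0.20                 # a denser "community" block
A = sample_inhomogeneous_er(P)
print(A.sum() // 2, "edges")
```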
Théo Lacombe
(INRIA Saclay)
December 4, 2019
Théo Lacombe
An Introduction to Topological Data Analysis
Topological Data Analysis (TDA) is a recent approach in Data Sciences that aims to encode some structured objects---think of graphs, time series, points on a manifold for instance---with respect to the topological information they contain.
The first half of this introductory lecture will give a high-level picture of TDA.
We will then briefly introduce persistent homology, a notion from algebraic topology that is central in TDA for building topological signatures.
Finally, the last part of the talk will present some statistical and learning aspects of TDA.
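As a small taste (my own sketch, assuming the Python API of the GUDHI library, which may differ across versions), one can compute the persistence diagram of a noisy circle and recover its one-dimensional hole:

```python
# Persistent homology of a noisy circle with GUDHI: the circle's loop
# should appear as one long-lived 1-dimensional feature.
import numpy as np
import gudhi

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
points = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((100, 2))

rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
st = rips.create_simplex_tree(max_dimension=2)
diagram = st.persistence()         # list of (dimension, (birth, death)) pairs

# The most persistent 1-dimensional feature corresponds to the loop.
loops = [(b, d) for dim, (b, d) in diagram if dim == 1]
print(max(loops, key=lambda bd: bd[1] - bd[0]))
```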
François-Pierre Paty
(CREST)
November 20, 2019
An Introduction to Optimal Transport
Optimal transport (OT) dates back to the end of the 18th century, when French mathematician Gaspard Monge proposed to solve the problem of déblais and remblais (moving earth from excavations to embankments at minimal effort). Yet Monge's mathematical formulation soon met its limits, as the existence of the objects under study could not be proven. Only some 150 years later did OT enjoy a resurgence, when Kantorovich found the suitable framework in which Monge's problem could be solved, giving rise to fundamental tools and theories in probability, optimization, differential equations and geometry. While applications in economics have a long history, OT has only recently been applied to statistics and machine learning as a way to analyze data. In this mini-lecture, I will first define OT and present the most prominent results of OT theory. Then, I will give an overview of the current research in statistical and algorithmic OT, with an emphasis on machine learning and economics applications.
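As a concrete illustration (my own sketch, not the lecture's material), Kantorovich's discrete formulation between two point clouds is a plain linear program over transport plans with fixed marginals:

```python
# Discrete Kantorovich OT: minimise <gamma, C> subject to the marginal
# constraints gamma @ 1 = a and gamma.T @ 1 = b, gamma >= 0.
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
x, y = rng.standard_normal((5, 2)), rng.standard_normal((7, 2)) + 1.0
a, b = np.full(5, 1 / 5), np.full(7, 1 / 7)   # uniform weights
C = cdist(x, y, metric="sqeuclidean")          # cost matrix

n, m = C.shape
A_eq = np.zeros((n + m, n * m))                # gamma flattened row-major
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0           # sum_j gamma_ij = a_i
for j in range(m):
    A_eq[n + j, j::m] = 1.0                    # sum_i gamma_ij = b_j

res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.r_[a, b], bounds=(0, None))
plan = res.x.reshape(n, m)
print("OT cost:", res.fun)
```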
Badr-Eddine Chérief-Abdellatif
(CREST)
November 6, 2019
Theoretical Study of Variational Inference
Bayesian inference provides an attractive learning framework to analyze and to sequentially update knowledge on streaming data, but is rarely computationally feasible in practice. In recent years, variational inference (VI) has become more and more popular for approximating intractable posterior distributions in Bayesian statistics and machine learning. Nevertheless, despite promising results in real-life applications, little attention has been paid in the literature to the theoretical properties of VI. In this talk, we aim to present some recent advances in the theory of VI. First, we show that variational inference is consistent under mild conditions and retains the same properties as exact Bayesian inference in the batch setting. Then, we study several online VI algorithms, inspired by sequential optimization, that compute the variational approximations in an online fashion. We provide theoretical guarantees by deriving generalization bounds, and we present empirical evidence in support of this.
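To give a flavour of VI (a toy sketch, not the algorithms analysed in the talk), the following fits a Gaussian variational approximation to the posterior of a conjugate Gaussian model by stochastic gradient ascent on the ELBO, using the reparameterization trick:

```python
# Toy VI: data y_i ~ N(theta, 1) with prior theta ~ N(0, 1), variational
# family q = N(mu, sigma^2), fitted via theta = mu + sigma * eps.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=50)                 # observed data
n, ybar = len(y), y.mean()

mu, rho = 0.0, 0.0                                # rho = log sigma
lr = 0.01
for _ in range(5000):
    sigma = np.exp(rho)
    eps = rng.standard_normal(8)                  # Monte Carlo samples
    theta = mu + sigma * eps
    g = n * (ybar - theta) - theta                # d/dtheta log p(y, theta)
    mu += lr * g.mean()
    rho += lr * ((g * sigma * eps).mean() + 1.0)  # +1 from the entropy term

# The exact posterior is N(n*ybar/(n+1), 1/(n+1)); VI recovers it here.
print(mu, np.exp(rho) ** 2, n * ybar / (n + 1), 1 / (n + 1))
```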