Stat·Eco·ML Seminar

Statistics·Econometrics·Machine Learning Seminar at ENSAE Paris

If you wish to present in the seminar, please register here and contact the organizers.

Covid-19: all the talks are cancelled.

Upcoming Talks

Yannick Guyonvarch
(CREST)
March 18th, 2020
Inference in finite samples: bridging the gap between econometrics and statistics
This talk will be concerned with the canonical problem of constructing a confidence set (CS) for a (scalar) mean. As we will see, the approaches taken in econometrics and statistics to solve this problem are not so easy to reconcile: econometricians mostly look for CSs whose level is exact asymptotically, while statisticians mainly look for procedures that are valid for any sample size.
We will then review some powerful results on the finite-sample properties of so-called self-normalized sums, which help build CSs that are both valid for any sample size and asymptotically exact in level.
If time permits, we will present some improvements when the data are symmetrically distributed, as well as some additional results in multidimensional setups.
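To illustrate the contrast between the two paradigms, here is a minimal Python sketch (illustrative only, not the construction from the talk): an interval based on the self-normalized Studentized statistic, whose level is exact only asymptotically, versus a Hoeffding interval that is valid for every sample size but requires bounded data and is conservative.

```python
import numpy as np
from scipy import stats

def asymptotic_ci(x, alpha=0.05):
    """Interval based on the self-normalized (Studentized) statistic
    sqrt(n)(xbar - mu)/sigma_hat: exact level only asymptotically (CLT)."""
    n = len(x)
    se = x.std(ddof=1) / np.sqrt(n)
    z = stats.norm.ppf(1 - alpha / 2)
    return x.mean() - z * se, x.mean() + z * se

def hoeffding_ci(x, alpha=0.05):
    """Interval valid for EVERY sample size when the data lie in [0, 1]
    (Hoeffding's inequality), at the price of being conservative."""
    n = len(x)
    half = np.sqrt(np.log(2 / alpha) / (2 * n))
    return x.mean() - half, x.mean() + half

rng = np.random.default_rng(0)
x = rng.beta(0.5, 2.0, size=50)   # bounded, skewed sample
print(asymptotic_ci(x))           # narrower, asymptotically exact level
print(hoeffding_ci(x))            # wider, finite-sample guarantee
```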
Geoffrey Chinot
(CREST)
April 1st, 2020
High-Dimensional Linear Models: Blessing or Curse?
TBA.
Xavier D'Haultfoeuille
(CREST)
April 15th, 2020
TBA.
TBA.
Anna Simoni
(CNRS)
April 29th, 2020
TBA.
TBA.
Ao Wang
(CREST)
May 13th, 2020
On the use of completeness conditions in econometrics
TBA.

Past Talks

Victor-Emmanuel Brunel
(CREST)
March 4th, 2020
Stein's method and Berry-Esseen bounds
I will present the fundamentals of Stein's method, which is based on fairly simple functional equations. The method makes it possible to prove central limit theorems, as well as finite-sample approximation bounds such as the well-known Berry-Esseen bounds for normal approximations. It is a simple yet elegant method which extends far beyond normal approximations for sums of independent variables: it also yields Berry-Esseen-type bounds for more general random variables (such as the number of triangles in an Erdős-Rényi graph), as well as finite-sample bounds for exponential, Poisson, or other asymptotic approximations.
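For reference, the two standard objects behind the talk's title, stated in their usual textbook form: the Stein equation characterising the standard normal, and the classical Berry-Esseen bound it can be used to prove.

```latex
% Stein equation: for a test function h and Z ~ N(0,1),
% the solution f_h satisfies
f_h'(x) - x\, f_h(x) = h(x) - \mathbb{E}\, h(Z).
% Classical Berry-Esseen bound: for i.i.d. X_1, ..., X_n with
% mean 0, variance sigma^2 and rho = E|X_1|^3 < infinity,
\sup_{x \in \mathbb{R}} \Big| \mathbb{P}\Big( \tfrac{S_n}{\sigma\sqrt{n}} \le x \Big) - \Phi(x) \Big|
\le \frac{C\,\rho}{\sigma^3 \sqrt{n}},
\qquad S_n = X_1 + \dots + X_n .
```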
Thomas Berrett
(CREST)
February 26th, 2020
Local Differential Privacy
In recent years, it has become clear that in certain studies there is a need to preserve the privacy of the individuals whose data is collected. As a way of formalising the problem, the framework of differential privacy has prevailed as a natural solution. The privacy of the individuals is protected by randomising their original data before any statistical analysis is carried out and hiding the original data from the statistician. In fact, in local differential privacy, each original data point is only ever seen by the individual it belongs to.
Research in the area focuses on constructing mechanisms to privatise the data that strike the optimal balance between protecting the privacy of the individuals in the study and allowing the best statistical performance. In many cases it is possible to find minimax rates of convergence under this constraint and thus to quantify the statistical cost of privacy. In this talk I will provide an introduction to the field before presenting some new results.
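As a concrete example of a locally private mechanism (a standard textbook construction, not necessarily the one studied in the talk), randomized response privatises a single bit while satisfying ε-local differential privacy; a minimal Python sketch:

```python
import numpy as np

def randomized_response(bit, eps, rng):
    """Report the true bit with probability e^eps / (1 + e^eps),
    otherwise flip it: eps-LDP for one binary value."""
    p_truth = np.exp(eps) / (1 + np.exp(eps))
    return bit if rng.random() < p_truth else 1 - bit

def debiased_mean(reports, eps):
    """Unbiased estimate of the true proportion from privatised bits:
    E[report] = (2p - 1) * theta + (1 - p), with p = e^eps/(1+e^eps)."""
    p = np.exp(eps) / (1 + np.exp(eps))
    return (np.mean(reports) - (1 - p)) / (2 * p - 1)

rng = np.random.default_rng(1)
true_bits = rng.binomial(1, 0.3, size=10_000)
reports = [randomized_response(b, eps=1.0, rng=rng) for b in true_bits]
print(debiased_mean(reports, eps=1.0))   # close to 0.3, but noisier than the raw mean
```

The statistician only ever sees the privatised reports, which is exactly the local model described above; the price of privacy shows up as extra variance in the debiased estimator.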
Evgenii Chzhen
(Orsay)
February 5th, 2020
Algorithmic Fairness in Classification and Regression
The goal of this talk is to introduce the audience to the problem of algorithmic fairness. I will provide a general overview on the topic, describe various available frameworks of fairness in classification and regression, and present main approaches to tackle this problem. If time permits, I will present some very recent theoretical results both in classification and regression.
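Two of the standard fairness criteria for classification, stated here in their usual textbook form (the talk may cover others): demographic parity and equalized odds, for a classifier \hat{g}, features X, sensitive attribute S and label Y.

```latex
% Demographic parity: the prediction is independent of S
\mathbb{P}\big( \hat{g}(X) = 1 \mid S = s \big)
 = \mathbb{P}\big( \hat{g}(X) = 1 \mid S = s' \big)
 \quad \forall s, s'.
% Equalized odds: independence of S conditionally on the label
\mathbb{P}\big( \hat{g}(X) = 1 \mid S = s,\, Y = y \big)
 = \mathbb{P}\big( \hat{g}(X) = 1 \mid S = s',\, Y = y \big)
 \quad \forall s, s', \ y \in \{0, 1\}.
```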
Jules Depersin
(CREST)
January 22nd, 2020
Robust and Fast Estimation for Heavy-Tailed Distributions
When it comes to estimating the mean of a heavy-tailed distribution (or in the presence of outliers), the empirical mean does not give satisfying results. This issue has been dealt with using tools such as median-of-means (MOM) estimators. Such estimators are very simple to compute and give optimal rates of convergence when the dimension of the random variable is small, but fail to do so in high-dimensional setups. We will try to explain why, giving simple examples and intuitions, and we will introduce the tools needed to study high dimensions.
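A minimal sketch of the univariate median-of-means estimator mentioned above (the block count is an illustrative choice; in theory it is driven by the desired confidence level):

```python
import numpy as np

def median_of_means(x, n_blocks):
    """Median-of-means: split the sample into blocks, average each
    block, and return the median of the block means."""
    blocks = np.array_split(np.asarray(x), n_blocks)
    return np.median([b.mean() for b in blocks])

rng = np.random.default_rng(2)
x = rng.standard_t(df=2.1, size=10_000)   # heavy-tailed, true mean 0
print(x.mean())                  # empirical mean: fragile to heavy tails
print(median_of_means(x, 20))    # MOM: much better deviation behaviour
```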
Julien Chhor
(CREST)
January 8th, 2020
Minimax Testing in Random Graphs
In many recent statistical applications, the intensifying use of networks has made large random graphs a central object of interest. To name a few topics, we can mention community detection (in the stochastic block model or in social networks), network modelling, and modelling of the brain. On the other hand, the existing literature on hypothesis testing is profuse; yet, quite surprisingly, little of it concerns hypothesis testing in random graphs. In this talk, we fill the gap by studying two different testing problems in inhomogeneous Erdős-Rényi random graphs. After introducing general tools for minimax testing, we first study a two-sample testing problem in random graphs under sparsity constraints and, second, the goodness-of-fit problem (also called the identity testing problem), for which we identify minimax-optimal adaptive tests.
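For context, the standard minimax testing framework the abstract refers to: a test ψ must distinguish a null class Θ₀ from alternatives separated from it by at least ρ, and the minimax separation rate is the smallest ρ at which the worst-case risk can be kept small.

```latex
% Worst-case risk of a test psi (type I plus worst type II error):
R_\rho(\psi) = \sup_{\theta \in \Theta_0} \mathbb{P}_\theta(\psi = 1)
 + \sup_{\theta :\, d(\theta, \Theta_0) \ge \rho} \mathbb{P}_\theta(\psi = 0).
% The minimax separation rate rho^* is the smallest rho for which
% \inf_\psi R_\rho(\psi) can be made smaller than a prescribed gamma.
```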
Théo Lacombe
(INRIA Saclay)
December 4th, 2019
An Introduction to Topological Data Analysis
Topological Data Analysis (TDA) is a recent approach in data science that aims to encode structured objects (think of graphs, time series, or points on a manifold, for instance) through the topological information they contain.
The first half of this introductory lecture will give a high-level picture of TDA.
We will then briefly introduce persistent homology, a notion coming from algebraic topology that is central in TDA for building topological signatures.
Finally, the last part of the talk will present some statistical and learning aspects of TDA.
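As a pointer, a minimal persistence computation with the gudhi library (the point cloud and all parameters are illustrative choices; check the library's documentation for the exact API):

```python
import numpy as np
import gudhi  # pip install gudhi

# Sample points from a noisy circle: one prominent one-dimensional
# hole should show up in the persistence diagram.
rng = np.random.default_rng(3)
theta = rng.uniform(0, 2 * np.pi, size=200)
points = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((200, 2))

# Build a Vietoris-Rips complex and compute persistent homology.
rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
simplex_tree = rips.create_simplex_tree(max_dimension=2)
diagram = simplex_tree.persistence()   # list of (dimension, (birth, death))
print([p for p in diagram if p[0] == 1][:3])   # a few 1-dimensional (loop) features
```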
François-Pierre Paty
(CREST)
November 20th, 2019
An Introduction to Optimal Transport
Optimal transport (OT) dates back to the end of the 18th century, when the French mathematician Gaspard Monge proposed to solve the problem of déblais et remblais (cuts and fills). Yet Monge's mathematical formulation rapidly met its limits, as the existence of the studied objects could not be proved. It was only 150 years later that OT enjoyed a resurgence, when Kantorovich found the suitable framework in which Monge's problem could be solved, giving rise to fundamental tools and theories in probability, optimization, differential equations and geometry. While applications in economics have a long history, OT has only recently been applied to statistics and machine learning as a way to analyze data. In this mini-lecture, I will first define OT and present the most prominent results of OT theory. Then, I will give an overview of current research in statistical and algorithmic OT, with an emphasis on machine learning and economics applications.
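For reference, the two classical formulations mentioned above, in their standard form: Monge's problem over transport maps, and Kantorovich's relaxation over couplings.

```latex
% Monge (1781): push mu onto nu with a map T at minimal cost c
\inf_{T :\, T_\# \mu = \nu} \int c\big(x, T(x)\big)\, d\mu(x).
% Kantorovich (1942): relax maps to couplings pi with marginals mu and nu
\inf_{\pi \in \Pi(\mu, \nu)} \int c(x, y)\, d\pi(x, y),
\qquad \Pi(\mu, \nu) = \big\{ \pi :\ \pi(\cdot \times \mathcal{Y}) = \mu,\
 \pi(\mathcal{X} \times \cdot) = \nu \big\}.
```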
Badr-Eddine Chérief-Abdellatif
(CREST)
November 6th, 2019
Theoretical Study of Variational Inference
Bayesian inference provides an attractive learning framework to analyze and sequentially update knowledge on streaming data, but is rarely computationally feasible in practice. In recent years, variational inference (VI) has become more and more popular for approximating intractable posterior distributions in Bayesian statistics and machine learning. Nevertheless, despite promising results in real-life applications, little attention has been paid in the literature to the theoretical properties of VI. In this talk, we present some recent advances in the theory of VI. First, we show that variational inference is consistent under mild conditions and retains the same properties as exact Bayesian inference in the batch setting. Then, we study several online VI algorithms, inspired by sequential optimization, that compute the variational approximations in an online fashion. We provide theoretical guarantees by deriving generalization bounds, and we present empirical evidence in support of these results.
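The basic object of variational inference, in its standard form: approximate the posterior by the closest member of a tractable family Q in Kullback-Leibler divergence, which is equivalent to maximising the evidence lower bound (ELBO).

```latex
% Variational approximation: best member of Q in KL divergence
\hat{q} = \operatorname*{arg\,min}_{q \in \mathcal{Q}}
 \mathrm{KL}\big( q \,\|\, \pi(\cdot \mid X_{1:n}) \big).
% Equivalent ELBO maximisation: L(q) lower-bounds the log evidence,
% with equality iff q equals the exact posterior
\mathcal{L}(q) = \mathbb{E}_{\theta \sim q}\big[ \log p(X_{1:n} \mid \theta) \big]
 - \mathrm{KL}\big( q \,\|\, p(\theta) \big)
 \le \log p(X_{1:n}).
```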