Statistics·Econometrics·Machine Learning Seminar at ENSAE Paris
If you wish to present in the seminar, please register here and contact the organizers.
Evgenii Chzhen (Orsay) February 5, 2020 |
Algorithmic Fairness in Classification and Regression The goal of this talk is to introduce the audience to the problem of algorithmic fairness. I will provide a general overview of the topic, describe the various available frameworks of fairness in classification and regression, and present the main approaches to tackling this problem. If time permits, I will present some very recent theoretical results in both classification and regression. |
Thomas Berrett (CREST) February 19, 2020 |
Local Differential Privacy In recent years, it has become clear that in certain studies there is a need to preserve the privacy of the individuals whose data is collected. As a way of formalising the problem, the framework of differential privacy has prevailed as a natural solution. The privacy of the individuals is protected by randomising their original data before any statistical analysis is carried out and hiding the original data from the statistician. In fact, in local differential privacy, each original data point is only ever seen by the individual it belongs to. Research in the area focuses on constructing mechanisms to privatise the data that strike the optimal balance between protecting the privacy of the individuals in the study and allowing the best statistical performance. In many cases it is possible to find minimax rates of convergence under this constraint and thus to quantify the statistical cost of privacy. In this talk I will provide an introduction to the field before presenting some new results. |
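As a small illustration of the randomisation idea described in this abstract, here is a sketch of randomized response, a classical locally private mechanism for a binary attribute. It is not taken from the talk: the function names `randomized_response` and `debiased_mean` and the parameter `epsilon` are assumptions chosen for this example. Each individual reports their true bit with probability e^ε / (1 + e^ε) and the flipped bit otherwise, so the statistician only ever sees the randomised reports.

```python
import numpy as np


def truth_probability(epsilon):
    """Probability of reporting the true bit under epsilon-local privacy."""
    return np.exp(epsilon) / (1.0 + np.exp(epsilon))


def randomized_response(bits, epsilon, rng):
    """Privatise each binary data point independently (illustrative sketch).

    Each bit is kept with probability e^eps / (1 + e^eps) and flipped
    otherwise; only the returned reports are shown to the statistician.
    """
    bits = np.asarray(bits, dtype=int)
    keep = rng.random(bits.shape) < truth_probability(epsilon)
    return np.where(keep, bits, 1 - bits)


def debiased_mean(reports, epsilon):
    """Unbiased estimate of the true proportion of ones from the reports.

    If p is the truth probability and mu the true proportion, then
    E[report] = (1 - p) + mu * (2p - 1), which is inverted here.
    """
    p = truth_probability(epsilon)
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)
```

The division by 2p − 1 in `debiased_mean` inflates the variance of the estimate as ε shrinks, which is one simple way to see the statistical cost of privacy mentioned above.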
Victor-Emmanuel Brunel (CREST) March 4, 2020 |
Stein's method and Berry-Esseen bounds TBA. |
Yannick Guyonvarch (CREST) March 18, 2020 |
On the use of self-normalized sums to build non-asymptotic confidence sets. TBA. |
Jules Depersin (CREST) January 22, 2020 |
Robust and Fast Estimation for Heavy-Tailed Distributions When it comes to estimating the mean of a heavy-tailed distribution (or in the presence of outliers), the empirical mean does not give satisfactory results. This issue has been dealt with using tools such as Median-of-Means (MOM) estimators. Such estimators are very simple to compute and give optimal rates of convergence when the dimension of the random variable is small, but fail to do so in high-dimensional settings. We will try to explain why, giving simple examples and intuitions, and we will introduce the tools needed to study high dimensions. |
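For readers unfamiliar with the estimator named in this abstract, the following is a minimal sketch of the one-dimensional Median-of-Means estimator. It is not the speaker's code: the function name `median_of_means`, the parameter `n_blocks` and the shuffling step are assumptions made for the example. The sample is split into blocks, the empirical mean of each block is computed, and the median of those block means is returned.

```python
import numpy as np


def median_of_means(x, n_blocks, seed=0):
    """One-dimensional Median-of-Means estimate of the mean (sketch).

    The data are shuffled, split into n_blocks nearly equal blocks, and
    the median of the block means is returned. A few outliers can spoil
    only a few blocks, so the median of the block means stays stable.
    """
    x = np.asarray(x, dtype=float).ravel()
    if not 1 <= n_blocks <= x.size:
        raise ValueError("n_blocks must be between 1 and the sample size")
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(x)
    blocks = np.array_split(shuffled, n_blocks)
    return float(np.median([block.mean() for block in blocks]))


# Usage: one gross outlier ruins the empirical mean but not the MOM estimate.
sample = np.array([1.0] * 99 + [1e6])
print(np.mean(sample))               # dominated by the outlier
print(median_of_means(sample, 10))   # close to the bulk of the data
```

With `n_blocks=1` the function reduces to the empirical mean, and with `n_blocks` equal to the sample size it reduces to the empirical median, so the number of blocks trades efficiency against robustness.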
Julien Chhor (CREST) January 8, 2020 |
Minimax Testing in Random Graphs In many recent statistical applications, the intensifying use of networks has made large random graphs a decisive field of interest. To name a few topics, we can mention community detection (in the stochastic block model or in social networks), network modelling, and the modelling of the brain. On the other hand, the existing literature on hypothesis testing is profuse. Yet, quite surprisingly, little literature exists on hypothesis testing in random graphs. In this talk, we fill the gap by studying two different testing problems in inhomogeneous Erdős-Rényi random graphs. After introducing general tools for minimax testing, we first study a two-sample testing problem in random graphs under sparsity constraints and then the goodness-of-fit problem (also called the identity testing problem), for which we identify minimax-optimal adaptive tests. |
Théo Lacombe (INRIA Saclay) December 4, 2019 |
An Introduction to Topological Data Analysis Topological Data Analysis (TDA) is a recent approach in data science that aims to encode structured objects (graphs, time series, or points on a manifold, for instance) through the topological information they contain. The first half of this introductory lecture will give a high-level picture of TDA. We will then briefly introduce persistent homology, a notion from algebraic topology that is central in TDA for building topological signatures. Finally, the last part of the talk will present some statistical and learning aspects of TDA. |
François-Pierre Paty (CREST) November 20, 2019 |
An Introduction to Optimal Transport Optimal transport (OT) dates back to the end of the 18th century, when the French mathematician Gaspard Monge proposed to solve the problem of déblais and remblais. Yet Monge’s mathematical formulation soon met its limits, because the existence of the objects under study could not be proved. Only 150 years later did OT enjoy a resurgence, when Kantorovich found the suitable framework that would make it possible to solve Monge’s problem and give rise to fundamental tools and theories in probability, optimization, differential equations and geometry. While applications in economics have a long history, OT has only recently been applied to statistics and machine learning as a way to analyze data. In this mini-lecture, I will first define OT and present the most prominent results of OT theory. Then, I will give an overview of current research in statistical and algorithmic OT, with an emphasis on machine learning and economics applications. |
Badr-Eddine Chérief-Abdellatif (CREST) November 6, 2019 |
Theoretical Study of Variational Inference Bayesian inference provides an attractive learning framework for analyzing and sequentially updating knowledge on streaming data, but it is rarely computationally feasible in practice. In recent years, variational inference (VI) has become increasingly popular for approximating intractable posterior distributions in Bayesian statistics and machine learning. Nevertheless, despite promising results in real-life applications, little attention has been paid in the literature to the theoretical properties of VI. In this talk, we aim to present some recent advances in the theory of VI. First, we show that variational inference is consistent under mild conditions and retains the same properties as exact Bayesian inference in the batch setting. Then, we study several online VI algorithms that are inspired by sequential optimization in order to compute the variational approximations in an online fashion. We provide theoretical guarantees by deriving generalization bounds, and we present empirical evidence in support of them. |