Stat·Eco·ML Seminar

Statistics·Econometrics·Machine Learning Seminar at ENSAE Paris

If you wish to present in the seminar, please register here and contact the organizers.

Upcoming Talks

Evgenii Chzhen
(Orsay)
February 5, 2020
Algorithmic Fairness in Classification and Regression
The goal of this talk is to introduce the audience to the problem of algorithmic fairness. I will provide a general overview of the topic, describe the various available frameworks of fairness in classification and regression, and present the main approaches to tackling this problem. If time permits, I will present some very recent theoretical results in both classification and regression.
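As a concrete illustration of one standard fairness criterion (my own sketch, not material from the talk), the following snippet computes the demographic parity gap of a binary classifier, i.e. the difference in positive-prediction rates across two groups:

```python
# A minimal sketch: the demographic parity gap of a binary classifier.
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates across two groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_a = y_pred[group == 0].mean()  # P(Y_hat = 1 | A = 0)
    rate_b = y_pred[group == 1].mean()  # P(Y_hat = 1 | A = 1)
    return abs(rate_a - rate_b)

# Toy usage: predictions and a binary sensitive attribute.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(y_pred, group))  # 0.5 for this toy data
```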
Thomas Berrett
(CREST)
February 19, 2020
Local Differential Privacy
In recent years, it has become clear that certain studies must preserve the privacy of the individuals whose data are collected. The framework of differential privacy has prevailed as a natural way to formalise this problem. The privacy of the individuals is protected by randomising their original data before any statistical analysis is carried out, hiding the original data from the statistician. Indeed, under local differential privacy, each original data point is only ever seen by the individual it belongs to.
Research in the area focuses on constructing privatisation mechanisms that strike the optimal balance between protecting the privacy of the individuals in the study and allowing the best statistical performance. In many cases it is possible to find minimax rates of convergence under this constraint, and thus to quantify the statistical cost of privacy. In this talk I will provide an introduction to the field before presenting some new results.
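To fix ideas, here is a minimal sketch of randomized response, the canonical epsilon-locally-differentially-private mechanism for a single bit (an illustration of the general idea, not the talk's new results):

```python
# Randomized response: each individual privatises their own bit before
# releasing it, so the statistician never sees the original data.
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(bits, eps):
    """Report the true bit with probability p = e^eps / (1 + e^eps),
    and the flipped bit otherwise (this satisfies eps-LDP)."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    keep = rng.random(len(bits)) < p
    return np.where(keep, bits, 1 - bits)

def debiased_mean(reports, eps):
    """Unbiased estimate of the true proportion from privatised reports."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    return (reports.mean() - (1 - p)) / (2 * p - 1)

true_bits = (rng.random(100_000) < 0.3).astype(int)  # 30% hold the trait
reports = randomized_response(true_bits, eps=1.0)
print(debiased_mean(reports, eps=1.0))  # close to 0.3
```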
Victor-Emmanuel Brunel
(CREST)
March 4, 2020
Stein's method and Berry-Esseen bounds
TBA.
Yannick Guyonvarch
(CREST)
March 18, 2020
On the use of self-normalized sums to build non-asymptotic confidence sets
TBA.

Past Talks

Jules Depersin
(CREST)
January 22, 2020
Robust and Fast Estimation for Heavy-Tailed Distributions
When it comes to estimating the mean of a heavy-tailed distribution (or in the presence of outliers), the empirical mean does not give satisfying results. This issue has been dealt with using tools such as Median-of-Means (MOM) estimators. Such estimators are very simple to compute and give optimal rates of convergence when the dimension of the random variable is small, but fail to do so in high-dimensional set-ups. We will try to explain why, giving simple examples and intuitions, and we will introduce the tools needed to study the high-dimensional case.
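As a concrete illustration (a sketch of the standard construction, not code from the talk), the Median-of-Means estimator splits the sample into blocks, averages each block, and takes the median of the block means:

```python
# Median-of-Means (MOM) estimation for a heavy-tailed sample.
import numpy as np

rng = np.random.default_rng(0)

def median_of_means(x, n_blocks):
    """Split the sample into n_blocks random blocks, average each block,
    and return the median of the block means."""
    x = rng.permutation(x)
    blocks = np.array_split(x, n_blocks)
    return np.median([b.mean() for b in blocks])

# Heavy-tailed sample: Student's t with 2.5 degrees of freedom (true mean 0).
x = rng.standard_t(df=2.5, size=10_000)
print(np.mean(x), median_of_means(x, n_blocks=50))
```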
Julien Chhor
(CREST)
January 8, 2020
Minimax Testing in Random Graphs
In many recent statistical applications, the growing use of networks has made large random graphs a central object of interest. To name a few topics, we can mention community detection (in the stochastic block model or in social networks), network modelling, and modelling of the brain. On the other hand, the existing literature on hypothesis testing is profuse; yet, quite surprisingly, little of it concerns hypothesis testing in random graphs. In this talk, we fill this gap by studying two different testing problems in inhomogeneous Erdős-Rényi random graphs. After introducing general tools for minimax testing, we first study a two-sample testing problem in random graphs under sparsity constraints, and second the goodness-of-fit problem (also called the identity testing problem), for which we identify minimax-optimal adaptive tests.
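To fix ideas, the following snippet samples from the model studied in the talk, an inhomogeneous Erdős-Rényi graph in which each edge is drawn independently with its own probability (the two-block example is my own illustration, not the talk's tests):

```python
# Sampling an inhomogeneous Erdős-Rényi graph: A_ij ~ Bernoulli(P_ij),
# independently for each pair i < j.
import numpy as np

rng = np.random.default_rng(0)

def sample_inhomogeneous_er(P):
    """Symmetric adjacency matrix with independent Bernoulli(P_ij) edges."""
    n = P.shape[0]
    upper = rng.random((n, n)) < P
    A = np.triu(upper, k=1)        # keep the strict upper triangle
    return (A + A.T).astype(int)   # symmetrise

n = 200
P = np.full((n, n), 0.05)
P[:50, :50] = 0.20                 # a denser "community" block
A = sample_inhomogeneous_er(P)
print(A.sum() // 2, "edges")
```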
Théo Lacombe
(INRIA Saclay)
December 4, 2019
Théo Lacombe
An Introduction to Topological Data Analysis
Topological Data Analysis (TDA) is a recent approach in Data Sciences that aims to encode some structured objects---think of graphs, time series, points on a manifold for instance---with respect to the topological information they contain.
The first half of this introductory lecture will give a high-level picture of TDA.
We will then briefly introduce persistent homology, a notion from algebraic topology that is central in TDA for building topological signatures.
Finally, the last part of the talk will present some statistical and learning aspects of TDA.
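As a small taste (my own sketch, assuming the Python API of the GUDHI library, which may differ across versions), one can compute the persistence diagram of a noisy circle and recover its one-dimensional hole:

```python
# Persistent homology of a noisy circle with GUDHI: the circle's loop
# should appear as one long-lived 1-dimensional feature.
import numpy as np
import gudhi

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
points = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((100, 2))

rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
st = rips.create_simplex_tree(max_dimension=2)
diagram = st.persistence()         # list of (dimension, (birth, death)) pairs

# The most persistent 1-dimensional feature corresponds to the loop.
loops = [(b, d) for dim, (b, d) in diagram if dim == 1]
print(max(loops, key=lambda bd: bd[1] - bd[0]))
```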
François-Pierre Paty
(CREST)
November 20, 2019
An Introduction to Optimal Transport
Optimal transport (OT) dates back to the end of the 18th century, when French mathematician Gaspard Monge proposed to solve the problem of déblais and remblais (moving earth from excavations to embankments at minimal effort). Yet Monge's mathematical formulation soon met its limits, as the existence of the objects under study could not be proven. Only some 150 years later did OT enjoy a resurgence, when Kantorovich found the suitable framework in which Monge's problem could be solved, giving rise to fundamental tools and theories in probability, optimization, differential equations and geometry. While applications in economics have a long history, OT has only recently been applied to statistics and machine learning as a way to analyze data. In this mini-lecture, I will first define OT and present the most prominent results of OT theory. Then, I will give an overview of the current research in statistical and algorithmic OT, with an emphasis on machine learning and economics applications.
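As a concrete illustration (my own sketch, not the lecture's material), Kantorovich's discrete formulation between two point clouds is a plain linear program over transport plans with fixed marginals:

```python
# Discrete Kantorovich OT: minimise <gamma, C> subject to the marginal
# constraints gamma @ 1 = a and gamma.T @ 1 = b, gamma >= 0.
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
x, y = rng.standard_normal((5, 2)), rng.standard_normal((7, 2)) + 1.0
a, b = np.full(5, 1 / 5), np.full(7, 1 / 7)   # uniform weights
C = cdist(x, y, metric="sqeuclidean")          # cost matrix

n, m = C.shape
A_eq = np.zeros((n + m, n * m))                # gamma flattened row-major
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0           # sum_j gamma_ij = a_i
for j in range(m):
    A_eq[n + j, j::m] = 1.0                    # sum_i gamma_ij = b_j

res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.r_[a, b], bounds=(0, None))
plan = res.x.reshape(n, m)
print("OT cost:", res.fun)
```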
Badr-Eddine Chérief-Abdellatif
(CREST)
November 6, 2019
Theoretical Study of Variational Inference
Bayesian inference provides an attractive learning framework to analyze and to sequentially update knowledge on streaming data, but is rarely computationally feasible in practice. In recent years, variational inference (VI) has become more and more popular for approximating intractable posterior distributions in Bayesian statistics and machine learning. Nevertheless, despite promising results in real-life applications, little attention has been paid in the literature to the theoretical properties of VI. In this talk, we aim to present some recent advances in the theory of VI. First, we show that variational inference is consistent under mild conditions and retains the same properties as exact Bayesian inference in the batch setting. Then, we study several online VI algorithms, inspired by sequential optimization, that compute the variational approximations in an online fashion. We provide theoretical guarantees by deriving generalization bounds, and we present empirical evidence in support of this.
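To give a flavour of VI (a toy sketch, not the algorithms analysed in the talk), the following fits a Gaussian variational approximation to the posterior of a conjugate Gaussian model by stochastic gradient ascent on the ELBO, using the reparameterization trick:

```python
# Toy VI: data y_i ~ N(theta, 1) with prior theta ~ N(0, 1), variational
# family q = N(mu, sigma^2), fitted via theta = mu + sigma * eps.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=50)                 # observed data
n, ybar = len(y), y.mean()

mu, rho = 0.0, 0.0                                # rho = log sigma
lr = 0.01
for _ in range(5000):
    sigma = np.exp(rho)
    eps = rng.standard_normal(8)                  # Monte Carlo samples
    theta = mu + sigma * eps
    g = n * (ybar - theta) - theta                # d/dtheta log p(y, theta)
    mu += lr * g.mean()
    rho += lr * ((g * sigma * eps).mean() + 1.0)  # +1 from the entropy term

# The exact posterior is N(n*ybar/(n+1), 1/(n+1)); VI recovers it here.
print(mu, np.exp(rho) ** 2, n * ybar / (n + 1), 1 / (n + 1))
```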