Detailed programme

THURSDAY 4 April

10h-10h40 : Welcome coffee/tea

10h40-11h20 : Laure Sansonnet

Title : Variable selection in GLARMA models

Abstract : This talk addresses the problem of variable selection in GLARMA (Generalized Linear Autoregressive Moving Average) models. After introducing these models, which are used to model discrete-valued time series, we will propose a new, efficient approach to variable selection. It iteratively combines two steps: estimating the ARMA coefficients, and selecting variables among the coefficients of the GLM part using regularized methods. We will focus in particular on the following discrete distributions: the Poisson distribution, and the negative binomial distribution, which allows for overdispersion of the observations. The good performance of these methods will be illustrated on synthetic data.
This is joint work with Marina Gomtsyan, Céline Lévy-Leduc and Sarah Ouadah.

11h20-12h : Franck Picard

Title : PCA for Point Processes

Abstract : Point processes constitute a common framework for modeling occurrences of events along space or time, with Poisson and Hawkes processes being fundamental examples. Statistical inference for point processes has been extensively studied, with nonparametric approaches and Bayesian methods being prominent. However, when many replicates of the process are available, exploratory methods are lacking to describe the heterogeneity or diversity of the observed patterns. Here, we introduce a new statistical perspective aiming to develop tools for dimension reduction and visualization of point processes, drawing inspiration from functional data analysis (FDA), particularly Principal Component Analysis (PCA) and functional PCA (fPCA). While classical PCA and fPCA do not directly apply to point processes, we develop a new framework to perform PCA on replicated point processes, based on Karhunen-Loève (KL) expansions of random measures. Our results include theoretical guarantees such as strong convergence of the KL decomposition, as well as a computational/visualization framework that allows us to unravel the variability of the observed point patterns. We illustrate our method on a wide variety of data, including earthquake, neuroscience, and single-cell transcriptomics data. This is joint work with A. Roche, V. Rivoirard, and V. Panaretos.

12h-12h40 : Marie-Pierre Etienne

Title : Detecting genomic alteration in genomic profiles: the infinite population case

Abstract : A two-state Markov jump process can be used to model alterations in genomic profiles along a chromosome (0 for the normal state and 1 for an alteration) in a normal cell.
Detecting recurrent alterations among a set of patients based on genomic profiles helps to identify genomic regions, and potentially genes, involved in the disease process.
This may be formalized as a statistical test procedure, which requires characterizing the lengths of the excursions above a given threshold for the process of the cumulated profiles.
This has been done when the size of the cohort is small. When the size of the population increases,
we prove that the cumulated process tends to an Ornstein-Uhlenbeck (OU) process and we obtain a bound for the rate of convergence.
We prove that this rate of convergence also holds for the convergence of the longest excursion.

12h40-14h30 : Lunch

14h30-15h10 : Clément Levrard

Title : Statistical difficulty of support estimation and dimensionality reduction

Abstract : A common assumption in modern unsupervised ML is the so-called 'manifold assumption', which roughly supposes that data points, though observed in a high-dimensional ambient space, are in fact elements of a low-dimensional hidden structure (a manifold). In this setting, we focus on two specific tasks: retrieving the hidden structure (support estimation) and embedding data points in a low-dimensional Euclidean space (dimensionality reduction).
Combining results from a recent line of work, in collaboration with E. Aamari, C. Aaron and C. Berenfeld, I will briefly explain how these two tasks are connected, and try to characterize their statistical difficulty (minimax rates on appropriate models) to partially answer the question: is dimensionality reduction easier than support estimation?

15h10-15h50 : Théo Lacombe

Title : Homogeneous Unbalanced Regularized Optimal Transport

Abstract : Optimal transport (OT) is nowadays extremely popular in computational mathematics, thanks to its applications in machine learning. It has two natural extensions, entropy-regularized OT (ROT) and unbalanced OT (UOT), which have been combined by Séjourné et al. to derive an Unbalanced Regularized OT (UROT) problem. However, this problem has a drawback: it is no longer homogeneous, whereas homogeneity is a core property of both ROT and UOT. We propose an alternative in which homogeneity is recovered while important properties of Séjourné's model are preserved, and discuss some applications in Topological Data Analysis.

Refs :
- Sinkhorn divergences for unbalanced optimal transport, Séjourné et al., arXiv:1910.12958
- An homogeneous unbalanced regularized optimal transport model with applications to optimal transport with boundary, Lacombe, AISTATS 2023

15h50-16h20 : Coffee break

16h20-17h : Jean Dufraiche

Title : Exploring Optimal Transport in Jazz Music Analysis. Application to the Real Book

Abstract : This work deals with the mathematical analysis of the Real Book, a famous corpus of jazz music. To simplify the problem, we assume that each piece of music is expressed as a sequence of chords. Our approach introduces a representation of chords based on their borrowings from the different Pythagorean modes, a dimension that seems to be neglected in the existing literature. This new representation makes it possible to define dissimilarities between chords, which form the basis for comparing pieces. More precisely, two pieces from the Real Book can each be represented by the empirical distribution of their chords. The dissimilarity between these pieces is then given by the optimal transport cost between their respective distributions, computed with the chord-to-chord cost defined above.
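To make the last step concrete, here is a minimal sketch of a transport-based dissimilarity between two chord sequences. Everything specific in it is an assumption made for illustration and is not taken from the talk: the chord encoding as pitch-class sets, the Jaccard distance used as a placeholder chord cost (the talk uses a cost built from Pythagorean-mode borrowings), and the use of Sinkhorn iterations, which give an entropy-regularized approximation of the optimal transport cost rather than the exact value.

```python
import math
from collections import Counter

# Hypothetical chord encoding: sets of pitch classes (0 = C, ..., 11 = B).
CHORDS = {
    "Cmaj7": {0, 4, 7, 11},
    "Dm7": {2, 5, 9, 0},
    "G7": {7, 11, 2, 5},
    "Am7": {9, 0, 4, 7},
}


def chord_cost(a, b):
    """Placeholder chord-to-chord cost: Jaccard distance between pitch-class sets."""
    sa, sb = CHORDS[a], CHORDS[b]
    return 1.0 - len(sa & sb) / len(sa | sb)


def empirical_distribution(piece):
    """Map each chord of a piece to its relative frequency in the sequence."""
    counts = Counter(piece)
    return {chord: n / len(piece) for chord, n in counts.items()}


def sinkhorn_cost(p, q, reg=0.05, n_iter=2000):
    """Transport cost of the entropy-regularized plan between p and q.

    This approximates the optimal transport cost; a smaller `reg` gives
    a closer approximation.
    """
    xs, ys = sorted(p), sorted(q)
    a = [p[x] for x in xs]
    b = [q[y] for y in ys]
    cost = [[chord_cost(x, y) for y in ys] for x in xs]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * len(xs)
    v = [1.0] * len(ys)
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(len(ys)))
             for i in range(len(xs))]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(len(xs)))
             for j in range(len(ys))]
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(len(xs)) for j in range(len(ys)))


piece_a = ["Dm7", "G7", "Cmaj7", "Cmaj7"]
piece_b = ["Am7", "Dm7", "G7", "Cmaj7"]
dissimilarity = sinkhorn_cost(empirical_distribution(piece_a),
                              empirical_distribution(piece_b))
```

Because each piece is reduced to the distribution of its chords, the order of the chords plays no role in this dissimilarity; only their frequencies and the chord cost do.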

FRIDAY 5 April

9h30-10h10 : Anouar Meynaoui

Title : Adaptive estimation in the functional linear model with functional output

Abstract : In this talk, we consider a functional linear regression model with functional output: the explanatory variable and the response variable are both "functional" random variables, taking values in a Hilbert space (typically a space of functions). We address the adaptive nonparametric estimation of the integral operator linking these two variables, from a sample. A collection of projection estimators is first constructed: when the chosen projection basis is the empirical PCA basis of the covariate, we obtain a bias-variance decomposition for a mean squared prediction risk. A model selection procedure (minimization of a penalized contrast) then allows an automatic choice of the best estimator in the collection. This estimator satisfies an oracle-type inequality and attains minimax rates of convergence over ellipsoid-type regularity spaces: the upper bound on the prediction risk matches the lower bound, which we also compute. These theoretical results are illustrated by simulations and applications to real data sets.

10h10-10h40 : Coffee break

10h40-11h20 : Sunny Wang

Title : Directional regularity: Achieving faster rates of convergence in multivariate functional data

Abstract : We consider a new notion of regularity, called directional regularity, which is relevant for a wide range of applications involving multivariate functional data. We show that among the class of seemingly isotropic processes, a subset of anisotropic processes exists. Faster rates of convergence may thus be obtained by adapting to their directional regularity through a change of basis. Algorithms are constructed for the estimation and identification of the directional regularity, made possible by the unique replication nature of functional data, with accompanying non-asymptotic theoretical guarantees. A novel simulation algorithm for anisotropic processes is designed to evaluate the numerical accuracy of our directional regularity algorithm. Simulation results demonstrate the good finite-sample properties of our estimator. Applications which elucidate the concrete benefits of our methodology are discussed and illustrated.

11h20-12h : Linus Bleistein

Title : Statistical Aspects of Learning with Signatures

Abstract : Signatures are a powerful, parameter-free tool to learn from any sequential data stream, and in particular from time series. They offer a flexible alternative to neural methods, while being theoretically grounded and having deep connections with dynamical systems. This talk will feature a general introduction to signature methods and their applications, before considering a specific use case in dynamic survival analysis. In this setup, we consider the task of learning individual-specific intensities of survival processes from static and longitudinal data. Modeling the intensities as solutions to unknown nonparametric differential equations allows us to provide a precise bias-variance decomposition of a signature-based estimator. This estimator yields excellent performance on a large array of both simulated and real datasets from finance, predictive maintenance and churn prediction.

12h-13h30 : Lunch

13h30-14h10 : Fabienne Comte

Title : Nonparametric estimation for i.i.d. inhomogeneous SDEs

Abstract : We consider N i.i.d. one-dimensional inhomogeneous diffusion processes (X_i(t), i=1, …, N) with drift mu(t,x) = a_1(t) g_1(x) + … + a_K(t) g_K(x) and diffusion coefficient sigma(t,x), where K, the functions g_j(x) and sigma(t,x) are known. Our concern is the nonparametric estimation of the K-dimensional unknown function (a_j(t), j=1, …, K) from the continuous observation of the N sample paths (X_i(t)) throughout a fixed time interval [0,T]. A collection of projection estimators belonging to a product of finite-dimensional subspaces of L²([0,T]) is built. The L²-risk is defined by the expectation of either an empirical norm or a deterministic norm fitted to the problem. Rates of convergence for large N are discussed. A data-driven choice of the dimensions of the projection spaces is proposed. The theoretical results are illustrated by numerical experiments on simulated data.
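For readers who prefer the model written out, the display below restates the setting of the abstract in LaTeX. The driving processes W_i are assumed here to be independent standard Brownian motions, which is the usual convention but is not stated in the abstract; the initial conditions are left unspecified for the same reason.

```latex
\[
  dX_i(t) = \mu\bigl(t, X_i(t)\bigr)\,dt + \sigma\bigl(t, X_i(t)\bigr)\,dW_i(t),
  \qquad
  \mu(t,x) = \sum_{j=1}^{K} a_j(t)\, g_j(x),
  \qquad i = 1, \dots, N, \quad t \in [0,T].
\]
```

The unknown is the vector of time-dependent coefficients (a_1, …, a_K); the functions g_j and sigma are known.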

14h10-14h50 : Diala Hawat

Title : Repelled point processes with application to numerical integration

Abstract : Linear statistics of point processes yield Monte Carlo estimators of integrals. While the simplest approach relies on a homogeneous Poisson point process (PPP), more regularly spread point processes yield estimators with fast-decaying variance. Following the intuition that more regular configurations result in lower integration error, in this presentation we introduce the repulsion operator, which reduces clustering by slightly pushing the points of a configuration away from each other. Our empirical findings show that applying the repulsion operator to a PPP and, intriguingly, to regular point processes reduces the variance of the corresponding Monte Carlo method and thus enhances the method. This variance reduction phenomenon is substantiated by our theoretical result when the initial point process is a PPP. On the computational side, the complexity of the operator is quadratic and the corresponding algorithm can be parallelized without communication across tasks.

Preprint : https://arxiv.org/abs/2308.04825

Code : https://github.com/dhawat/MCRPPy

Website: https://dhawat.github.io/
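The ingredients of this talk can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation (see the MCRPPy repository above for that): the force is assumed to be of Coulomb type, sum over other points of (x - y) / |x - y|^d, the step size `eps` is a free parameter, the window is the unit square, and boundary effects are ignored. The function names are hypothetical.

```python
import math
import random


def sample_ppp(intensity, rng):
    """Homogeneous Poisson point process on the unit square [0, 1]^2."""
    # The number of points is Poisson(intensity): count the arrivals of a
    # rate-1 Poisson process before time `intensity`.
    n, t = 0, rng.expovariate(1.0)
    while t < intensity:
        n += 1
        t += rng.expovariate(1.0)
    return [(rng.random(), rng.random()) for _ in range(n)]


def repel(points, eps):
    """Move each point by eps times an assumed Coulomb-type force (d = 2).

    The double loop makes the cost quadratic in the number of points, and
    each point's displacement is computed independently of the others.
    """
    d = 2
    moved = []
    for i, (xi, yi) in enumerate(points):
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx, dy = xi - xj, yi - yj
            r = math.hypot(dx, dy)
            if r == 0.0:
                continue  # skip coincident points to avoid dividing by zero
            fx += dx / r ** d
            fy += dy / r ** d
        moved.append((xi + eps * fx, yi + eps * fy))
    return moved


def mc_estimate(points, f, intensity):
    """Linear statistic sum_x f(x) / intensity, estimating the integral of f."""
    return sum(f(x, y) for x, y in points) / intensity


def bump(x, y):
    """Hypothetical integrand, supported inside the unit square."""
    r2 = (x - 0.5) ** 2 + (y - 0.5) ** 2
    return math.exp(-r2 / 0.02) if r2 < 0.25 else 0.0


rng = random.Random(0)
ppp = sample_ppp(200.0, rng)
estimate_ppp = mc_estimate(ppp, bump, 200.0)
estimate_repelled = mc_estimate(repel(ppp, 1e-4), bump, 200.0)
```

Comparing the variance of the two estimators requires repeating the experiment over many independent samples; a single run such as the one above only shows how the pieces fit together.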

14h50-15h20 : Coffee break

15h20-16h : Lisa Balsollier

Title : Analysis of the motion of the trajectories of a birth-death-mutation-move process, and detection of regime change times using hidden Markov models and an EM algorithm

Abstract : My thesis focuses on a process, called the birth-death-move process with mutations, that models the dynamics of a system of particles moving over time, while new particles may appear and some existing particles may disappear. In this model, the trajectories can be generated by any continuous Markov diffusion model, and can change their type of motion during their lifetime. After briefly presenting this model, I will focus on analysing the type of regime followed by each trajectory and on detecting the times at which the regime changes. I will rely on an example in which the trajectories can follow three possible diffusion models: Brownian motion, directed motion (often a directed Brownian motion), and confined motion (which can, for example, be modelled by an Ornstein-Uhlenbeck process). The goal is to estimate the parameters characterizing these three types of motion, and to identify the times at which a particle switches from one motion regime to another. To do so, I will use an Expectation-Maximization (EM) algorithm together with a hidden Markov model.
