As a scientific researcher in biology, my goal is to improve the understanding of life, in particular at the cellular and molecular level. To do so, I design, develop and use computational methods and tools to analyze genomics, transcriptomics and other -omics data, mostly obtained from sequencing experiments. My field of expertise, computational biology, lies at the interface between molecular biology and bioinformatics.
More precisely, my research activities focus on genome annotation and 3D genomics, which mostly involve omics data analysis.
Listed below are a few research projects in which I am or have been involved.
Benchmarking differential analysis tools for Hi-C data
Project description: The three-dimensional organization of the genome plays a crucial role in various biological processes. Hi-C technology is widely used to investigate chromosome structures by quantifying spatial interactions between genomic regions. While numerous computational tools exist for detecting differences in Hi-C data between conditions, a comprehensive review and benchmark comparing their effectiveness is lacking.
This study offers a comprehensive review and benchmark of ten generic tools for differential analysis of Hi-C matrices at the interaction count level. The benchmark assesses the statistical methods, usability, and performance (in terms of precision and power) of these tools, using both real and simulated Hi-C data. Results reveal a striking variability in performance among the tools, highlighting the substantial impact of preprocessing filters and the difficulty all tools encounter in effectively controlling the false discovery rate across varying resolutions and chromosome sizes.
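As an illustration of the performance criteria mentioned above, the following minimal sketch computes precision, power and the false discovery proportion for one tool on simulated data where the truly differential interactions are known. The function name, the representation of interactions as pairs of bin indices and the toy values are assumptions made for this example; they are not taken from the benchmark itself.

```python
def benchmark_metrics(called, truth):
    """Compare the interactions called by a tool with the simulated ground truth.

    `called` and `truth` are collections of (bin_i, bin_j) pairs (hypothetical format).
    """
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    # False discovery proportion; by convention it is 0 when nothing is called.
    fdp = false_pos / max(len(called), 1)
    precision = 1.0 - fdp
    # Power: fraction of truly differential interactions that were detected.
    power = true_pos / len(truth) if truth else 0.0
    return {"precision": precision, "power": power, "fdp": fdp}


# Toy example: 4 truly differential interactions, 3 calls, 2 of them correct.
truth = {(1, 5), (2, 8), (3, 9), (4, 7)}
called = {(1, 5), (2, 8), (6, 10)}
metrics = benchmark_metrics(called, truth)
```

Averaging the false discovery proportion over many simulated datasets gives an empirical estimate of the false discovery rate, which can then be compared with the nominal level requested from each tool.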
Funding: No specific funding
Project status: Ongoing
Main collaborators: Nathalie Vialaneix (INRAE, MIAT), Pierre Neuvial (CNRS, IMT), Matthias Zytnicki (INRAE, MIAT), Elise Jorge (INRAE, GenPhySE)
Role in this project: Contributor
The regulatory GENomE of SWine and CHicken: functional annotation during development
Project description:
GENE-SWitCH aims to deliver new underpinning knowledge on the functional genomes of two main monogastric farm species (pig and chicken) and to enable immediate translation to the pig and poultry sectors. The activation status of functional genome sequences varies across time and space, and in response to environmental perturbations. In full coordination and synergy with global effort and ongoing projects of the Functional Annotation of Animal Genomes (FAANG) community, we will characterize the dynamics (“switches”) of the functional genome from embryo (chicken) and fetus (pig) to adult life by targeting a panel of tissues relevant to sustainable production.
Funding:
EU, 2019-2023
Project status:
Ongoing
Main collaborators:
INRAE/INSERM, Roslin Institute, WUR, EBI
Role in this project:
WP2 contributor
FR-AgENCODE: a FAANG pilot project for the functional annotation of livestock genomes
Project description:
As part of the FAANG action (Functional Annotation of ANimal Genomes), the FR-AgENCODE project aims at improving the genomic annotation of 4 livestock species: cattle (Bos taurus), goat (Capra hircus), chicken (Gallus gallus) and pig (Sus scrofa). This is achieved by performing molecular assays on tissue-dissociated cells (liver) and on sorted primary cells (CD4+ and CD8+ T lymphocytes) from 2 males and 2 females of each species. These assays include RNA-seq, ATAC-seq and Hi-C to characterize the transcriptome, the chromatin accessibility and the genome 3D topology in these cells, respectively.
Funding:
INRAE, Animal Genetics division, 2015-2017
Project status:
Phase 1 completed, Phase 2 ongoing
Main collaborators:
INRAE
Role in this project:
Co-coordinator, leader of the data analysis WP
HiC-DOC: detecting and comparing genomic compartments from Hi-C data
Project description:
Genomic compartmentalization is a biological factor affecting cell functionality. Different compartments
can be observed in the nucleus of eukaryotic cells, grouping genomic regions into clusters. Active
compartments are usually associated with open chromatin and gene expression while inactive compartments
are usually associated with closed chromatin and gene repression [1]. Analysis of data produced by the Hi-C
protocol reveals compartmentalization of chromatin in the nucleus, which can vary as a tissue develops.
Today, existing methods to detect genomic compartmentalization are limited in at least one of the following
ways: detecting compartments qualitatively with no confidence measure, ignoring experimental biases,
and/or dismissing replicate variability.
We propose an improvement over existing methodology to detect compartments and compare
compartmentalization between conditions. First, we properly correct the diverse biases inherent to Hi-C data,
using cyclic loess normalization [2] to reduce technical biases and Knight-Ruiz matrix balancing [3] to
mitigate biological biases. Then, we correct interaction counts with a loess regression to clearly expose the
compartmentalization information captured by the data. Next, we use an unsupervised learning method,
constrained K-means [4], to computationally detect compartments from the normalized data. This method
enables us to produce quantitative “concordance” values for each genomic region in each replicate,
supporting our compartment predictions. Finally, we use these concordance values for differential analysis of
compartmentalization between conditions. From their distributions, we obtain p-values revealing the
significance of each predicted compartment change.
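To give an idea of what matrix balancing does, here is a minimal sketch that rescales a small symmetric contact matrix so that every row sums to 1. It uses a simple damped iterative scaling, not the Knight-Ruiz algorithm itself, and it is not the HiC-DOC implementation; the function name, the toy matrix and the stopping parameters are assumptions made for this example. The sketch also assumes that every bin has at least one positive count.

```python
def balance_matrix(counts, n_iter=500, tol=1e-10):
    """Scale a symmetric contact matrix so that every row sums to about 1.

    Returns (balanced, bias) with balanced[i][j] = counts[i][j] / (bias[i] * bias[j]).
    """
    n = len(counts)
    if any(sum(row) <= 0 for row in counts):
        raise ValueError("every bin needs at least one positive count")
    bias = [1.0] * n
    for _ in range(n_iter):
        row_sums = [
            sum(counts[i][j] / (bias[i] * bias[j]) for j in range(n))
            for i in range(n)
        ]
        if max(abs(s - 1.0) for s in row_sums) < tol:
            break
        # Square-root (damped) update: avoids the oscillations of the plain
        # symmetric update bias[i] *= row_sums[i].
        bias = [b * s ** 0.5 for b, s in zip(bias, row_sums)]
    balanced = [
        [counts[i][j] / (bias[i] * bias[j]) for j in range(n)] for i in range(n)
    ]
    return balanced, bias


# Toy 3-bin contact matrix (hypothetical counts).
toy_counts = [[40, 20, 10], [20, 30, 20], [10, 20, 50]]
balanced, bias = balance_matrix(toy_counts)
```

The returned `bias` vector plays the role of a per-bin correction factor: bins with systematically high counts receive a larger factor, so their interactions are scaled down accordingly.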
Funding:
INRAE
Project status:
Ongoing
Main collaborators:
INRAE (MIAT)
Role in this project:
Co-coordinator, contribution to the bioinformatics analysis
Tree-diff: detecting significant changes between groups of trees from HAC
Project description:
Trees are frequently used to describe data that are organized hierarchically. This
is the case, for instance, in phylogeny, or when using statistical methods such as
hierarchical agglomerative clustering (HAC) or Classification and Regression Trees.
The TREE-DIFF project aims at developing an approach to provide statistical
guarantees for the comparison between two sets of trees corresponding to two different
conditions. Our approach builds on tree distances and an aggregation procedure for
moderated Student tests performed at the level of leaf pairs. Numerical experiments
confirm its statistical validity even for small sample sizes. The method is illustrated
with various practical applications in the field of biology, including Genome Wide
Association Studies (GWAS), study of chromatin conformation with Hi-C data and
phylogenetic studies.
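The idea of testing at the level of leaf pairs can be sketched as follows: each tree is turned into cophenetic distances (the height at which two leaves are first merged), and a two-sample statistic is computed for every leaf pair. This is a simplified illustration only: it uses a plain pooled-variance Student statistic rather than the moderated tests of the method, it omits the aggregation procedure, and the merge format (cluster a, cluster b, height), the function names and the toy trees are assumptions made for this example.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def cophenetic(merges, n_leaves):
    """Cophenetic distances from a list of merges (a, b, height).

    Leaves are numbered 0..n_leaves-1; the k-th merge creates cluster n_leaves + k.
    """
    members = {i: [i] for i in range(n_leaves)}
    coph = {}
    for k, (a, b, height) in enumerate(merges):
        for i in members[a]:
            for j in members[b]:
                coph[(min(i, j), max(i, j))] = height
        members[n_leaves + k] = members.pop(a) + members.pop(b)
    return coph


def student_t(x, y):
    """Two-sample Student statistic with pooled variance (not moderated)."""
    nx, ny = len(x), len(y)
    diff = mean(x) - mean(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    if pooled == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / sqrt(pooled * (1 / nx + 1 / ny))


def leaf_pair_statistics(trees_a, trees_b, n_leaves):
    """One statistic per leaf pair, comparing two groups of trees."""
    coph_a = [cophenetic(t, n_leaves) for t in trees_a]
    coph_b = [cophenetic(t, n_leaves) for t in trees_b]
    return {
        pair: student_t([c[pair] for c in coph_a], [c[pair] for c in coph_b])
        for pair in coph_a[0]
    }


# Toy data: 3 leaves, 3 trees per condition (hypothetical heights).
# Condition A merges leaves 0 and 1 first; condition B merges leaves 1 and 2 first.
trees_a = [[(0, 1, 1.0), (3, 2, 3.0)], [(0, 1, 1.2), (3, 2, 2.8)], [(0, 1, 0.9), (3, 2, 3.1)]]
trees_b = [[(1, 2, 1.0), (3, 0, 3.0)], [(1, 2, 1.1), (3, 0, 2.9)], [(1, 2, 0.8), (3, 0, 3.2)]]
stats = leaf_pair_statistics(trees_a, trees_b, n_leaves=3)
```

In this toy example the pairs (0, 1) and (1, 2) receive large statistics of opposite sign, while the pair (0, 2), which is merged last in both conditions, stays close to zero.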
Funding:
INRAE, CNRS
Project status:
Completed
Main collaborators:
INRAE (MIAT), CNRS (IMT), INRIA
Role in this project:
Partner, contribution to the bioinformatics analysis
Getting true Pluripotent Stem Cells in Pigs: a key step for large scale ex-vivo “Genotype to Phenotype” studies
Project description:
Current global changes (global warming, availability of agricultural resources, societal perception of animal husbandry, health importance of zoonoses) are forcing us to rethink our production systems. The pressure of animal production on ecosystems must be reduced, food and health security must be increased, and animal welfare in breeding must be better addressed. To achieve these global objectives, the integration of the digital dimension into farm management is essential. Coupling it with innovative cellular systems will make it possible to evaluate, at high throughput, phenotypes that are difficult to measure on live animals in breeding, and therefore to acquire quantities of data suited to the methodologies developed for “big data”. This strategy also reduces the need for animal testing in accordance with the 3R rule (Replace, Reduce, Refine). We propose, within the framework of this project, to use the numerical dimension from multi-omics data at the single-cell and tissue scale to predict the molecules necessary and sufficient to maintain porcine pluripotency, and to transfer this knowledge to the production and use of porcine pluripotent stem cell lines (PSCs) for animal and human health applications. This approach, breaking with traditional experimental approaches, will represent a major breakthrough for genetic, pharmaceutical or toxicological studies. Indeed, improving resistance to animal diseases has long been a research priority that has struggled to progress for lack of high-throughput phenotyping methods.
The objectives of the project are as follows:
1) Molecular characterization of the microenvironment of the porcine embryo before implantation.
2) Production of cell lines with reporter systems allowing tracing and sorting of porcine pluripotent cells.
3) Optimization of the combination of exogenous factors necessary for reprogramming to a state of authentic pluripotency.
4) Identification of signaling molecules necessary and sufficient for the maintenance of porcine pluripotent cells in vitro.
5) Production of porcine pluripotent lines with full potential for differentiation.
Funding:
ANR, France
Project status:
Completed
Main collaborators:
INRAE
Role in this project:
Contribution to the bioinformatics analysis
Pig3Dgenome: comparing the 3D genomic structure of porcine muscle cells during late development
Project description:
The three-dimensional organization of the genome plays a major role in the regulation of gene expression. Chromosome territories, compartments, topological domains, and loops are the main features of the genome topology. Most of these features are quite stable, ensuring a suitable niche for maintaining either transcriptional activation or repression. However, the structural plasticity of the chromatin also permits conformational changes that may lead to alterations in the transcriptional activity. These dynamic changes are particularly remarkable during the gene expression reprogramming that occurs in early development (i.e. zygote genome activation, transition from pluripotent to lineage-committed cells, and cell differentiation). However, these dynamic events remain poorly understood, especially those concerning late development and tissue maturity processes. Our study offers new insights into the 3D genome organization dynamics at late gestation in mammals. More precisely, we addressed the global genome organization of porcine muscle nuclei at 90 and 110 days of gestation by performing in situ Hi-C experiments. This stage of gestation is a relevant period for porcine muscle development and maturity, as already shown in a previous transcriptome study. We obtained evidence of important topological changes in the 3D genome structure at this period that are associated with variations in gene expression. These dynamic changes correspond to a global fragmentation of the genome, switches of compartment type, differential chromatin interactions and dynamics of the telomeric regions.
Funding:
INRAE, CNRS
Project status:
Completed
Main collaborators:
INRAE (MIAT), CNRS (IMT)
Role in this project:
Coordinator, leader of the bioinformatics analysis
ENCODE: ENCyclopedia Of Dna Elements
Project description:
The Encyclopedia of DNA Elements (ENCODE) Consortium is an ongoing international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.
ENCODE investigators employ a variety of assays and methods to identify functional elements. The discovery and annotation of gene elements is accomplished primarily by sequencing a diverse range of RNA sources, comparative genomics, integrative bioinformatic methods, and human curation. Regulatory elements are typically investigated through DNA hypersensitivity assays, assays of DNA methylation, and immunoprecipitation (IP) of proteins that interact with DNA and RNA, i.e., modified histones, transcription factors, chromatin regulators, and RNA-binding proteins, followed by sequencing.
Funding:
NHGRI
Project status:
Ongoing, but my personal contribution was mainly during the early phases of the project (pilot phase and ENCODE2)
Main collaborators:
CRG, Affymetrix, CSHL, UCSC
Role in this project:
Contributions to the bioinformatics analysis