Antoine Limasset
Chargé de Recherche CNRS, CRIStAL, Université de Lille
Member of the Bonsai Team
About Me
I am a CNRS researcher at the CRIStAL laboratory in Lille, France, where I am a member of the Bonsai bioinformatics team.
My research focuses on developing efficient computational methods to analyze the massive amounts of data generated by DNA sequencing technologies, bridging the gap between theoretical computer science and practical biological data analysis.
My goal is to create scalable, efficient, and accessible tools that enable new discoveries in genomics, transcriptomics, and metagenomics.
📚 Main Publications
2025
- Fractional hitting sets for efficient multiset sketching
- Venue: Algorithms for Molecular Biology
- Tool: SuperSampler
- K2R: Tinted de Bruijn Graphs implementation for efficient read extraction from sequencing datasets
- Venue: Bioinformatics Advances
- Tool: K2R
- Hyper-k-mers: Efficient Streaming k-mers Representation
- Venue: International Conference on Research in Computational Molecular Biology (RECOMB)
- Tool: KFC
- REINDEER2: practical abundance index at scale
- Venue: SPIRE 2025
- Tool: Reindeer2
- Accelerating k-mer-based sequence filtering
- Venue: bioRxiv (Preprint)
- OReO: optimizing read order for practical compression
- Venue: Bioinformatics Advances
- Tool: OREO
2024
- Automated evaluation of multiple sequence alignment methods to handle third generation sequencing errors
- Venue: PeerJ
- Tool: MSALIMIT
- Conway–Bromage–Lyndon (CBL): an exact, dynamic representation of k-mer sets
- Venue: Bioinformatics (ISMB Proceedings)
- Tool: CBL
- Brisk: Exact resource-efficient dictionary for k-mers
- Venue: bioRxiv (Preprint)
- Tool: Brisk
2023
- Scalable sequence database search using partitioned aggregated Bloom comb trees
- Venue: Bioinformatics (ISMB Proceedings)
- Tool: PAC
- Locality-preserving minimal perfect hashing of k-mers
- Venue: Bioinformatics (ISMB Proceedings)
- Tool: LPHASH
2021
- BLight: Efficient exact associative structure for k-mers
- Venue: Bioinformatics
- Tool: BLight
- Toward optimal fingerprint indexing for large scale genomics
- Scalable long read self-correction and assembly polishing with multiple sequence alignment
- Venue: Scientific Reports
- Tool: CONSENT
2020
- Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs
- Venue: Bioinformatics
- Tool: BCOOL
- ELECTOR: evaluator for long reads correction methods
- Venue: NAR Genomics and Bioinformatics
- Tool: ELECTOR
- A resource-frugal probabilistic dictionary and applications in bioinformatics
- Venue: Discrete Applied Mathematics
- Tool: SRC
2017
- Fast and scalable minimal perfect hashing for massive key sets
- Venue: 16th International Symposium on Experimental Algorithms (SEA)
- Novel approaches for the exploitation of high throughput sequencing data (PhD Thesis)
- Venue: Université Rennes 1
2016
- Compacting de Bruijn graphs from sequencing data quickly and in low memory
- Venue: Bioinformatics (ISMB Proceedings)
- Read mapping on de Bruijn graphs
- Venue: BMC Bioinformatics
- Tool: BGREAT
2014
- On the representation of de Bruijn graphs
- Venue: International Conference on Research in Computational Molecular Biology (RECOMB)
📖 Other Publications
2022
- Critical assessment of metagenome interpretation: the second round of challenges
2021
- STRONG: metagenomics strain resolution on assembly graphs
- Venue: Genome Biology
- Tool: STRONG
- Chromosome-level genome assembly reveals homologous chromosomes and recombination in asexual rotifer Adineta vaga
All my projects are open-source. Here are some of the key software packages I have developed.
- BCOOL: A conservative short-read error corrector using a meticulously cleaned de Bruijn graph.
- BGREAT: A tool for mapping sequencing reads directly onto a de Bruijn graph.
- BLIGHT: A static and exact k-mer dictionary using minimizer-partitioned MPHFs for high performance.
- BRISK: An exact and dynamic dictionary for large k-mer sets, offering significant memory savings.
- BRRR: A linear-time long-read correction tool based on the k-mer spectrum.
- BWISE: A short-read assembler designed to handle highly heterozygous and polyploid genomes.
- CBL: A dynamic k-mer set structure supporting fast set operations like union and intersection.
- CONSENT: A scalable tool for long-read self-correction and assembly polishing.
- ELECTOR: An evaluation tool for long-read error correction methods, providing detailed metrics.
- K2R: An index that efficiently maps k-mers to the specific reads they came from (Tinted de Bruijn Graph).
- KFC: A fast k-mer counter based on the novel and space-efficient hyper-k-mer data structure.
- Kloe: A tool for compressing and retrieving k-mer sets from a specific dataset within a large collection.
- LPHASH: A locality-preserving minimal perfect hash function library that is highly space-efficient for k-mer sets.
- MSALIMIT: A benchmark framework for the systematic evaluation of Multiple Sequence Alignment techniques.
- NIQKI: A tool for ultra-fast, large-scale comparison of genomic sketches using an inverted index.
- ONIKA: An improved version of NIQKI with tunable fingerprints and a balanced inverted index.
- OREO: A tool that reorders long reads based on a draft assembly to improve their compressibility.
- OREOMINI: A lightweight version of OREO using one-bit MinHash sketches for fast read partitioning.
- PAC: A scalable tool for indexing and querying k-mer presence in massive collections of sequencing data.
- REINDEER2: An index for storing and querying k-mer abundances across large dataset collections.
- STRONG: A pipeline for resolving microbial strains from metagenomic assembly graphs.
- SUPERSAMPLER: A sketching tool that uses a fractional hitting set of super-k-mers for efficient comparison.
Team & Supervision
Current PhD Students
- Yohan Hernandez-Courbevoie (PhD Director, 2024-present)
- Timothé Rouzé (PhD Co-supervisor, 2023-present)
- Léa Vandamme (PhD Director, 2021-present)
Former PhD Students
- Coralie Rohmer (PhD Co-supervisor, 2019-2022)
Education
- 2025: “Habilitation à diriger des recherches” (post-doctoral degree authorizing the supervision of PhD students), Université de Lille
- 2017: Ph.D. in Computer Science, Université de Rennes 1
- Thesis: Novel approaches for the exploitation of high throughput sequencing data
- Supervisors: Pierre Peterlongo and Dominique Lavenier
- 2014: M.Sc. in Computer Science, École Normale Supérieure de Rennes
- 2012: B.Sc. in Computer Science, École Normale Supérieure de Cachan
Professional Experience
- 2018 - Present: CNRS Researcher, CRIStAL, Lille, France
- 2017 - 2018: Postdoctoral Researcher, Université Libre de Bruxelles, Belgium
- Worked on de novo assembly of heterozygous genomes.
Grants & Funding
- 2024: MIC INSERM (Principal Investigator), €554k
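- Efficient and scalable cancer analysis through advanced large-scale transcriptomic exploration (original title: Analyse efficace et évolutive du cancer par exploration transcriptomique avancée à grande échelle)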
- 2024: ANR (Member), €500k
- 2021: ANR JCJC (Principal Investigator), €227k
- Adequate graph structures for third-generation sequencing data exploration
- 2019: Région Hauts-de-France PhD Grant (Principal Investigator), €150k
Professional Service
Conference Committees
- Program Committee: RECOMB (2020-2024), ACM-BCB (2020-2024), ECCB/ISMB (2020-2024), SeqBim (2020-2024).
- Organizational Committee: SPIRE (2021).
Reviewer Activities
- Nature Methods, Nature Communications, Genome Research, Genome Biology, Nucleic Acids Research, Bioinformatics, Scientific Reports, and others.
Thesis Committees
- Nastasija Mijovic (PhD Committee, 2023-2025)
- Riku Walve (Examiner, 2022)
- Svitlana Lukicheva (PhD Jury, 2021)
- Théo Lemane (PhD Committee, 2020-2021)
- Nadege Guiglielmoni (PhD Committee, 2019-2020)
Invited Talks & Presentations
- 2024: EMBL-EBI Kmer/sequence indexing workshop, Cambridge, UK
- 2024: Kmer days, Dijon, France
- 2023: ISMB, Lyon, France
- 2022: RECOMB, San Diego, US
- 2022: DSB, Düsseldorf, Germany
- 2022: TUDASTIC, Lille, France
- 2021: Kmer days, Marville, France
- 2019: BiATA, Saint Petersburg, Russia
- 2018: RECOMB, Paris, France
Antoine Limasset
CRIStAL (UMR 9189)
Université de Lille, Bâtiment ESPRIT
59655 Villeneuve d’Ascq, France
Email: antoine.limasset@cnrs.fr