Here are all the databases we formatted (on demand) for RDPClassifier and NCBI BLAST+.
Please be careful about each database's licence and about how to cite it.
For the SILVA databases, we propose reduced versions based on a pintail score threshold. The pintail score indicates the quality of a sequence. See https://www.arb-silva.de/documentation/faqs/, section "What do the green, yellow and orange quality bars tell me?", for a brief explanation, or http://aem.asm.org/content/71/12/7724.abstract for the pintail score paper.

##### SILVA/
    16S/ --> data related to the SILVA 16S database (https://www.arb-silva.de/download/arb-files/)
        silva_138.2_16S.tar.gz --> data related to version 138.2
        silva_138.2_16S_pintail50.tar.gz --> data related to version 138.2, filtered on pintail score >= 50
        silva_138.2_16S_pintail80.tar.gz --> data related to version 138.2, filtered on pintail score >= 80
        silva_138.2_16S_pintail100.tar.gz --> data related to version 138.2, filtered on pintail score = 100
        silva_138.1_16S.tar.gz --> data related to version 138.1
        silva_138.1_16S_pintail50.tar.gz --> data related to version 138.1, filtered on pintail score >= 50
        silva_138.1_16S_pintail80.tar.gz --> data related to version 138.1, filtered on pintail score >= 80
        silva_138.1_16S_pintail100.tar.gz --> data related to version 138.1, filtered on pintail score = 100
        silva_138_16S.tar.gz --> data related to version 138
        silva_138_16S_pintail50.tar.gz --> data related to version 138, filtered on pintail score >= 50
        silva_138_16S_pintail80.tar.gz --> data related to version 138, filtered on pintail score >= 80
        silva_138_16S_pintail100.tar.gz --> data related to version 138, filtered on pintail score = 100
        silva_132_16S.tar.gz --> data related to version 132
        silva_132_16S_pintail50.tar.gz --> data related to version 132, filtered on pintail score >= 50
        silva_132_16S_pintail80.tar.gz --> data related to version 132, filtered on pintail score >= 80
        silva_132_16S_pintail100.tar.gz --> data related to version 132, filtered on pintail score = 100
        silva_128_16S.tar.gz --> data related to version 128
        silva_128_16S_pintail50.tar.gz --> data related to version 128, filtered on pintail score >= 50
        silva_128_16S_pintail80.tar.gz --> data related to version 128, filtered on pintail score >= 80
        silva_128_16S_pintail100.tar.gz --> data related to version 128, filtered on pintail score = 100
        silva_123_16S.tar.gz --> data related to version 123
    18S/ --> data related to the SILVA 18S database (https://www.arb-silva.de/download/arb-files/)
        silva_138.2_18S.tar.gz --> data related to version 138.2
        silva_138.1_18S.tar.gz --> data related to version 138.1
        silva_138_18S.tar.gz --> data related to version 138
        silva_132_18S.tar.gz --> data related to version 132
        silva_128_18S.tar.gz --> data related to version 128
        silva_123_18S.tar.gz --> data related to version 123
        silva_119-1_18S.tar.gz --> data related to version 119-1
    SSU/ --> data related to the SILVA SSU database (https://www.arb-silva.de/download/arb-files/)
        silva_138_SSU.tar.gz --> data related to version 138
    23S/ --> data related to the SILVA 23S database (https://www.arb-silva.de/download/arb-files/)
        silva_138.2_23S.tar.gz --> data related to version 138.2
        silva_138.1_23S.tar.gz --> data related to version 138.1
        silva_132_23S.tar.gz --> data related to version 132
        silva_128_23S.tar.gz --> data related to version 128
        silva_123_23S.tar.gz --> data related to version 123
    28S/ --> data related to the SILVA 28S database (https://www.arb-silva.de/download/arb-files/)
        SILVA_138.2_28S.tar.gz --> data related to version 138.2
        SILVA_138.1_28S.tar.gz --> data related to version 138.1
        SILVA_132_28S.tar.gz --> data related to version 132
    LSU/ --> data related to the SILVA LSU database (https://www.arb-silva.de/download/arb-files/)
        SILVA_132_LSU.tar.gz --> data related to version 132

##### Greengenes/ --> 16S data related to the Greengenes database (http://greengenes.secondgenome.com/)
    greengenes_13_5.tar.gz --> data related to version 13.5

##### DAIRYdb/ --> 16S data related to the DAIRYdb database (16S rRNA gene sequences from dairy products, https://github.com/marcomeola/DAIRYdb)
    DAIRYdb_v1.1.2.tar.gz --> data related to version 1.1.2
    DAIRYdb_v1.2.4_20200604.tar.gz --> data related to version v1.2.4_20200604
    DAIRYdb_v2.0_20210401.tar.gz --> data related to version v2.0_20210401

##### EZBioCloud/ --> 16S data related to the EZBioCloud database (https://www.ezbiocloud.net/resources/16s_download)
    EZBioCloud_052018.tar.gz --> release 05/2018

##### PR2/ --> 18S data related to the Protist Ribosomal Reference (PR2) database (https://github.com/vaulot/pr2_database/releases)
    pr2_gb203_4.5.tar.gz --> data related to version v4.5
    pr2_4.11.0 --> data related to version v4.11.0
    pr2_4.12.0 --> data related to version v4.12.0
    pr2_4.13.0 --> data related to version v4.13.0
    PR2_4.14.0 --> data related to version v4.14.0
    PR2_5.0.1 --> data related to version v5.0.1

##### Unite/ --> data related to the UNITE ITS database (https://unite.ut.ee/)
    Unite_s_7.1_20112016_ITS.tar.gz --> data related to the UNITE 7.1 database
    Unite_Fungi_8.0_18112018.tar.gz --> data related to the UNITE 8.0 database, focused on fungal species
    Unite_Euka_8.0_18112018.tar.gz --> data related to the UNITE 8.0 database, covering all eukaryote species
    Unite_Fungi_8.2_20200204.tar.gz --> data related to the UNITE 8.2 database, focused on fungal species
    Unite_Euka_8.2_20200204.tar.gz --> data related to the UNITE 8.2 database, covering all eukaryote species
    Unite_Fungi_8.3_20210510.tar.gz --> data related to the UNITE 8.3 database, focused on fungal species
    Unite_Euka_8.3_20210510.tar.gz --> data related to the UNITE 8.3 database, covering all eukaryote species
    Unite_Fungi_9.0_20221016.tar.gz --> data related to the UNITE 9.0 database, focused on fungal species
    Unite_Euka_9.0_20221016.tar.gz --> data related to the UNITE 9.0 database, covering all eukaryote species
    Unite_Euka_10.0_20240906.tar.gz --> data related to the UNITE 10.0 database, covering all eukaryote species
    Unite_Fungi_10.0_20240906.tar.gz --> data related to the UNITE 10.0 database, focused on fungal species
    Unite_known_genus_fungi_personal_10.0_20240906.tar.gz --> data related to the UNITE 10.0 database, focused on fungal species with a known genus

##### MiDas/ (Microbial Database for Activated Sludge: http://www.midasfieldguide.org/)
    MiDAS_S119_1.20.tar.gz --> data related to the MiDAS S119 1.20 database (based on Silva 119)
    MiDAS_S123_2.1.3.tar.gz --> data related to the MiDAS S123 2.1.3 database (based on Silva 123)
    MiDAS_S132_3.6.tar.gz --> data related to the MiDAS S132 3.6 database (based on Silva 132)
    MiDAS_S138.1_v4.8.1.tar.gz --> data related to the MiDAS S138 4.8.1 database (based on Silva 138)
    MiDAS_v5.0.tar.gz --> data related to the MiDAS 5.0 database

##### Nemabiome (database dedicated to communities of nematodes that inhabit a single host animal or environmental niche: https://www.nemabiome.ca/)
    Nematode_ITS2_v1.6.0_2023-09-16.tar.gz --> data related to the ITS2 sequences of Nemabiome version 1.6
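
The SILVA `_pintail50`, `_pintail80` and `_pintail100` archives listed above keep only the sequences whose pintail score reaches a given threshold. The script actually used to build them is not described here, so the Python sketch below only illustrates the threshold rule. It assumes hypothetical `(sequence_id, pintail_score, sequence)` records and a 0 to 100 score scale, in which case `>= 100` is the same as `= 100`.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: (sequence_id, pintail_score, sequence).
# This is NOT the format of the SILVA export nor of the archives above.
Record = Tuple[str, int, str]


def filter_by_pintail(records: Iterable[Record], threshold: int) -> Iterator[Record]:
    """Yield only the records whose pintail score is >= threshold."""
    for seq_id, score, seq in records:
        if score >= threshold:
            yield seq_id, score, seq


if __name__ == "__main__":
    demo = [("seqA", 100, "ACGT"), ("seqB", 85, "ACGG"), ("seqC", 40, "ACTT")]
    # Thresholds matching the pintail50, pintail80 and pintail100 archives.
    for threshold in (50, 80, 100):
        kept = [seq_id for seq_id, _, _ in filter_by_pintail(demo, threshold)]
        print(threshold, kept)
```

Run as a script, this prints `50 ['seqA', 'seqB']`, `80 ['seqA', 'seqB']` and `100 ['seqA']`: each stricter threshold yields a subset of the previous one, which is why the pintail100 archives are the smallest.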
##### rpoB/ (for the rpoB marker, not yet published, see rpoB/readme.txt)
    rpoB_122017.tar.gz --> rpoB database provided by the DGiMi laboratory (INRA Montpellier)
    rpoB_bacteria_NCBI_refseq_all_lvl_assembly_20240707.tar.gz --> rpoB sequences found in 361,617 RefSeq genomes (complete genome, chromosome, scaffold and contig assembly levels), July 2024
    rpoB_bacteria_NCBI_refseq_genome_complete_and_chromosome_20240707.tar.gz --> rpoB sequences found in 46,714 RefSeq genomes (complete genome and chromosome assembly levels), July 2024

##### Diat.barcode/ (for the rbcL diatom barcode: https://www6.inrae.fr/carrtel-collection_eng/Barcoding-database), initially named R-Syst::diatom
    RSyst_Diatom_7.tar.gz --> data related to R-Syst::Diatom version 7 (http://138.102.89.206/new_rsyst_alg/)
    Diat.barcode_rbcL_10.1.tar.gz --> data related to Diat.barcode version 10.0 (https://www6.inrae.fr/carrtel-collection_eng/Barcoding-database/Database-download)

##### PHYMYCO_DB/ (EF1 and 18S fungal DNA markers: http://phymycodb.genouest.org/)
    PHYMYCO-DB_2013.tar.gz --> data related to the 2013 curated version
##### COI/ --> reference databases for COI amplicons
    BOLD_COI-5P/ --> data related to the BOLD database (http://v3.boldsystems.org), with a selection of phyla
        BOLD_COI-5P_022019 --> version downloaded in February 2019 (see readme for the selected phyla)
        BOLD_COI-5P_1percentN_022019 --> version downloaded in February 2019, filtered on a maximum of 1% N (see readme for the selected phyla)
        BOLD_COI-5P_1percentN_630nt_022019 --> version downloaded in February 2019, filtered on a maximum of 1% N and a minimum length of 630 nt (see readme for the selected phyla)
        BOLD_COI-5P_marin_052022 --> version downloaded in May 2022 (see readme for the selected taxa)
        BOLD_COI-5P_082023 --> version downloaded in August 2023 (see readme for the selected phyla)
        BOLD_COI-5P_marin_20230913 --> version downloaded in September 2023 (see readme for the selected taxa)
    MIDORI/ --> COI reference databases derived from the MIDORI database (http://reference-midori.info/index.html)
        MIDORI_LONGEST_SP_COI_GB242 --> version GB242 of the longest species sequences
        MIDORI_UNIQUE_COI_20180221 --> version 20180221, unique amplicon sequences
        MIDORI_UNIQUE_COI_MARINE_20180221 --> version 20180221, unique amplicon sequences restricted to marine organisms
        MIDORI_UNIQUE_SP_COI_GB249 --> version GB249 of the unique amplicon sequences
        MIDORI_LONGEST_SP_COI_GB249 --> version GB249 of the longest species sequences
        MIDORI2_LONGEST_SP_COI_GB253 --> version GB253 of the longest species sequences
        MIDORI2_UNIQ_SP_COI_GB253 --> version GB253 of the unique amplicon sequences
    COInr/ --> non-redundant COI reference database built from NCBI nt and BOLD (https://github.com/meglecz/mkCOInr)
        COInr_2022_05_06 --> version published on Zenodo on May 17, 2022
    COI_Genbank/ --> personal extraction of GenBank COI sequences
        COI_arthropod_personnal_052024 --> Arthropoda sequences (May 2024)

##### MaarjAM/ --> reference database for 18S or 28S amplicons of arbuscular mycorrhizal fungi (Glomeromycota)
    MaarjAM_18S_05-06-2019 --> MaarjAM version 05-06-2019 of 18S sequences
    MaarjAM_28S_25-05-2019 --> MaarjAM version 25-05-2019 of 28S sequences

##### rbcL/ --> reference databases for rbcL amplicons (see readme.txt)
    KBell_plant_rbcL_2021-07.tar.gz --> version 2021
    rbcL_BOLD_Arcachon_20240516.tar.gz --> downloaded from BOLD on 20240516, filtered on plants and on French locations
    rbcL_BOLD_Landes_20240516.tar.gz --> downloaded from BOLD on 20240516, filtered on plants and on French locations

##### matK/ --> reference databases for matK amplicons (see readme.txt)
    matK_BOLD_Arcachon_20240516.tar.gz --> downloaded from BOLD on 20240516, filtered on plants and on French locations
    matK_BOLD_Landes_20240516.tar.gz --> downloaded from BOLD on 20240516, filtered on plants and on French locations

##### microgreen-db/ --> reference database for 23S amplicons of photosynthetic eukaryotic algae and cyanobacteria, provided with 3 different taxonomies
    microgreen-db_algae_v1.2.tar.gz --> version 1.2 with the algae taxonomy (see readme.txt)
    microgreen-db_ncbi_v1.2.tar.gz --> version 1.2 with the NCBI taxonomy (see readme.txt)
    microgreen-db_pr2-silva_v1.2.tar.gz --> version 1.2 with the PR2/Silva taxonomy (see readme.txt)

##### REFSeq/ --> data related to the RefSeq database (https://www.ncbi.nlm.nih.gov/refseq/targetedloci/)
    NCBIdb_archaea_16S_v1.20230726.tar.gz --> version 1 of the Archaea sequences, downloaded on 20230726
    NCBIdb_bacteria_16S_v1.20230726.tar.gz --> version 1 of the Bacteria sequences, downloaded on 20230726
    NCBIdb_archaea_16S_v2_20250204.tar.gz --> version 2 of the Archaea sequences, downloaded on 20250204
    NCBIdb_bacteria_16S_v2_20250204.tar.gz --> version 2 of the Bacteria sequences, downloaded on 20250204

##### PSH_Laffon/ --> personal database of ITS2 and rbcL amplicon sequences from plant species commonly found in apple orchards in the Lower Durance Valley
    ITS2_PSH_laffon_v1_20230731.tar.gz --> version 1 of the ITS2 sequences, created on 2023-07-31
    rbcL_PSH_laffon_v1_20230731.tar.gz --> version 1 of the rbcL sequences, created on 2023-07-31

##### GTDB/ --> rRNA sequences extracted from the GTDB database (https://gtdb.ecogenomic.org/)
    16S-ITS-23S-DB_GTDB_08-RS214.tar.gz --> long rRNA (16S-ITS-23S) extracted from the genomes of GTDB v214
    GTDB_16S_release220_202411.tar.gz --> 16S rRNA extracted from all genomes of GTDB v220
    16S-ITS-23S-GTDB_RS220_20250304.tar.gz --> long rRNA (16S-ITS-23S) extracted from the genomes of GTDB v220

##### 12S/ --> databases for the 12S gene
    mitochondrial_12S_Tasmania_20240226.tar.gz --> personal NCBI selection of mitochondrial ribosomal sequences from Tasmanian vertebrates
    12S_vertebrate_personnal_042024.tar.gz --> personal NCBI selection of 12S sequences from vertebrates (April 2024)
    12S_vertebrata_personnal_052024.tar.gz --> personal NCBI selection of 12S sequences from vertebrates (May 2024)
    MiFish_vertebrate_2021.9.tar.gz --> vertebrate selection of the MiFish 12S sequences

##### trnl/ --> databases for the trnL gene
    trnl_plant_personnal_042024.tar.gz --> personal GenBank selection of plant trnL sequences (April 2024)
    trnl_spermatophyta_personnal_052024.tar.gz --> personal GenBank selection of Spermatophyta trnL sequences (May 2024)

##### longread_rRNA/ --> databases for long-read rRNA amplicon sequences
    SSU-ITS-LSU_EUKARYOME_longread_v1.8_2024_07 --> EUKARYOME v1.8 database of SSU-ITS-LSU sequences

##### 16S/ --> personal or extracted databases related to the 16S rRNA gene
    16S_WILDBEES_OCCITANIA_v202407 --> personal 16S sequences of bees from southwestern France (Occitania)

##### CyanoSeq/ --> data related to the CyanoSeq database (https://zenodo.org/records/13910424)

##### trnH/ --> database for the trnH gene
    --> personal GenBank selection of plant trnH sequences (February 2025)
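
The `BOLD_COI-5P_1percentN_022019` and `BOLD_COI-5P_1percentN_630nt_022019` archives above are described as filtered on a maximum of 1% N and, for the second one, a minimum length of 630 nt. The Python sketch below shows one way to express those two rules per sequence. `passes_coi_filter` is a hypothetical helper, not the script used to build the archives, and it assumes that "1% of N" means the count of `N` bases (case-insensitive) divided by the full sequence length.

```python
def passes_coi_filter(seq: str, max_n_percent: int = 1, min_length: int = 0) -> bool:
    """Hypothetical helper: True if seq has at most max_n_percent % of N bases
    and is at least min_length nucleotides long."""
    if not seq:
        return False
    n_count = seq.upper().count("N")
    # Integer comparison avoids floating-point rounding at the 1% boundary.
    if n_count * 100 > max_n_percent * len(seq):
        return False
    return len(seq) >= min_length


if __name__ == "__main__":
    seqs = {
        "clean_632nt": "ACGT" * 158,
        "short_600nt": "ACGT" * 150,
        "too_many_N": "N" * 7 + "A" * 625,
    }
    for name, seq in seqs.items():
        # First value: 1percentN rule only; second: 1percentN + 630 nt rule.
        print(name, passes_coi_filter(seq), passes_coi_filter(seq, min_length=630))
```

This prints `clean_632nt True True`, `short_600nt True False` and `too_many_N False False`: the 600 nt sequence would stay in the 1percentN archive but not in the 630nt one, while 7 N out of 632 bases exceeds 1% and is rejected by both rules.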