Mark Pallen
Publications
Latin binomials, popularised in the 18th century by the Swedish naturalist Linnaeus, have stood the test of time in providing a stable, clear, and memorable system of nomenclature across biology. However, relentless and ever-deeper exploration and analysis of the microbial world has created an urgent need for huge numbers of new names for Archaea and Bacteria. Manual creation of such names remains difficult and slow and typically relies on expert-driven nomenclatural quality control. Keen to ensure that the legacy of Linnaeus lives on in the age of microbial genomics and metagenomics, we propose an automated approach, employing combinatorial concatenation of roots from Latin and Greek to create linguistically correct names for genera and species that can be used off the shelf as needed. As proof of principle, we document over a million new names for Bacteria and Archaea. We are confident that our approach provides a road map for how to create new names for decades to come.
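As a rough illustration of what combinatorial concatenation of roots could look like, the following minimal sketch joins every first root with every second root. The root lists and the function name `make_genus_names` are hypothetical and chosen only for illustration; they are not taken from the publication, and a real pipeline would also need to check grammatical gender, connecting vowels and names already in use.

```python
from itertools import product

# Hypothetical root lists for illustration only; not the roots used in the paper.
FIRST_ROOTS = ["micro", "entero", "copro"]    # combining forms ending in a connecting vowel
SECOND_ROOTS = ["bacter", "coccus", "monas"]  # endings commonly seen in genus names


def make_genus_names(first_roots, second_roots):
    """Concatenate every first root with every second root and capitalise the result."""
    return [(a + b).capitalize() for a, b in product(first_roots, second_roots)]


names = make_genus_names(FIRST_ROOTS, SECOND_ROOTS)
for name in names:
    print(name)
```

The number of candidate names grows as the product of the list sizes, which is why even modest root lists can yield very large numbers of names. Several outputs of this toy example (for instance Enterococcus) are already names in use, so filtering against existing nomenclature would be an essential extra step.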
In recent years Bamako has been faced with an emerging threat from multidrug-resistant TB (MDR-TB). Whole genome sequence analysis was performed on a subset of 76 isolates from a total of 208 isolates recovered from tuberculosis patients in Bamako, Mali between 2006 and 2012. Among the 76 patients (61 [80.3%] new cases and 15 [19.7%] retreatment cases), 12 (16%) were infected by MDR-TB. The dominant lineage was the Euro-American lineage, Lineage 4. Within Lineage 4, the Cameroon genotype was the most prevalent genotype (n = 20, 26%), followed by the Ghana genotype (n = 16, 21%). A sub-clade of the Cameroon genotype, which emerged approximately 22 years ago, was likely to be involved in community transmission. A sub-clade of the Ghana genotype that arose approximately 30 years ago was an important cause of MDR-TB in Bamako. The Ghana genotype isolates appeared more likely to be MDR than other genotypes after controlling for treatment history. We identified a clade of four related Beijing isolates that included one MDR-TB isolate. It is a major concern to find the Cameroon and Ghana genotypes involved in community transmission and MDR-TB respectively. The presence of the Beijing genotype in Bamako remains worrying, given its high transmissibility and virulence.
The COVID-19 pandemic continues to expand globally, with case numbers rising in many areas of the world, including the Eastern Mediterranean Region. Lebanon experienced its largest wave of COVID-19 infections from January to April 2021. Limited genomic surveillance was undertaken, with just 26 SARS-CoV-2 genomes available for this period, nine of which were from travellers from Lebanon detected by other countries. Additional genome sequencing is thus needed to allow surveillance of variants in circulation. In total, 905 SARS-CoV-2 genomes were sequenced using the ARTIC protocol. The genomes were derived from SARS-CoV-2-positive samples, selected retrospectively from the sentinel COVID-19 surveillance network, to capture diversity of location, sampling time, sex, nationality and age. Although 16 PANGO lineages were circulating in Lebanon in January 2021, by February there were just four, with the Alpha variant accounting for 97 % of samples. In the following 2 months, all samples contained the Alpha variant. However, this had changed dramatically by June and July 2021, when all samples belonged to the Delta variant. This study documents a ten-fold increase in the number of SARS-CoV-2 genomes available from Lebanon. The Alpha variant, first detected in the UK, rapidly swept through Lebanon, causing the country's largest wave to date, which peaked in January 2021. The Alpha variant was introduced to Lebanon multiple times despite travel restrictions, but the source of these introductions remains uncertain. The Delta variant was detected in Gambia in travellers from Lebanon in mid-May, suggesting community transmission in Lebanon several weeks before this variant was detected in the country. Prospective sequencing in June/July 2021 showed that the Delta variant had completely replaced the Alpha variant in under 6 weeks.
Increasing contact between humans and non-human primates provides an opportunity for the transfer of potential pathogens or antimicrobial resistance between host species. We have investigated genomic diversity and antimicrobial resistance in Escherichia coli isolates from four species of non-human primates in the Gambia: Papio papio (n=22), Chlorocebus sabaeus (n=14), Piliocolobus badius (n=6) and Erythrocebus patas (n=1). We performed Illumina whole-genome sequencing on 101 isolates from 43 stools, followed by nanopore long-read sequencing on 11 isolates. We identified 43 sequence types (STs) by the Achtman scheme (ten of which are novel), spanning five of the eight known phylogroups of E. coli. The majority of simian isolates belong to phylogroup B2 - characterized by strains that cause human extraintestinal infections - and encode factors associated with extraintestinal disease. A subset of the B2 strains (ST73, ST681 and ST127) carry the pks genomic island, which encodes colibactin, a genotoxin associated with colorectal cancer. We found little antimicrobial resistance and only one example of multi-drug resistance among the simian isolates. Hierarchical clustering showed that simian isolates from ST442 and ST349 are closely related to isolates recovered from human clinical cases (differences in 50 and 7 alleles, respectively), suggesting recent exchange between the two host species. Conversely, simian isolates from ST73, ST681 and ST127 were distinct from human isolates, while five simian isolates belong to unique core-genome ST complexes - indicating novel diversity specific to the primate niche. Our results are of planetary health importance, considering the increasing contact between humans and wild non-human primates.
A group convened and led by the Virus Evolution Working Group of the World Health Organization reports on its deliberations and announces a naming scheme that will enable clear communication about SARS-CoV-2 variants of interest and concern.
Little is known about the genomic diversity of Escherichia coli in healthy children from sub-Saharan Africa, even though this is pertinent to understanding bacterial evolution and ecology and their role in infection. We isolated and whole-genome sequenced up to five colonies of faecal E. coli from 66 asymptomatic children aged three-to-five years in rural Gambia (n = 88 isolates from 21 positive stools). We identified 56 genotypes, with an average of 2.7 genotypes per host. These were spread over 37 seven-allele sequence types and the E. coli phylogroups A, B1, B2, C, D, E, F and Escherichia cryptic clade I. Immigration events accounted for three-quarters of the diversity within our study population, while one-quarter of variants appeared to have arisen from within-host evolution. Several isolates encode putative virulence factors commonly found in Enteropathogenic and Enteroaggregative E. coli, and 53% of the isolates encode resistance to three or more classes of antimicrobials. Thus, resident E. coli in these children may constitute reservoirs of virulence- and resistance-associated genes. Moreover, several study strains were closely related to isolates that caused disease in humans or originated from livestock. Our results suggest that within-host evolution plays a minor role in the generation of diversity compared to independent immigration and the establishment of strains among our study population. Also, this study adds significantly to the number of commensal E. coli genomes, a group that has been traditionally underrepresented in the sequencing of this species.
Escherichia coli has a rich history as biology's 'rock star', driving advances across many fields. In the wild, E. coli resides innocuously in the gut of humans and animals but is also a versatile pathogen commonly associated with intestinal and extraintestinal infections and antimicrobial resistance, including large foodborne outbreaks such as the one that swept across Europe in 2011, killing 54 individuals and causing approximately 4000 infections and 900 cases of haemolytic uraemic syndrome. Given that most E. coli are harmless gut colonizers, an important ecological question plaguing microbiologists is what makes E. coli an occasionally devastating pathogen. Addressing this question requires an enhanced understanding of the ecology of the organism as a commensal. Here, we review how our knowledge of the ecology and within-host diversity of this organism in the vertebrate gut has progressed in the 137 years since E. coli was first described. We also review current approaches to the study of within-host bacterial diversity. In closing, we discuss some of the outstanding questions yet to be addressed and prospects for future research. This review presents an overview of E. coli diversity studies encompassing human and other nonhuman vertebrate hosts in the 137 years since this organism was first described and outstanding gaps in our knowledge.
Complex carbohydrates that escape small intestinal digestion are broken down in the large intestine by enzymes encoded by the gut microbiome. This is a symbiotic relationship between microbes and host, resulting in metabolic products that influence host health and are exploited by other microbes. However, the role of carbohydrate structure in directing microbiota community composition and the succession of carbohydrate-degrading microbes is not fully understood. In this study we evaluate species-level compositional variation within a single microbiome in response to six structurally distinct carbohydrates in a controlled model gut using hybrid metagenome assemblies. We identified 509 high-quality metagenome-assembled genomes (MAGs) belonging to ten bacterial classes and 28 bacterial families. Bacterial species identified as carrying genes encoding starch binding modules increased in abundance in response to starches. The use of hybrid metagenomics has allowed identification of several uncultured species with the functional potential to degrade starch substrates for future study. Longitudinal hybrid metagenomic analyses of a human stool sample reveal compositional and functional variation in response to six structurally distinct carbohydrates, providing insight into how gut bacteria utilize various carbohydrate sources.
Chickens and guinea fowl are commonly reared in Gambian homes as affordable sources of protein. Using standard microbiological techniques, we obtained 68 caecal isolates of Escherichia coli from 10 chickens and 9 guinea fowl in rural Gambia. After Illumina whole-genome sequencing, 28 sequence types were detected in the isolates (4 of them novel), of which ST155 was the most common (22/68, 32 %). These strains span four of the eight main phylogroups of E. coli, with phylogroups B1 and A being most prevalent. Nearly a third of the isolates harboured at least one antimicrobial resistance gene, while most of the ST155 isolates (14/22, 64 %) encoded resistance to ≥3 classes of clinically relevant antibiotics, as well as putative virulence factors, suggesting pathogenic potential in humans. Furthermore, hierarchical clustering revealed that several Gambian poultry strains were closely related to isolates from humans. Although the ST155 lineage is common in poultry from Africa and South America, the Gambian ST155 isolates belong to a unique cgMLST cluster comprising closely related (38-39 allele differences) isolates from poultry and livestock from sub-Saharan Africa, suggesting that strains can be exchanged between poultry and livestock in this setting. Continued surveillance of E. coli and other potential pathogens in rural backyard poultry from sub-Saharan Africa is warranted.
The status Candidatus was introduced to bacterial taxonomy in the 1990s to accommodate uncultured taxa defined by analyses of DNA sequences. Here I review the strengths, weaknesses, opportunities and threats (SWOT) associated with the status Candidatus in the light of a quarter century of use, twinned with recent developments in bacterial taxonomy and sequence-based taxonomic discovery. Despite ambiguities as to its scope, philosophical objections to its use and practical problems in implementation, the status Candidatus has now been applied to over 1000 taxa and has been widely adopted by journals and databases. Although lacking priority under the International Code for Nomenclature of Prokaryotes, many Candidatus names have already achieved de facto standing in the academic literature and in databases via description of a taxon in a peer-reviewed publication, alongside deposition of a genome sequence, and there is a clear path to valid publication of such names on culture. Continued and increased use of Candidatus names provides an alternative to the potential upheaval that might accompany creation of a new additional code of nomenclature and provides a ready solution to the urgent challenge of naming many thousands of newly discovered but uncultured species.
We report the recovery of metagenome-assembled genomes (MAGs) from fecal samples collected in 2018 from five healthy adult female pigs in southeast England. The resulting nonredundant catalog of 192 MAGs encompasses 102 metagenomic species, 41 of them novel, spanning 10 bacterial and 2 archaeal phyla.
The human oesophagus is home to a complex microbial community, the oesophageal microbiome. Despite decades of work, we still have only a poor, low-resolution view of this community, which makes it hard to distinguish hope from hype when it comes to assessing links between the oesophageal microbiome and cancer. Here we review the potential importance of this microbiome and discuss new approaches, including culturomics, metagenomics, and recovery of whole-genome sequences, that bring renewed hope for an in-depth characterisation of this community that could deliver translational impact.
Background: The chicken is the most abundant food animal in the world. However, despite its importance, the chicken gut microbiome remains largely undefined. Here, we exploit culture-independent and culture-dependent approaches to reveal extensive taxonomic diversity within this complex microbial community. Results: We performed metagenomic sequencing of fifty chicken faecal samples from two breeds and analysed these, alongside all (n = 582) relevant publicly available chicken metagenomes, to cluster over 20 million non-redundant genes and to construct over 5,500 metagenome-assembled bacterial genomes. In addition, we recovered nearly 600 bacteriophage genomes. This represents the most comprehensive view of taxonomic diversity within the chicken gut microbiome to date, encompassing hundreds of novel candidate bacterial genera and species. To provide a stable, clear and memorable nomenclature for novel species, we devised a scalable combinatorial system for the creation of hundreds of well-formed Latin binomials. We cultured and genome-sequenced bacterial isolates from chicken faeces, documenting over forty novel species, together with three species from the genus Escherichia, including the newly named species Escherichia whittamii. Conclusions: Our metagenomic and culture-based analyses provide new insights into the bacterial, archaeal and bacteriophage components of the chicken gut microbiome. The resulting datasets expand the known diversity of the chicken gut microbiome and provide a key resource for future high-resolution taxonomic and functional studies on the chicken gut microbiome.
The vast majority of described prokaryotic viruses have double-stranded or single-stranded DNA or double-stranded RNA genomes. Until 2020, a mere four prokaryotic single-stranded, positive-sense RNA viruses had been classified in two genera (Riboviria; Lenarviricota; Allassoviricetes; Leviviridae). Several recent metagenomic and metatranscriptomic studies revealed a vastly greater diversity of these viruses in prokaryotic soil communities than ever anticipated. Phylogenetic analysis of these newly discovered viruses prompted the reorganization of class Allassoviricetes, now renamed Leviviricetes, to include two orders, Norzivirales and Timlovirales, and a total of six families, 428 genera and 882 species. Here we outline the new taxonomy of Leviviricetes, approved and ratified in 2021 by the International Committee on Taxonomy of Viruses, and describe open-access hidden Markov models to accommodate the anticipated identification and future classification of hundreds, if not thousands, of additional class members into this new taxonomic framework.
An amendment to this paper has been published and can be accessed via the original article.
Oren and Garrity recently published 42 new prokaryotic phylum names, including Bacillota, which they describe as a synonym of the effectively published name Firmacutes and its orthographic correction Firmicutes. However, the name Firmacutes was listed as a division in the Approved Lists of Bacterial Names, which suggests that it should be treated as having been validly published. Recent emendations to the rules mean that a named phylum now requires a named type genus and that a phylum name is formed by the addition of the suffix -ota to the stem of the name of the designated type genus. However, there are strong practical arguments for retaining the name Firmicutes, notwithstanding the uncertainty over whether the name already has standing. This matter is referred to the Judicial Commission, asking for an opinion on the standing and retention of the name Firmicutes.
Thousands of new bacterial and archaeal species and higher-level taxa are discovered each year through the analysis of genomes and metagenomes. The Genome Taxonomy Database (GTDB) provides hierarchical sequence-based descriptions and classifications for new and as-yet-unnamed taxa. However, bacterial nomenclature, as currently configured, cannot keep up with the need for new well-formed names. Instead, microbiologists have been forced to use hard-to-remember alphanumeric placeholder labels. Here, we exploit an approach to the generation of well-formed arbitrary Latinate names at a scale sufficient to name tens of thousands of unnamed taxa within GTDB. These newly created names represent an important resource for the microbiology community, facilitating communication between bioinformaticians, microbiologists and taxonomists, while populating the emerging landscape of microbial taxonomic and functional discovery with accessible and memorable linguistic labels.
Among long-stay critically ill patients in the adult intensive care unit (ICU), there are often marked changes in the complexity of the gut microbiota. However, it remains unclear whether such patients might benefit from enhanced surveillance or from interventions targeting the gut microbiota or the pathogens therein. We therefore undertook a prospective observational study of 24 ICU patients, in which serial faecal samples were subjected to shotgun metagenomic sequencing, phylogenetic profiling and microbial genome analyses. Two-thirds of the patients experienced a marked drop in gut microbial diversity (to an inverse Simpson’s index of
Flagellin is the major constituent of the flagellar filament and faithful restoration of wild-type motility to flagellin mutants may be beneficial for studies of flagellar biology and biotechnological exploitation of the flagellar system. However, gene complementation studies often fail to report whether true wild-type motility was restored by expressing flagellin from a plasmid. Therefore, we explored the restoration of motility by flagellin expressed from a variety of combinations of promoter, plasmid copy number and induction strength. Motility was only partially (approximately 50%) restored using the tightly regulated rhamnose promoter due to weak flagellin gene expression, but wild-type motility was regained with the T5 promoter, which, although leaky, allowed titration of induction strength. The endogenous E. coli flagellin promoter also restored wild-type motility. However, flagellin gene transcription levels increased 3.1- to 27.9-fold when wild-type motility was restored, indicating disturbances in the flagellar regulatory mechanisms. Motility was little affected by plasmid copy number when dependent on inducible promoters. However, plasmid copy number was important when expression was controlled by the native E. coli flagellin promoter. Motility was poorly correlated with flagellin transcription levels, but strongly correlated with the amount of flagellin associated with the flagellar filament, suggesting that excess monomers are either not exported or not assembled into filaments. This study provides a useful reference for further studies of flagellar function and a simple blueprint for similar studies with other proteins.
Background: Gene Doctoring is an efficient recombination-based genetic engineering approach to mutagenesis of the bacterial chromosome that combines the lambda-Red recombination system with a suicide donor plasmid that is cleaved in vivo to generate linear DNA fragments suitable for recombination. The use of a suicide donor plasmid makes Gene Doctoring more efficient than other recombineering technologies. However, generation of donor plasmids typically requires multiple cloning and screening steps. Results: We constructed a simplified acceptor plasmid, called pDOC-GG, for the assembly of multiple DNA fragments precisely and simultaneously to form a donor plasmid using Golden Gate assembly. Successful constructs can easily be identified through blue-white screening. We demonstrated proof of principle by inserting a gene for green fluorescent protein into the chromosome of Escherichia coli. We also provided related genetic parts to assist in the construction of mutagenesis cassettes with a tetracycline-selectable marker. Conclusions: Our plasmid greatly simplifies the construction of Gene Doctoring donor plasmids and allows for the assembly of complex, multi-part insertion or deletion cassettes with a free choice of target sites and selection markers. The tools we developed are applicable to gene editing for a wide variety of purposes in Enterobacteriaceae and potentially in other diverse bacterial families.
Brachyspira hyodysenteriae is the principal cause of swine dysentery, a disease that threatens economic productivity of pigs in many countries as it can spread readily within and between farms, and only a small number of antimicrobials are authorized for treatment of pigs. In this study, we performed whole-genome sequencing (WGS) of 81 B. hyodysenteriae isolates archived at the Animal and Plant Health Agency (APHA) from diagnostic submissions and herd monitoring in England and Wales between 2004 and 2015. The resulting genome sequences were analyzed alongside 34 genomes we previously published. Multi-locus sequence typing (MLST) showed a diverse population with 32 sequence types (STs) among the 115 APHA isolates, 25 of them identified only in England, while also confirming that the dominant European clonal complexes, CC8 and CC52, were common in the United Kingdom. A core-genome SNP tree typically clustered the isolates by ST, with isolates from some STs detected only within a specific region in England, although others were more widespread, suggesting transmission between different regions. Also, some STs were more conserved in their core genome than others, despite these isolates being from different holdings, regions and years. Minimum inhibitory concentrations to commonly used antimicrobials (Tiamulin, Valnemulin, Doxycycline, Lincomycin, Tylosin, Tylvalosin) were determined for 82 of the genome-sequenced isolates; genomic analysis revealed mutations that generally correlated well with the corresponding resistance phenotype. There was a major swine dysentery intervention program in 2009–2010, and antimicrobial survival curves showed a significant reduction in sensitivity to tiamulin and valnemulin in isolates collected in and after 2010, compared to earlier isolates. This correlated with a significant increase in post-2009 isolates harboring the pleuromutilin resistance gene tva(A), which, if present, may facilitate higher levels of resistance.
The reduction in susceptibility of Brachyspira from diagnostic submissions to pleuromutilins emphasizes the need for prudent treatment, control and eradication strategies.
The remarkable success of taxonomic discovery, powered by culturomics, genomics and metagenomics, creates a pressing need for new bacterial names while holding a mirror up to the slow pace of change in bacterial nomenclature. Here, I take a fresh look at bacterial nomenclature, exploring how we might create a system fit for the age of genomics, playing to the strengths of current practice while minimizing difficulties. Adoption of linguistic pragmatism (obeying the rules while treating recommendations as merely optional) will make it easier to create names derived from descriptions, from people or places or even arbitrarily. Simpler protologues and a relaxed approach to recommendations will also remove much of the need for expert linguistic quality control. Automated computer-based approaches will allow names to be created en masse before they are needed while also relieving microbiologists of the need for competence in Latin. The result will be a system that is accessible, inclusive and digital, while also fully capable of naming the unnamed millions of bacteria.
• There is a need for new names, but most microbiologists lack competence in Latin.
• Digital approaches to creation of names will minimize the need for human expertise.
• We can easily create descriptive and arbitrary Latinized names en masse.
• Linguistic pragmatism will make bacterial nomenclature accessible and inclusive.
• Simpler protologues will make publishing names easier and less intimidating.
Background: The horse plays crucial roles across the globe, including in horseracing, as a working and companion animal and as a food animal. The horse hindgut microbiome makes a key contribution in turning a high fibre diet into body mass and horsepower. However, despite its importance, the horse hindgut microbiome remains largely undefined. Here, we applied culture-independent shotgun metagenomics to thoroughbred equine faecal samples to deliver novel insights into this complex microbial community. Results: We performed metagenomic sequencing on five equine faecal samples to construct 123 high- or medium-quality metagenome-assembled genomes from Bacteria and Archaea. In addition, we recovered nearly 200 bacteriophage genomes. We document surprising taxonomic diversity, encompassing dozens of novel or unnamed bacterial genera and species, to which we have assigned new Candidatus names. Many of these genera are conserved across a range of mammalian gut microbiomes. Conclusions: Our metagenomic analyses provide new insights into the bacterial, archaeal and bacteriophage components of the horse gut microbiome. The resulting datasets provide a key resource for future high-resolution taxonomic and functional studies on the equine gut microbiome.