Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes

Guo, Jiarong and Quensen, John F. and Sun, Yanni and Wang, Qiong and Brown, C. Titus and Cole, James R. and Tiedje, James M. (2019) Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes. Frontiers in Genetics, 10. ISSN 1664-8021


Abstract

Shotgun metagenomics has greatly advanced our understanding of microbial communities over the last decade. Metagenomic analyses often include assembly and genome binning, tasks that are computationally daunting, especially for large datasets from complex environments such as soil and sediments. In many studies, however, only a subset of genes and pathways involved in specific functions are of interest; thus, it is not necessary to attempt global assembly. In addition, methods that target genes can be computationally more efficient and produce more accurate assemblies by leveraging rich databases, especially for genes of broad interest such as those involved in biogeochemical cycles, biodegradation, and antibiotic resistance, or those used as phylogenetic markers. Here, we review six gene-targeted assemblers with unique algorithms for extracting and/or assembling targeted genes: Xander, MegaGTA, SAT-Assembler, HMM-GRASPx, GenSeed-HMM, and MEGAN. We tested these tools using three datasets: two with known genomes (a synthetic community of artificial reads derived from the genomes of 17 bacteria, and shotgun sequence data from a mock community with 48 bacterial and 16 archaeal genomes) and a large soil shotgun metagenomic dataset. We compared assemblies of a universal single-copy gene (rplB) and two nitrogen (N) cycle genes (nifH and nirK). We measured the tools' computational efficiency, sensitivity, specificity, and chimera rate and found that Xander and MegaGTA, which both use a probabilistic graph structure to model the genes, have the best overall performance across all three datasets, although MEGAN, a reference-matching assembler, had better sensitivity with synthetic and mock community members chosen from its reference collection. Also, Xander and MegaGTA are the only tools that include post-assembly scripts tuned for common molecular ecology and diversity analyses. Additionally, we provide a mathematical model that estimates the probability of assembling targeted genes in a metagenome, which can be used to estimate the required sequencing depth.

Item Type: Article
Subjects: East Asian Archive > Medical Science
Date Deposited: 14 Feb 2023 10:42
Last Modified: 09 Jul 2024 08:18
URI: http://library.eprintdigipress.com/id/eprint/174
