The massive research effort known as the Human Genome Project is an attempt to record the sequence of the three billion nucleotides that make up the human genome and to identify individual genes within this sequence. While the basic effort is of course a biological one, the description and classification of sequences also lend themselves naturally to mathematical and statistical modeling. This short textbook on the mathematics of genome analysis describes several ways in which mathematics and statistics are being used in genome analysis and sequencing. It will be of interest not only to students but also to professional mathematicians curious about the subject.
Genomics and bioinformatics play an increasingly important and transformative role in medicine, society and agriculture. The mapping of the human genome has revealed 35,000 or so genes which might cod
High-throughput sequencing has revolutionised the field of biological sequence analysis. Its application has enabled researchers to address important biological questions, often for the first time. This book provides an integrated presentation of the fundamental algorithms and data structures that power modern sequence analysis workflows. The topics covered range from the foundations of biological sequence analysis (alignments and hidden Markov models) to classical index structures (k-mer indexes, suffix arrays and suffix trees), Burrows–Wheeler indexes, graph algorithms and a number of advanced omics applications. The chapters feature numerous examples, algorithm visualisations, exercises and problems, each chosen to reflect the steps of large-scale sequencing projects, including read alignment, variant calling, haplotyping, fragment assembly, alignment-free genome comparison, transcript prediction and analysis of metagenomic samples. Each biological problem is accompanied by precise
Following the announcement of the draft sequence of the human genome and the completion of many others, attention is now increasingly turning to the analysis of the proteins encoded by genomes - prote
With the arrival of genomics and genome sequencing projects, biology has been transformed into an incredibly data-rich science. The vast amount of information generated has made computational analysis critical and has increased demand for skilled bioinformaticians. Designed for biologists without previous programming experience, this textbook provides a hands-on introduction to Unix, Perl and other tools used in sequence bioinformatics. Relevant biological topics are used throughout the book and are combined with practical bioinformatics examples, leading students through the process from biological problem to computational solution. All of the Perl scripts, sequence and database files used in the book are available for download at the accompanying website, allowing the reader to easily follow each example using their own computer. Programming examples are kept at an introductory level, avoiding complex mathematics that students often find daunting. The book demonstrates that even simp