New Tool Enhances the Search for Genetic Mutations

In search of the probability of de novo mutations

While the majority of mutations in the human genome have a neutral effect on health, some are associated with diseases and disease susceptibility. Reed Cartwright, a researcher at Arizona State University's Biodesign Institute, along with colleagues at his university, Washington University, and the Wellcome Trust Sanger Institute, Cambridge, UK, report in the journal Nature Methods on a new software tool known as DeNovoGear, which uses statistical probabilities to help identify mutations and more accurately pinpoint their source and their possible significance for health. Indeed, the method described provides the first model-based approach for ferreting out certain types of mutations.

Improvements in the accuracy of mutation identification and validation could have a profound impact on the diagnosis and treatment of mutation-related diseases. One of the primary goals in genetics is to accurately characterize genetic variation and the rate at which it occurs. Searching for DNA mutations through genetic sequencing is an important ingredient in this quest, but many challenges exist. The current study focuses on a class of mutations that play a critical role in human disease, namely de novo mutations, which arise spontaneously and are not derived from the genomes of either parent.

Traditionally, two approaches for identifying de novo mutation rates in humans have been applied, each involving estimates of average mutations over multiple generations. In the first, such rates are measured directly through an estimation of the number of mutations occurring over a known number of generations. In the second or indirect method, mutation rates are inferred by estimating levels of genetic variation within or between species.

The new study takes a different approach. The strategy, pioneered in part by Donald Conrad, professor in the Department of Genetics at Washington University School of Medicine and corresponding author of the current study, takes advantage of high-throughput genetic sequencing to examine whole-genome data in search of de novo mutations.

Figure: Top left: sequencing results for mother, father and daughter are examined, based on multiple reads of a given site in the genome. Each parent is judged to be homozygous, carrying two copies of the same base (AA). Top right: sequencing of the daughter identifies a heterozygous state at the same site (AC), indicating a de novo mutation, though the new software assigns a very low probability to a mutation at this site, suggesting a false positive result. Bottom (left and right): a failure to identify a heterozygous site in the child results in a false negative, with the new method indicating a very high likelihood of mutation at this site.
Credit: USA National Center for Biotechnology Information

The mutations under study may take the form of either point mutations (individual nucleotide substitutions) or so-called indel (insertion-deletion) mutations. In the latter case, single nucleotides or nucleotide sequences may be either added to or removed from the genome.

While point mutations and indel mutations can both have adverse effects on health, indels are significantly more difficult to identify and verify. They have a strong potential to cause havoc when they occur in coding portions of the genome, as the addition or deletion of nucleotides can disrupt the translation process needed to accurately assemble proteins. The current study is the first paper to use model-based approaches to detect indel mutations.

A seemingly simple approach to pinpointing mutations is to compare sequence data from each parent with sequence data from their offspring. Where changes exist at a given site in the offspring, de novo mutations can be inferred and their potential effect on human health assessed.
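The naive comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from DeNovoGear: the function names, the site labels and the two-letter genotype strings are all assumptions made for the example. A site is flagged when the child's genotype cannot be formed by taking one allele from each parent.

```python
def mendelian_consistent(mother, father, child):
    """Return True if the child's two alleles can be explained by
    inheriting one allele from each parent (genotypes such as "AA", "AC")."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def naive_de_novo_candidates(trio_calls):
    """Flag every site whose called genotypes violate Mendelian inheritance.

    trio_calls maps a (hypothetical) site label to a tuple of called
    genotypes in the order (mother, father, child).
    """
    return [
        site
        for site, (mother, father, child) in trio_calls.items()
        if not mendelian_consistent(mother, father, child)
    ]


# Hypothetical genotype calls for four sites.
calls = {
    "site1": ("AA", "AA", "AC"),  # child carries C, neither parent does
    "site2": ("AC", "AA", "AC"),  # C inherited from the mother
    "site3": ("AA", "CC", "AC"),  # one allele from each parent
    "site4": ("AA", "AA", "AA"),  # no change
}
candidates = naive_de_novo_candidates(calls)
```

Because this check trusts every genotype call completely, a single miscalled genotype in a parent or in the child changes the answer, which is the weakness the probabilistic method is meant to address.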

In reality, such efforts are complicated by a number of potential sources of error, including insufficient sampling of the genome, mistakes in the gene sequencing process and errors of alignment between sequences. The new method uses a probabilistic algorithm to evaluate the likelihood of mutation at each site in the genome, comparing it with actual sequence data.

Human cells contain two copies of the genome, one from each parent. For most positions in the genome, the bases from each parent are the same (homozygous), but occasionally they are different (heterozygous). Errors derived from conventional methods can take the form of false negatives, particularly when gene sequencing misses heterozygous sites in the genotype of the child. On the other hand, failure to identify a heterozygous site in one of the parents can lead to a false positive result.

In the current study, data from the 1000 Genomes Project were analyzed using DeNovoGear, with markedly improved accuracy. The technique will assist ongoing efforts to better understand which mutations contribute to sporadic disease or cancer in individuals, as well as the distribution of mutations and their characteristics across populations.

The power of this technique comes from its probabilistic model, which calculates the probability of a de novo mutation at a site based on estimates of the mutation rate, the sequencing error rate, and the initial genetic variation in the population from which the parents arise. This model is able to consider multiple explanations for experimental observations and decide between them. The probabilities are used to identify candidate loci, which are then evaluated using targeted resequencing.
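To make the idea concrete, here is a heavily simplified Python sketch of a trio model of this kind. It is not the DeNovoGear implementation. It assumes a single site with only two alleles, binomial read sampling, Hardy-Weinberg genotype priors, and illustrative values for the mutation rate (`mu`), sequencing error rate (`err`) and population frequency of the alternate allele (`alt_freq`). All names and defaults are assumptions made for the example.

```python
from itertools import product
from math import comb


def read_likelihood(n_ref, n_alt, g, err):
    """P(observed read counts | genotype with g alternate alleles)."""
    p_alt = (g / 2) * (1 - err) + (1 - g / 2) * err
    return comb(n_ref + n_alt, n_alt) * p_alt**n_alt * (1 - p_alt)**n_ref


def genotype_prior(g, alt_freq):
    """Hardy-Weinberg prior on a parental genotype (assumed population model)."""
    return comb(2, g) * alt_freq**g * (1 - alt_freq)**(2 - g)


def transmit_prob(g, allele):
    """P(parent with g alternate alleles passes on allele 0 = ref or 1 = alt)."""
    return g / 2 if allele == 1 else 1 - g / 2


def de_novo_posterior(mother, father, child, mu=1e-8, err=0.01, alt_freq=0.001):
    """Posterior probability that at least one transmitted allele mutated.

    mother, father, child are (ref_reads, alt_reads) tuples. The sum runs
    over every parental genotype, transmitted allele and mutation event,
    so competing explanations (inheritance, sequencing error, de novo
    mutation) are weighed against each other.
    """
    with_mutation = 0.0
    total = 0.0
    for gm, gf in product(range(3), repeat=2):
        parents = (
            genotype_prior(gm, alt_freq) * genotype_prior(gf, alt_freq)
            * read_likelihood(*mother, gm, err)
            * read_likelihood(*father, gf, err)
        )
        # am, af: allele passed by each parent; mm, mf: did it mutate?
        for am, af, mm, mf in product((0, 1), repeat=4):
            p = parents * transmit_prob(gm, am) * transmit_prob(gf, af)
            p *= (mu if mm else 1 - mu) * (mu if mf else 1 - mu)
            child_genotype = (am ^ mm) + (af ^ mf)  # a mutation flips the allele
            p *= read_likelihood(*child, child_genotype, err)
            total += p
            if mm or mf:
                with_mutation += p
    return with_mutation / total


# Both parents deeply sequenced and clearly homozygous; child looks heterozygous.
well_covered = de_novo_posterior((30, 0), (30, 0), (15, 15))
# Same child, but the mother has only three reads, so she may be an
# undetected heterozygote.
poorly_covered = de_novo_posterior((3, 0), (30, 0), (15, 15))
```

Under these assumptions, the first trio receives a posterior close to 1, while the second receives a very small one, because an unobserved heterozygous mother is a far more likely explanation than a new mutation. A comparison of called genotypes alone would treat both sites identically.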

Given adequate pedigree data, the method is able to distinguish germline from somatic mutations in an automated manner with high accuracy. The researchers' aim is to develop software that will allow researchers and clinicians to estimate a range of mutation types faster, more accurately, and more cheaply.

In addition to further refining the DeNovoGear software, Cartwright's group plans to more closely examine normal human tissue in order to establish rates of somatic mutation. Some of the specific mutations currently associated with cancer, for example, may actually be part of normal variability, which appears to be much greater than originally assumed.

This news item, describing the article published in Nature Methods, was written by Richard Harth of the Biodesign Institute.

Reference:

Ramu A, Noordam MJ, Schwartz RS, et al. DeNovoGear: de novo indel and point mutation discovery and phasing. Nat Methods 2013 Aug 25. doi: 10.1038/nmeth.2611. [Epub ahead of print]