The last few years have seen tremendous advances in molecular genetics. Improved methodologies have enabled better quality control, and high-throughput processes have made it possible to analyze and interpret large volumes of data. Whole exome sequencing (WES) has made it possible to discover new disease-gene links. However, because the procedure is limited to the exome, between 50% and 75% of patients remain without a genetic diagnosis. According to recent research data, over 3000 Mendelian disorders still await genetic definition.
Variant calling and prioritization: why are WES and WGS not enough?
WES covers barely 2% of the human genome because it targets only the protein-coding exons. It is now well established that intronic sequences, which do not encode proteins, can also harbor disease-causing mutations and variants.
That brings us to whole genome sequencing (WGS). It considers an individual's entire genetic makeup, the exons as well as the introns. It has the potential to capture all classes of disease-causing variants in the genome, including indels and copy number variants. WGS can also capture variation in the regulatory regions of every gene and other complex rearrangements within the non-coding regions, including inversions and transposon insertions.
However, the sheer number of variants, about 3 million, makes variant prioritization challenging for researchers. With WES, there was not enough data to decipher the variants present within the genome; with WGS, the challenge shifts to optimized analytics that can efficiently interpret the function and clinical effects of the data. It is a “needle in a haystack” problem: researchers must determine which variants can damage gene function or cause the disease phenotype.
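To make the "needle in a haystack" problem concrete, here is a minimal Python sketch of one common prioritization idea: discard variants that are common in the population, then rank what remains by predicted impact. The `Variant` fields, the impact labels, and the 1% frequency cutoff are illustrative assumptions, not the output format of any particular variant caller or annotator.

```python
from dataclasses import dataclass

# Hypothetical impact ranking; real annotation tools define their own vocabularies.
IMPACT_RANK = {"HIGH": 3, "MODERATE": 2, "LOW": 1, "MODIFIER": 0}


@dataclass(frozen=True)
class Variant:
    gene: str
    population_af: float  # allele frequency in a reference population (assumed field)
    impact: str           # one of the IMPACT_RANK keys (assumed annotation)


def prioritize(variants, max_af=0.01, min_impact="MODERATE"):
    """Keep rare variants with at least `min_impact`; most severe and rarest first."""
    floor = IMPACT_RANK[min_impact]
    kept = [
        v for v in variants
        if v.population_af <= max_af and IMPACT_RANK[v.impact] >= floor
    ]
    # Sort by descending impact, then by ascending population frequency.
    return sorted(kept, key=lambda v: (-IMPACT_RANK[v.impact], v.population_af))


# Toy input with placeholder gene names.
calls = [
    Variant("GENE_A", 0.20, "HIGH"),        # too common to explain a rare disease
    Variant("GENE_B", 0.0005, "MODERATE"),
    Variant("GENE_C", 0.0001, "HIGH"),
    Variant("GENE_D", 0.002, "LOW"),        # rare, but low predicted impact
]
shortlist = prioritize(calls)
print([v.gene for v in shortlist])  # ['GENE_C', 'GENE_B']
```

Even with filters like these, a whole genome typically leaves many candidates, which is where evidence from the transcriptome can help narrow the list further.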
Why is RNA-seq essential for genetic diagnosis of inherited disorders and diseases?
It exclusively focuses on the entire set of transcripts in a cell
Focusing on mRNA, the actively transcribed RNA that serves as the template for protein synthesis, gives researchers a clearer picture of how cells that carry essentially identical DNA can differ in the genes they actually express.
RNA-seq uses next-generation sequencing (NGS) to reveal the constantly changing cellular transcriptome. RNA-seq data is dynamic: it can capture alternatively spliced transcripts, mutations/SNPs, post-transcriptional modifications, and changes in gene expression over time. The methods of NGS and RNA-seq data analysis have been developed and standardized over the last decade.
i. Demystifying existing Mendelian inherited diseases and contributing to the development of personalized medicine
RNA-seq can help diagnose Mendelian inherited diseases, identify and profile biomarkers, diagnose new genetic diseases, and profile new drug pathways within cells. It can determine the gene expression profile of a given sample accurately and quickly. The most significant application of RNA-seq is personalized medicine: it can be applied to develop personalized medication for different subgroups and individual patients. The results of the sequencing and analytics can potentially reveal new drug pathways and therapeutic opportunities for individuals, depending on the cost of the process.
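As a rough illustration of what a "gene expression profile" means in practice, the sketch below converts raw read counts into transcripts per million (TPM), one widely used normalization that corrects for gene length and sequencing depth. The gene names, counts, and lengths are made-up values, and this is not meant to represent the method of any specific pipeline.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and gene lengths in base pairs."""
    # Reads per kilobase corrects for the fact that longer genes collect more reads.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        # No reads at all: report zero expression rather than dividing by zero.
        return {gene: 0.0 for gene in counts}
    # Rescale so that the values of one sample sum to one million.
    return {gene: value / total * 1_000_000 for gene, value in rpk.items()}


# Toy sample with placeholder gene names (illustrative numbers only).
counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
lengths = {"GENE_A": 1000, "GENE_B": 3000, "GENE_C": 2000}
profile = tpm(counts, lengths)
for gene, value in profile.items():
    print(f"{gene}: {value:.0f} TPM")
```

Here GENE_B has three times the reads of GENE_A but is also three times as long, so the two end up with the same TPM. Raw counts alone would have suggested a difference in expression that is not there.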
ii. Highlighting cancer progression and revealing new drug pathways
The Cancer Genome Atlas (TCGA) and the Encyclopedia of DNA Elements (ENCODE) have used various RNA-seq data analysis techniques extensively over the course of their projects. They have used standardized methods to characterize numerous cancer cell lines and primary tumor cells from donors. The aim of TCGA was to collect and analyze data from over 30 different types of tumors from patients to unravel the mechanisms of malignancy and cancer progression. ENCODE, on the other hand, aimed to identify the gene regulatory regions across the entire genome in different cohorts, giving researchers an understanding of the genetic and epigenetic regulatory layers. RNA-seq can identify single nucleotide polymorphisms (SNPs), gene fusions, and allele-specific expression that potentially play a role in the development of cancer.
iii. Uncovering the intronic variations that contribute to genetic diseases
RNA-seq data analysis allows researchers to identify the underlying genetic causes of disease beyond the exome. Diseases caused by the upregulation or downregulation of specific proteins, due to mutations or variations in the regulatory regions of genes, can also be explored via RNA-seq. RNA-seq data analysis can detect the alternative splicing events in a cell that give rise to altered protein expression. Moreover, RNA-seq can explore different types of RNA, including miRNA, ribosomal RNA, and tRNA. It enables scientists and medical professionals to identify potential causes of a disorder that lie within the messenger RNA or other RNA transcripts of a sample.
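One way to quantify an alternative splicing event is the "percent spliced in" (PSI) of an exon: the share of junction-spanning reads that support including the exon rather than skipping it. The sketch below is a simplified version for a single exon-skipping event. The read counts are invented, and the function name and parameters are hypothetical rather than taken from any splicing tool.

```python
def exon_psi(upstream_junction_reads, downstream_junction_reads, skipping_reads):
    """Percent spliced in for one cassette exon, as a fraction between 0 and 1.

    Including the exon creates two junctions and skipping it creates one, so
    the two inclusion counts are averaged before they are compared.
    """
    inclusion = (upstream_junction_reads + downstream_junction_reads) / 2
    total = inclusion + skipping_reads
    if total == 0:
        return None  # no junction coverage, so PSI is undefined
    return inclusion / total


# Illustrative read counts only.
patient_psi = exon_psi(10, 14, 36)  # 12 / (12 + 36) = 0.25
control_psi = exon_psi(45, 51, 2)   # 48 / (48 + 2)  = 0.96
delta_psi = patient_psi - control_psi
print(f"patient={patient_psi:.2f} control={control_psi:.2f} delta={delta_psi:+.2f}")
```

A large drop in PSI in the patient relative to controls, as in this toy example, would point to exon skipping, which could in turn be traced back to an intronic variant near the splice site.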
Why does interpreting RNA-seq data need advanced analytics?
There was a time when interpreting RNA-seq data took a whole day or longer. Ready-to-use automated pipelines and ready-to-publish reports have reduced that time to an hour. Research teams use secure, cloud-hosted software to run between 1 and 1000 parallel workflows for the RNA-seq analysis of very large datasets from their experiments and reach replicable results within one hour. Automated software workflows that capture and analyze RNA-seq data simplify even the most complex analytics processes.
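The idea of fanning out many per-sample workflows in parallel can be sketched with Python's standard library. In the example below, `analyze_sample` is a hypothetical placeholder for a full workflow, and a local thread pool stands in for the cloud workers a hosted platform would provide, so treat it as an illustration of the pattern rather than a description of any particular product.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_sample(sample_id):
    """Hypothetical stand-in for one per-sample workflow.

    A real workflow would run alignment, quantification, and reporting here.
    """
    return {"sample": sample_id, "status": "done"}


def run_workflows(sample_ids, max_workers=8):
    """Run one workflow per sample concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so results line up with sample_ids.
        return list(pool.map(analyze_sample, sample_ids))


sample_ids = [f"sample_{i:03d}" for i in range(1, 6)]
results = run_workflows(sample_ids)
print(results[0])  # {'sample': 'sample_001', 'status': 'done'}
```

Because each sample is processed independently, the same pattern scales from one workflow to many by adding workers. Running identical code on every sample also helps keep results replicable.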
The wide variety of RNA types, and of the data that arises from sequencing them, demands reliable analysis tools that produce replicable results. High-speed analytics available in a cloud-based format allow researchers to analyze and interpret RNA sequencing data accurately.