RNA Sequencing Analysis
RNA sequencing (RNA-seq) is a technique used to analyze the transcriptome - the complete set of RNA transcripts in a cell. This method involves sequencing RNA molecules after reverse transcription to cDNA to examine gene expression levels and identify novel transcripts.
This document describes two pipelines for RNA-seq analysis:
- Using
featureCounts
and then analyzing with DESeq2 in R - Using the Tuxedo Suite (HISAT2, StringTie, Ballgown)
featureCounts
featureCounts
is a program for counting reads mapped to genomic features, such as genes, exons,
and promoters. It is part of the Subread
package and can be used for RNA-seq as well as DNA-seq analysis.
featureCounts takes as input SAM/BAM files and an annotation file including chromosomal coordinates of features. It outputs numbers of reads assigned to features (or meta-features). It also outputs stat info for the overall summrization results, including number of successfully assigned reads and number of reads that failed to be assigned due to various reasons (these reasons are included in the stat info).
Use FeatureCounts to count the number of reads that map to each gene in a GTF file and summarize the results for downstream analysis (i.e., differential expression).
Input
Aligned reads in BAM format and a GTF file containing genomic features.
Example usage:
featureCounts -T "$NUM_THREADS" --verbose -t exon -g gene_id --countReadPairs \
-a "$REF_GTF" -p -P -C -B -o "${OUT_DIR}/${OUT_PREFIX}.tsv" ./*.bam
Output
Feature counts are written to a tab-delimited file (.tsv
or .txt
)
with columns for each sample. You can import this file into R or other
statistical software for further analysis.
Feature counts also provide a summary of the number of reads that were assigned to features, as well as the number of reads that were not assigned to any feature.
Downstream Analysis using R
Export the results from featureCounts
to R for further analysis, such as
differential expression analysis using packages like DESeq2 or edgeR.
DESeq2
DESeq2 is an R package for differential gene expression analysis based on the negative binomial distribution.
DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models. The models use the raw counts as input and perform regularized log transformation and variance stabilizing transformation. The package also provides functions to visualize the data and results.
Tuxedo Suite
The Tuxedo Suite is a collection of tools for transcript-level expression analysis of RNA-seq experiments.
- Align reads to the reference genome and sort to BAM format (HISAT2)
- Assemble transcripts (StringTie)
- Prepare for differential expression analysis (Ballgown setup)
- Perform differential expression analysis
- Visualize results
Tool | Description | Manual | Source |
---|---|---|---|
HISAT2 | Align reads to reference genome | manual | source |
StringTie | Assemble RNA-Seq alignments into transcripts | manual | source |
Ballgown | Isoform-level differential expression analysis | manual | source |
Tip
StringTie comes packaged with gffcompare
for comparing and evaluating the
accuracy of RNA-seq transcript assemblers. Read the
manual for details.
Using Ballgown:
- Filter to remove low-abundance genes
- Identify differentially expressed transcripts and genes
- Add gene names
- Sort by p-value
- Write results to file
References
- Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923-30. PMID: 24227677
- Pertea, M., Kim, D., Pertea, G. M., Leek, J. T., & Salzberg, S. L. (2016). Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nature Protocols, 11(9), 1650–1667. PMID: 27560171
- Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12(4):357-360. PMID: 25751142
- Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33(3):290-295. PMID: 25690850
- Frazee AC, Pertea G, Jaffe AE, Langmead B, Salzberg SL, Leek JT. Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat Biotechnol. 2015;33(3):243-246. PMID: 25748911
Page last updated on 2024-10-17.