GREIN is an interactive web platform that provides user-friendly options to explore and analyze GEO RNA-seq data. GREIN is powered by a back-end computational pipeline for uniform processing of RNA-seq data and by a large collection (>6,000) of already processed datasets. These datasets were retrieved from GEO and consistently reprocessed by that back-end pipeline, the GEO RNA-seq experiments processing pipeline (GREP2).
For each GEO series, GREP2 performs the following steps:
1. Retrieve metadata for the given GEO series accession using the Bioconductor package GEOquery.
2. Download the associated run files for each sample from the SRA database using the ascp utility of Aspera Connect.
3. Generate FASTQ files from each SRA file using the SRA Toolkit.
4. Remove adapter sequences, if necessary, using Trimmomatic.
5. Generate quality control (QC) reports for each FASTQ file using FastQC.
6. Quantify transcript abundances for each sample with Salmon, then summarize the transcript-level estimates to the gene level with tximport. The summarization step uses the lengthScaledTPM option, which yields estimated counts scaled up to library size while accounting for transcript length. Gene annotations for Homo sapiens (GRCh38), Mus musculus (GRCm38), and Rattus norvegicus (Rnor_6.0) were obtained from Ensembl (release 91).
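The metadata step links each GEO sample (GSM) to its sequencing data in SRA. The actual pipeline does this with GEOquery in R; the Python sketch below only illustrates the idea, assuming the sample records are available as GEO SOFT text, where GEO lists an `!Sample_relation = SRA: ...` link per sample. The function name and the accessions in the example are hypothetical, and resolving SRX experiments to the SRR run files that are downloaded requires a further SRA query that is not shown.

```python
import re

# GEO records the SRA experiment for each sample in a line such as
# "!Sample_relation = SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX..."
SRA_RELATION = re.compile(r"^!Sample_relation = SRA: \S*?term=(SRX\d+)")


def sra_experiments_by_sample(soft_text):
    """Map each GSM sample accession to the SRA experiment accessions it links to."""
    result = {}
    current = None
    for line in soft_text.splitlines():
        if line.startswith("^SAMPLE = "):
            # A new sample record starts; remember its GSM accession.
            current = line.split("=", 1)[1].strip()
            result[current] = []
        elif current is not None:
            match = SRA_RELATION.match(line)
            if match:
                result[current].append(match.group(1))
    return result


# Hypothetical accessions, for illustration only.
example_soft = """^SAMPLE = GSM0000001
!Sample_title = control_rep1
!Sample_relation = BioSample: https://www.ncbi.nlm.nih.gov/biosample/SAMN00000001
!Sample_relation = SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX0000001
^SAMPLE = GSM0000002
!Sample_relation = SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX0000002
"""

print(sra_experiments_by_sample(example_soft))
# {'GSM0000001': ['SRX0000001'], 'GSM0000002': ['SRX0000002']}
```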
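Adapter trimming is applied only "if necessary". The criterion GREP2 uses is not described here, so the following is a hypothetical heuristic, not the pipeline's rule: sample the first reads of a FASTQ file and flag it for trimming when more than a chosen fraction contain the start of the Illumina TruSeq adapter (`AGATCGGAAGAGC`, a probe commonly used for adapter detection). The probe, the read limit, and the 1% threshold are all assumptions.

```python
from itertools import islice

# Start of the Illumina TruSeq adapter; assumed here as the detection probe.
ADAPTER_PROBE = "AGATCGGAAGAGC"


def read_sequences(fastq_lines):
    """Yield the sequence line (2nd line of every 4-line FASTQ record)."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def needs_trimming(fastq_lines, probe=ADAPTER_PROBE, max_reads=100_000, threshold=0.01):
    """Return True if more than `threshold` of the first `max_reads` reads contain `probe`."""
    total = hits = 0
    for seq in islice(read_sequences(fastq_lines), max_reads):
        total += 1
        if probe in seq:
            hits += 1
    return total > 0 and hits / total > threshold


reads = [
    "@r1", "ACGTACGTACGT", "+", "IIIIIIIIIIII",
    "@r2", "ACGTAGATCGGAAGAGCAC", "+", "IIIIIIIIIIIIIIIIIII",
]
print(needs_trimming(reads))  # True: 1 of 2 reads contains the probe
```

With a real file, `needs_trimming(open(path))` iterates over the file lazily, so only the first `max_reads` records are inspected.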
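FastQC's reports include per-position base quality. As a rough illustration of what one of those QC metrics measures (not FastQC's implementation), the sketch below averages Phred scores at each read position, assuming the standard Phred+33 quality encoding.

```python
def mean_quality_per_position(fastq_lines, offset=33):
    """Average Phred score at each read position (Phred+33 encoding assumed)."""
    sums, counts = [], []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # quality string is the 4th line of each record
            continue
        for pos, ch in enumerate(line.strip()):
            if pos == len(sums):  # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(ch) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


records = ["@r1", "AC", "+", "II", "@r2", "ACG", "+", "+5I"]
print(mean_quality_per_position(records))  # [25.0, 30.0, 40.0]
```

Reads of different lengths are handled by counting, per position, only the reads that reach it.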
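The QC and quantification steps are command-line tools run once per sample. A minimal sketch of how those invocations could be assembled is below; the helper names, directory layout, and thread count are hypothetical, and GREP2's exact parameters are not given here. It only uses standard options: FastQC's `--outdir`, and Salmon's `quant` with `-i` (index), `-l A` (automatic library-type detection), `-r` or `-1`/`-2` (single- or paired-end reads), `-p` (threads), and `-o` (output directory).

```python
def fastqc_cmd(fastqs, qc_dir):
    """FastQC invocation writing reports for all FASTQ files into qc_dir (must already exist)."""
    return ["fastqc", "--outdir", qc_dir, *fastqs]


def salmon_quant_cmd(index_dir, fastqs, out_dir, threads=4):
    """Salmon quantification for single-end (1 file) or paired-end (2 files) reads."""
    if len(fastqs) == 1:
        reads = ["-r", fastqs[0]]
    elif len(fastqs) == 2:
        reads = ["-1", fastqs[0], "-2", fastqs[1]]
    else:
        raise ValueError("expected one (single-end) or two (paired-end) FASTQ files")
    # "-l A" lets Salmon infer the library type automatically.
    return ["salmon", "quant", "-i", index_dir, "-l", "A", *reads,
            "-p", str(threads), "-o", out_dir]


print(salmon_quant_cmd("salmon_index", ["SRR_1.fastq", "SRR_2.fastq"], "quant/SRR"))
```

Each list can be passed to `subprocess.run(cmd, check=True)`; building argument lists rather than shell strings avoids quoting problems with file paths.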
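Salmon writes its per-transcript estimates to a tab-separated `quant.sf` file with the columns `Name`, `Length`, `EffectiveLength`, `TPM`, and `NumReads`; these are the inputs tximport summarizes. A small reader for that file, as a sketch (the function name and example values are hypothetical):

```python
import csv
import io


def read_quant_sf(text):
    """Parse Salmon's quant.sf into {transcript: (effective_length, tpm, num_reads)}."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        r["Name"]: (float(r["EffectiveLength"]), float(r["TPM"]), float(r["NumReads"]))
        for r in rows
    }


# Hypothetical transcript IDs and values.
example_quant = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1350.5\t100.0\t10.0\n"
    "tx2\t800\t650\t0\t0\n"
)
print(read_quant_sf(example_quant))
```

For a real sample, pass the file contents, e.g. `read_quant_sf(open("quant/SRR/quant.sf").read())`.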
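The lengthScaledTPM option can be made concrete with a simplified Python sketch modeled on tximport's approach; it is not tximport's code. Per gene and sample, transcript TPMs are summed and an abundance-weighted average (effective) transcript length is computed. The gene's TPM is then multiplied by that length averaged across samples, and each sample is rescaled so its column sums to the original library size (the sum of Salmon's estimated counts). One simplification is labeled in the code: samples where a gene has zero TPM are skipped when averaging length, whereas tximport uses its own fallback.

```python
from collections import defaultdict


def length_scaled_tpm(tpm, counts, lengths, tx2gene):
    """Simplified gene-level lengthScaledTPM, modeled on tximport's approach.

    tpm, counts, lengths: dict mapping transcript ID -> per-sample values
    tx2gene: dict mapping transcript ID -> gene ID
    Returns a dict mapping gene ID -> per-sample scaled counts.
    """
    n = len(next(iter(tpm.values())))
    gene_tpm = defaultdict(lambda: [0.0] * n)
    weighted_len = defaultdict(lambda: [0.0] * n)  # sum of TPM * length per gene
    lib_size = [0.0] * n                           # original estimated counts per sample
    for tx, gene in tx2gene.items():
        for j in range(n):
            gene_tpm[gene][j] += tpm[tx][j]
            weighted_len[gene][j] += tpm[tx][j] * lengths[tx][j]
            lib_size[j] += counts[tx][j]

    # Abundance-weighted gene length per sample, averaged across samples.
    # Simplification: samples where the gene has zero TPM are skipped.
    mean_len = {}
    for gene, abund in gene_tpm.items():
        per_sample = [weighted_len[gene][j] / abund[j] for j in range(n) if abund[j] > 0]
        mean_len[gene] = sum(per_sample) / len(per_sample) if per_sample else 0.0

    # TPM x average length, then rescale each sample to its original library size.
    raw = {g: [gene_tpm[g][j] * mean_len[g] for j in range(n)] for g in gene_tpm}
    col_sum = [sum(raw[g][j] for g in raw) for j in range(n)]
    return {
        g: [raw[g][j] * lib_size[j] / col_sum[j] if col_sum[j] > 0 else 0.0 for j in range(n)]
        for g in raw
    }


# Two samples; transcripts t1 and t2 belong to gene A, t3 to gene B (hypothetical data).
tpm = {"t1": [100.0, 200.0], "t2": [300.0, 0.0], "t3": [600.0, 800.0]}
counts = {"t1": [10.0, 20.0], "t2": [60.0, 0.0], "t3": [30.0, 40.0]}
lengths = {"t1": [1000.0, 1000.0], "t2": [2000.0, 2000.0], "t3": [500.0, 500.0]}
tx2gene = {"t1": "A", "t2": "A", "t3": "B"}
print(length_scaled_tpm(tpm, counts, lengths, tx2gene))
```

In this example each sample's gene-level counts sum to its library size (100 and 60), while genes with longer transcripts receive proportionally more counts than their TPM alone would give.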
Read more about GREIN at: https://doi.org/10.1101/326223