Data preprocessing with TALON

For the Swan manuscript, we took the following steps to process our data before using Swan. Obtaining a transcriptome via TALON is not required; other tools that produce a transcriptome can also provide input to Swan.

If you would rather skip processing the raw data and start using Swan right away, see the Getting started section.

You can download TALON and view its documentation here.

Run each of the steps below in a Bash terminal.

Download reference genome and annotation

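# create the working directories (-p: no error if they already exist)
mkdir -p data figures
cd data/

# download and decompress the GRCh38 reference genome from ENCODE
wget https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz
gunzip GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz
# keep only the first word of each FASTA header, e.g. ">chr1 AC:..." becomes ">chr1"
awk '{print $1}' GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta > hg38.fa

# download and decompress the GENCODE v29 annotation
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz
gunzip gencode.v29.annotation.gtf.gz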

Download mapped PacBio RNA-seq data for each replicate
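
The accessions of the mapped PacBio BAM files are not listed here. As a minimal sketch, assuming the BAMs are hosted on the ENCODE portal like the reference genome above, a loop over a user-filled list of dataset names and accessions could fetch them. The `BAM_LIST` variable, the `encode_bam_url` helper, and the `<cell line>_<replicate>` naming scheme are hypothetical; fill the list with the files you actually use.

```bash
# HYPOTHETICAL list, one "<dataset name> <ENCODE accession>" pair per line,
# e.g. "cellA_1 ENCFFxxxxxx" (placeholder). Left empty: fill in your own files.
BAM_LIST=''

# Build a download URL using the same ENCODE pattern as the genome download.
encode_bam_url() {
  printf 'https://www.encodeproject.org/files/%s/@@download/%s.bam\n' "$1" "$1"
}

# Save each BAM under its dataset name, e.g. cellA_1.bam.
printf '%s\n' "$BAM_LIST" | while read -r name acc; do
  [ -n "$name" ] || continue      # skip blank lines
  wget -O "${name}.bam" "$(encode_bam_url "$acc")"
done
```

Naming each file after its dataset rather than its accession keeps the later steps readable, since the dataset name is what TALON reports in its outputs.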

Convert each BAM file to SAM
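
The command used for this step is not shown. One common way to do it, assuming samtools is installed (samtools is not named in the original steps), is `samtools view -h`: its default output format is SAM, and `-h` keeps the header that lists the reference sequences. The `sam_name` helper is a hypothetical convenience.

```bash
# Derive the SAM file name from a BAM file name: cellA_1.bam -> cellA_1.sam
sam_name() {
  printf '%s.sam\n' "${1%.bam}"
}

# Convert every BAM in the current directory to SAM, keeping the header (-h).
for bam in *.bam; do
  [ -e "$bam" ] || continue      # no BAM files: the glob stays literal, so skip
  samtools view -h -o "$(sam_name "$bam")" "$bam"
done
```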

Label reads with TALON for detection of internal priming
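
A sketch of this step for every SAM file in `data/`, using `talon_label_reads` flags as they appear in TALON's documentation (verify with `talon_label_reads -h` for your version): `--f` the input SAM, `--g` the genome FASTA created above, `--t` the number of threads, `--ar` the size in bp of the genomic window after each read end used to compute the fraction of As, `--deleteTmp` to remove temporary files, and `--o` the output prefix. The thread count and window size are illustrative, not necessarily the manuscript's settings, and the skip rule assumes TALON names its output `<prefix>_labeled.sam`.

```bash
# Output prefix for one SAM file: cellA_1.sam -> cellA_1
label_prefix() {
  printf '%s\n' "${1%.sam}"
}

if command -v talon_label_reads >/dev/null 2>&1; then
  for sam in *.sam; do
    [ -e "$sam" ] || continue                      # no SAM files present
    case "$sam" in *_labeled.sam) continue ;; esac # already labeled: skip
    talon_label_reads \
      --f "$sam" \
      --g hg38.fa \
      --t 1 \
      --ar 20 \
      --deleteTmp \
      --o "$(label_prefix "$sam")"
  done
else
  echo "talon_label_reads not found on PATH; install TALON first" >&2
fi
```

Skipping files that already end in `_labeled.sam` makes the loop safe to rerun in the same directory.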

Initialize a TALON database with the GENCODE v29 annotation
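
A minimal sketch of this step, with flag names taken from TALON's documentation as we understand it (confirm with `talon_initialize_database -h`). Note that `--g` here is a genome build name, not the FASTA file, and that the annotation name given to `--a` and the build name must be reused verbatim by the later filtering, GTF, and abundance commands. The variable names and the `talon` output prefix are illustrative choices.

```bash
# Names reused by later TALON steps; keep them identical across commands.
ANNOT=gencode_v29   # label stored with the GENCODE v29 annotation
BUILD=hg38          # genome build name (a label, not the FASTA path)
DB_PREFIX=talon     # the database is written to ${DB_PREFIX}.db

if command -v talon_initialize_database >/dev/null 2>&1; then
  talon_initialize_database \
    --f gencode.v29.annotation.gtf \
    --g "$BUILD" \
    --a "$ANNOT" \
    --o "$DB_PREFIX"
else
  echo "talon_initialize_database not found on PATH; install TALON first" >&2
fi
```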

Create a config file for running TALON
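
TALON reads a comma-separated config file with one line per dataset: dataset name, sample description, platform, and the path to the SAM file. A minimal sketch that builds it from the labeled SAM files, assuming they follow a hypothetical `<cell line>_<replicate>_labeled.sam` naming convention; the platform string `PacBio`, the `config_line` helper, and the output name `talon_config.csv` are also illustrative.

```bash
# Print one TALON config line for a labeled SAM file:
#   dataset name, sample description (cell line), platform, SAM path
# HYPOTHETICAL naming convention: <cell line>_<replicate>_labeled.sam
config_line() {
  local base dataset cell path
  base=$(basename "$1")
  dataset=${base%_labeled.sam}    # e.g. cellA_1
  cell=${dataset%_*}              # e.g. cellA
  case "$1" in
    /*) path=$1 ;;
    *)  path="$PWD/$1" ;;         # absolute path, valid from any directory
  esac
  printf '%s,%s,%s,%s\n' "$dataset" "$cell" "PacBio" "$path"
}

# Write (or overwrite) the config with one line per labeled SAM file.
: > talon_config.csv
for sam in *_labeled.sam; do
  [ -e "$sam" ] || continue
  config_line "$sam" >> talon_config.csv
done
```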

Run TALON
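
A sketch of the main TALON call, with flags as listed in TALON's documentation (`--f` config file, `--db` database, `--build` genome build name, `--o` output prefix); check `talon -h` for your version. The file names assume a database initialized as `talon.db` with build `hg38` and a config file named `talon_config.csv`, all of which are illustrative choices.

```bash
# Must match the names used when the database was initialized.
BUILD=hg38
DB=talon.db
OUT_PREFIX=talon   # illustrative prefix for TALON's output files

if command -v talon >/dev/null 2>&1; then
  talon \
    --f talon_config.csv \
    --db "$DB" \
    --build "$BUILD" \
    --o "$OUT_PREFIX"
else
  echo "talon not found on PATH; install TALON first" >&2
fi
```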

Filter annotated transcripts for internal priming and reproducibility for each cell line
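
A sketch that runs `talon_filter_transcripts` once per cell line, so a transcript must be reproducible across that cell line's replicates. As we understand TALON's documentation, `--maxFracA` discards novel transcripts whose reads look internally primed (using the labels added earlier), while `--minCount` and `--minDatasets` require a novel transcript to be seen at least that many times in at least that many datasets. The cell line and dataset names are hypothetical, the threshold values are illustrative rather than the manuscript's, and we assume `--datasets` accepts a comma-separated list; verify with `talon_filter_transcripts -h`.

```bash
ANNOT=gencode_v29
DB=talon.db

# HYPOTHETICAL cell lines and their dataset names (as in the config file),
# one line per cell line: <cell line> <comma-separated dataset names>
CELL_DATASETS='cellA cellA_1,cellA_2
cellB cellB_1,cellB_2'

# Whitelist file written for one cell line.
whitelist_file() {
  printf '%s_whitelist.csv\n' "$1"
}

if command -v talon_filter_transcripts >/dev/null 2>&1; then
  printf '%s\n' "$CELL_DATASETS" | while read -r cell datasets; do
    talon_filter_transcripts \
      --db "$DB" \
      -a "$ANNOT" \
      --datasets "$datasets" \
      --maxFracA 0.5 \
      --minCount 5 \
      --minDatasets 2 \
      --o "$(whitelist_file "$cell")"
  done
else
  echo "talon_filter_transcripts not found on PATH; install TALON first" >&2
fi
```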

Create a GTF of all observed, TALON-annotated transcripts that passed filtering
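
If filtering is run separately for each cell line, the per-cell-line whitelists need to be combined before building a single GTF. A sketch, assuming the whitelist files have no header line and using hypothetical file names; the `talon_create_GTF` flags (`--db`, `-a`, `--build`, `--whitelist`, `--o`) follow TALON's documentation as we understand it, and the `merge_whitelists` helper is our own.

```bash
ANNOT=gencode_v29
BUILD=hg38
DB=talon.db
WHITELISTS='cellA_whitelist.csv cellB_whitelist.csv'   # hypothetical names

# Union of whitelists: each gene/transcript line appears once.
merge_whitelists() {
  cat "$@" | sort -u
}

if command -v talon_create_GTF >/dev/null 2>&1; then
  # Word splitting of $WHITELISTS is intentional (names contain no spaces).
  merge_whitelists $WHITELISTS > all_whitelist.csv
  talon_create_GTF \
    --db "$DB" \
    -a "$ANNOT" \
    --build "$BUILD" \
    --whitelist all_whitelist.csv \
    --o all
else
  echo "talon_create_GTF not found on PATH; install TALON first" >&2
fi
```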

Obtain a filtered abundance file of each transcript that passed filtering
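
A sketch of this step with `talon_abundance`, passing the combined whitelist so that only transcripts that passed filtering are counted. Flag names follow TALON's documentation as we understand it (check `talon_abundance -h`); the database, annotation, build, whitelist, and `all` output prefix are illustrative names that must match the earlier steps.

```bash
ANNOT=gencode_v29
BUILD=hg38
DB=talon.db
WHITELIST=all_whitelist.csv   # combined whitelist of transcripts that passed filtering

if command -v talon_abundance >/dev/null 2>&1; then
  talon_abundance \
    --db "$DB" \
    -a "$ANNOT" \
    --build "$BUILD" \
    --whitelist "$WHITELIST" \
    --o all
else
  echo "talon_abundance not found on PATH; install TALON first" >&2
fi
```

Using the same whitelist here and in the GTF step keeps the abundance table and the transcript models in agreement, which Swan relies on when it loads both.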
