Data preprocessing with TALON
Download reference genome and annotation
mkdir data
mkdir figures
cd data/
# download reference files
wget https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz
gunzip GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz
awk '{print $1}' GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta > hg38.fa
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz
gunzip gencode.v29.annotation.gtf.gz
Download mapped PacBio RNA-seq data for each replicate
Convert each bam to sam
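A minimal sketch of this step, assuming `samtools` is installed and the BAM files sit in a single directory. The function names `bam_to_sam` and `convert_all_bams` are hypothetical helpers, not part of TALON or samtools.

```shell
# Convert one BAM file to SAM. The -h flag keeps the header, which
# downstream tools generally expect to find in the SAM file.
bam_to_sam() {
    local bam="$1"
    local sam="${bam%.bam}.sam"
    samtools view -h "$bam" > "$sam"
    echo "$sam"   # report the path of the file that was written
}

# Convert every BAM in a directory (default: current directory).
convert_all_bams() {
    local dir="${1:-.}"
    local bam
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue   # skip the literal pattern when nothing matches
        bam_to_sam "$bam"
    done
}
```

Calling `convert_all_bams .` from inside `data/` writes one `.sam` file next to each `.bam` file.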
Label reads with TALON for detection of internal priming
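One way this step might look, using the `talon_label_reads` utility that ships with TALON. The wrapper name `label_reads`, the per-sample temporary directory, and the 20 bp window (TALON's documented default for `--ar`) are assumptions chosen for illustration.

```shell
# Label each read in a SAM file with the fraction of As in the genomic
# window just downstream of its alignment end, which TALON later uses to
# flag likely internal priming.
# Arguments: <sam> <genome fasta> <output prefix> [threads]
label_reads() {
    local sam="$1" genome="$2" prefix="$3" threads="${4:-1}"
    talon_label_reads \
        --f "$sam" \
        --g "$genome" \
        --t "$threads" \
        --ar 20 \
        --tmpDir "${prefix}_tmp" \
        --deleteTmp \
        --o "$prefix"
}
```

A call such as `label_reads rep1.sam hg38.fa rep1 4` (file names hypothetical) should produce `rep1_labeled.sam`, which is the file to pass on to later steps. Giving each sample its own `--tmpDir` avoids collisions when several samples are labeled at the same time.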
Initialize TALON database with GENCODE v29 annotation
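A sketch of the database initialization, assuming the genome build is named `hg38` and the annotation is named `gencode_v29`. Both names are labels chosen here; whatever names are used must be repeated consistently in later TALON commands. The length and end-distance cutoffs shown are TALON's documented defaults, and `init_talon_db` is a hypothetical wrapper.

```shell
# Build a TALON database from a GTF annotation.
# Arguments: <annotation gtf> [output prefix]
init_talon_db() {
    local gtf="$1" prefix="${2:-talon}"
    talon_initialize_database \
        --f "$gtf" \
        --g hg38 \
        --a gencode_v29 \
        --l 0 \
        --idprefix TALON \
        --5p 500 \
        --3p 300 \
        --o "$prefix"
}
```

Running `init_talon_db gencode.v29.annotation.gtf talon` should create `talon.db` in the current directory.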
Create config file to run TALON
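TALON reads a comma-delimited config file with one line per dataset and four columns: dataset name, sample description, platform, and SAM file path. The sketch below builds such a file from a list of labeled SAM files. The helper name `write_talon_config` and the naming convention it relies on (`<sample>_<replicate>_labeled.sam`) are assumptions for illustration.

```shell
# Write a TALON config file (no header row).
# Arguments: <output csv> <platform> <labeled sam>...
write_talon_config() {
    local out="$1" platform="$2"
    shift 2
    : > "$out"   # start from an empty file
    local sam name desc
    for sam in "$@"; do
        # Dataset name: file name without directory and "_labeled.sam" suffix.
        name=$(basename "$sam" _labeled.sam)
        # Description: dataset name with the trailing "_<replicate>" removed.
        desc="${name%_*}"
        # Fields must not contain commas, since the file is comma-delimited.
        printf '%s,%s,%s,%s\n' "$name" "$desc" "$platform" "$sam" >> "$out"
    done
}
```

For example, `write_talon_config config.csv PacBio cellA_1_labeled.sam cellA_2_labeled.sam` (hypothetical file names and platform label) writes two lines, the first being `cellA_1,cellA,PacBio,cellA_1_labeled.sam`.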
Run TALON
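A sketch of the annotation run itself. It assumes the database was initialized with the genome build name `hg38`, since `--build` has to match the name given at initialization. The wrapper name `run_talon` is hypothetical.

```shell
# Annotate the reads listed in a config file against a TALON database.
# The database is updated in place.
# Arguments: <config csv> <database> <output prefix> [threads]
run_talon() {
    local config="$1" db="$2" prefix="$3" threads="${4:-1}"
    talon \
        --f "$config" \
        --db "$db" \
        --build hg38 \
        --threads "$threads" \
        --o "$prefix"
}
```

A typical call would be `run_talon config.csv talon.db talon 8` (file names hypothetical). Because the run modifies the database, keeping a copy of the freshly initialized `.db` file makes it easier to start over.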
Filter annotated transcripts for internal priming and reproducibility for each cell line
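The filtering step could be sketched as below with `talon_filter_transcripts`. The thresholds are assumptions, not values taken from this page: `--maxFracA 0.5` and `--minCount 5` are TALON's documented defaults, and `--minDatasets 2` requires a novel transcript to be seen in both of two replicates. The annotation name `gencode_v29`, the dataset names, and the wrapper `filter_cell_line` are likewise hypothetical.

```shell
# Produce a whitelist of transcripts for one cell line.
# Arguments: <database> <comma-separated dataset names> <output whitelist csv>
filter_cell_line() {
    local db="$1" datasets="$2" out="$3"
    talon_filter_transcripts \
        --db "$db" \
        -a gencode_v29 \
        --datasets "$datasets" \
        --maxFracA 0.5 \
        --minCount 5 \
        --minDatasets 2 \
        --o "$out"
}
```

Usage might look like `filter_cell_line talon.db cellA_1,cellA_2 cellA_whitelist.csv`, repeated once per cell line. If a single whitelist covering all cell lines is needed afterwards, concatenating the per-cell-line files with `cat` is one simple option.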
Create a GTF of annotated transcripts for all observed transcripts that passed filtering
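A possible form of this step using `talon_create_GTF`. It assumes the annotation and build names `gencode_v29` and `hg38`, and a whitelist file produced by the filtering step. The `--observed` flag restricts the output to transcripts that were actually observed in the data. `create_filtered_gtf` is a hypothetical wrapper.

```shell
# Write a GTF of observed transcripts that are on the whitelist.
# Arguments: <database> <whitelist csv> <output prefix>
create_filtered_gtf() {
    local db="$1" whitelist="$2" prefix="$3"
    talon_create_GTF \
        --db "$db" \
        -a gencode_v29 \
        -b hg38 \
        --whitelist "$whitelist" \
        --observed \
        --o "$prefix"
}
```

For instance, `create_filtered_gtf talon.db all_whitelist.csv all` (file names hypothetical) writes a GTF whose name starts with `all`.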
Obtain a filtered abundance file of each transcript that passed filtering
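A sketch of the abundance step with `talon_abundance`, again assuming the names `gencode_v29` and `hg38` and an existing whitelist file. The wrapper name `filtered_abundance` is hypothetical.

```shell
# Write a per-transcript abundance table limited to whitelisted transcripts.
# Arguments: <database> <whitelist csv> <output prefix>
filtered_abundance() {
    local db="$1" whitelist="$2" prefix="$3"
    talon_abundance \
        --db "$db" \
        -a gencode_v29 \
        -b hg38 \
        --whitelist "$whitelist" \
        --o "$prefix"
}
```

Calling `filtered_abundance talon.db all_whitelist.csv all` (file names hypothetical) should yield a tab-separated file with one row per transcript and one count column per dataset.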