Getting started


Last updated 1 year ago


First, if you haven't already, make sure to install Swan. After installing, you'll be able to run Swan from Python.

Then, download the data and the reference transcriptome annotation from here. The bash commands to do so are given below.

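The main workflow to get started with Swan consists of:

  1. Adding a reference transcriptome
  2. Adding a transcriptome for your samples
    • From a GTF
    • From a TALON db
  3. Adding abundance information
    • From a TSV
    • From an AnnData
    • On the gene level
  4. Adding metadata to your datasets

Other sections:

  • Example data download
  • Starting and initializing your SwanGraph
  • Saving and loading your SwanGraph
  • Behavior with Cerberus

This page can also be read from top to bottom; just know that you may be running things more than once!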

Download example data

Run this block in your bash terminal:

mkdir figures

# download files
wget https://zenodo.org/record/8118614/files/data.tgz

# expand files
tar -xzf data.tgz

Starting up Swan and initializing your SwanGraph

The rest of the code in this tutorial should be run in Python.

Initialize an empty SwanGraph and add the transcriptome annotation to the SwanGraph.

import swan_vis as swan

# initialize a new SwanGraph
sg = swan.SwanGraph()

Note: to initialize a SwanGraph in single-cell mode (which will avoid calculating percent isoform use [pi] numbers for each cell), use the following code:

sg = swan.SwanGraph(sc=True)

# paths to the example data files used in the rest of this tutorial
annot_gtf = 'data/gencode.v29.annotation.gtf'
data_gtf = 'data/all_talon_observedOnly.gtf'
ab_file = 'data/all_talon_abundance_filtered.tsv'
talon_db = 'data/talon.db'
adata_file = 'data/swan_anndata.h5ad'
pass_list = 'data/all_pass_list.csv'
meta = 'data/metadata.tsv'
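
Before running the steps that follow, it can help to confirm that the download and extraction produced the files named above. The snippet below is a minimal sketch using only the Python standard library; `find_missing` is a hypothetical helper written for this illustration and is not part of Swan.

```python
from pathlib import Path

# File paths used throughout this tutorial, relative to the working directory.
EXPECTED_FILES = [
    'data/gencode.v29.annotation.gtf',
    'data/all_talon_observedOnly.gtf',
    'data/all_talon_abundance_filtered.tsv',
    'data/talon.db',
    'data/swan_anndata.h5ad',
    'data/all_pass_list.csv',
    'data/metadata.tsv',
]


def find_missing(paths, root='.'):
    """Return the entries of `paths` that are not regular files under `root`.

    Hypothetical helper for this tutorial; not a Swan function.
    """
    root = Path(root)
    return [p for p in paths if not (root / p).is_file()]


missing = find_missing(EXPECTED_FILES)
if missing:
    print('Missing files (re-run the download step):')
    for p in missing:
        print('  ' + p)
else:
    print('All example files found.')
```

Run it from the directory where you extracted data.tgz.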

Adding a reference transcriptome

# add an annotation transcriptome
sg.add_annotation(annot_gtf)
Adding annotation to the SwanGraph

Adding transcript models from a GTF

Add all filtered transcript models to the SwanGraph.

# add a dataset's transcriptome to the SwanGraph
sg.add_transcriptome(data_gtf)
Adding transcriptome to the SwanGraph

Adding abundance information

Adding abundance from a TSV

# add each dataset's abundance information to the SwanGraph
sg.add_abundance(ab_file)
Adding abundance for datasets hepg2_1, hepg2_2, hffc6_1, hffc6_2, hffc6_3 to SwanGraph.


/Users/fairliereese/miniconda3/lib/python3.7/site-packages/anndata/_core/anndata.py:120: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)

Adding abundance from an AnnData

If you have abundance information and metadata information in AnnData format, you can use this as direct input into Swan. This will help circumvent the dense matrix representation of the TSV in the case of very large datasets or single-cell data.

# add abundance for each dataset from the AnnData into the SwanGraph
sg = swan.SwanGraph()
sg.add_annotation(annot_gtf)
sg.add_transcriptome(data_gtf)
sg.add_adata(adata_file)
Adding annotation to the SwanGraph

Adding transcriptome to the SwanGraph

Adding abundance for datasets hepg2_1, hepg2_2, hffc6_1, hffc6_2, hffc6_3 to SwanGraph.
Calculating TPM...
Calculating PI...
Calculating edge usage...


/Users/fairliereese/Documents/programming/mortazavi_lab/bin/swan_vis/swan_vis/swangraph.py:828: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  adata = anndata.AnnData(var=var, obs=obs, X=X)
/Users/fairliereese/miniconda3/envs/scanpy_2/lib/python3.7/site-packages/anndata/_core/anndata.py:121: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)


Calculating TSS usage...


/Users/fairliereese/Documents/programming/mortazavi_lab/bin/swan_vis/swan_vis/swangraph.py:759: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  adata = anndata.AnnData(var=var, obs=obs, X=X)


Calculating TES usage...
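
The "Calculating PI..." message in the output above refers to percent isoform use (pi). As a rough illustration of the idea, and not of Swan's internal code, the sketch below assumes pi is each transcript's share of the total expression of its gene, expressed as a percentage. The function name, the toy IDs, and the choice to report 0 for genes with no expression are all assumptions made for this example.

```python
from collections import defaultdict


def percent_isoform(tpm_by_transcript, gene_of):
    """Toy pi calculation: 100 * transcript TPM / summed TPM of its gene.

    Assumption for this sketch: transcripts of an unexpressed gene get 0.0;
    Swan itself may represent that case differently.
    """
    gene_totals = defaultdict(float)
    for tid, tpm in tpm_by_transcript.items():
        gene_totals[gene_of[tid]] += tpm

    pi = {}
    for tid, tpm in tpm_by_transcript.items():
        total = gene_totals[gene_of[tid]]
        pi[tid] = 100.0 * tpm / total if total > 0 else 0.0
    return pi


# Hypothetical transcripts: two isoforms of geneA, one isoform of geneB.
tpm = {'tA1': 30.0, 'tA2': 10.0, 'tB1': 5.0}
gene_of = {'tA1': 'geneA', 'tA2': 'geneA', 'tB1': 'geneB'}

print(percent_isoform(tpm, gene_of))
# {'tA1': 75.0, 'tA2': 25.0, 'tB1': 100.0}
```

Because pi is computed per dataset (or per cell), skipping it in single-cell mode avoids a large amount of work on very wide matrices.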

Adding gene-level abundance

# add gene-level abundance to the SwanGraph
sg.add_abundance(ab_file, how='gene')
/Users/fairliereese/Documents/programming/mortazavi_lab/bin/swan_vis/swan_vis/swangraph.py:363: FutureWarning: X.dtype being converted to np.float32 from int64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  adata = anndata.AnnData(var=var, obs=obs, X=X)



Adding abundance for datasets hepg2_1, hepg2_2, hffc6_1, hffc6_2, hffc6_3 to SwanGraph.
Calculating TPM...
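
To make the gene-level option more concrete, here is a small standard-library sketch of what summing transcript-level counts to the gene level and then computing TPM could look like. It assumes TPM means counts rescaled so that each dataset sums to one million, with no transcript-length normalization; that assumption, the function names, and the toy IDs are illustrative and do not come from Swan's source.

```python
from collections import defaultdict


def sum_to_gene(transcript_counts, gene_of):
    """Sum the counts of all transcripts that belong to the same gene."""
    gene_counts = defaultdict(float)
    for tid, count in transcript_counts.items():
        gene_counts[gene_of[tid]] += count
    return dict(gene_counts)


def to_tpm(counts):
    """Rescale counts for one dataset so that they sum to one million.

    Assumption for this sketch: no length normalization is applied.
    """
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value * 1e6 / total for key, value in counts.items()}


# Hypothetical counts for one dataset.
transcript_counts = {'tA1': 6, 'tA2': 2, 'tB1': 2}
gene_of = {'tA1': 'geneA', 'tA2': 'geneA', 'tB1': 'geneB'}

gene_counts = sum_to_gene(transcript_counts, gene_of)
print(gene_counts)          # {'geneA': 8.0, 'geneB': 2.0}
print(to_tpm(gene_counts))  # {'geneA': 800000.0, 'geneB': 200000.0}
```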

Saving and loading your SwanGraph

Following this, you can save your SwanGraph so you can easily work with it again without re-adding all the data.

# save the SwanGraph as a Python pickle file
sg.save_graph('data/swan')
Saving graph as data/swan.p

And you can reload the graph again.

# load up a saved SwanGraph from a pickle file
sg = swan.read('data/swan.p')
Read in graph from data/swan.p
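
Since the saved file is a Python pickle, the round trip behaves like any other pickled object. The sketch below shows that general mechanism with the standard library only; the dictionary is a hypothetical stand-in for a SwanGraph, and `save_obj` / `load_obj` are illustrative helpers, not Swan functions.

```python
import pickle
import tempfile
from pathlib import Path


def save_obj(obj, prefix):
    """Pickle `obj` to '<prefix>.p' and return the path that was written."""
    path = Path(f'{prefix}.p')
    with open(path, 'wb') as fh:
        pickle.dump(obj, fh)
    return path


def load_obj(path):
    """Load a pickled object back from disk."""
    with open(path, 'rb') as fh:
        return pickle.load(fh)


# Hypothetical stand-in for a graph object with some state attached.
toy = {'datasets': ['hepg2_1', 'hepg2_2'], 'sc': False}

with tempfile.TemporaryDirectory() as tmp:
    written = save_obj(toy, Path(tmp) / 'swan')
    restored = load_obj(written)

print(written.name)     # swan.p
print(restored == toy)  # True
```

As with any pickle, only load files you created yourself or that come from a source you trust.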

Adding transcript models from a TALON DB

# for this new example, create a new empty SwanGraph
sg = swan.SwanGraph()

# and add the annotation transcriptome to it
sg.add_annotation(annot_gtf)

# add transcriptome from TALON db
sg.add_transcriptome(talon_db, pass_list=pass_list)

# add each dataset's abundance information to the SwanGraph
sg.add_abundance(ab_file)
Adding annotation to the SwanGraph

Adding transcriptome to the SwanGraph


/Users/fairliereese/Documents/programming/mortazavi_lab/bin/swan_vis/swan_vis/swangraph.py:346: FutureWarning: X.dtype being converted to np.float32 from int64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.
  adata = anndata.AnnData(var=var, obs=obs, X=X)



Adding abundance for datasets hepg2_1, hepg2_2, hffc6_1, hffc6_2, hffc6_3 to SwanGraph.
Calculating TPM...
Calculating PI...
Calculating edge usage...


/Users/fairliereese/Documents/programming/mortazavi_lab/bin/swan_vis/swan_vis/swangraph.py:810: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.

/Users/fairliereese/miniconda3/envs/scanpy_2/lib/python3.7/site-packages/anndata/_core/anndata.py:121: ImplicitModificationWarning: Transforming to str index.
  warnings.warn("Transforming to str index.", ImplicitModificationWarning)


Calculating TSS usage...


/Users/fairliereese/Documents/programming/mortazavi_lab/bin/swan_vis/swan_vis/swangraph.py:741: FutureWarning: X.dtype being converted to np.float32 from float64. In the next version of anndata (0.9) conversion will not be automatic. Pass dtype explicitly to avoid this warning. Pass `AnnData(X, dtype=X.dtype, ...)` to get the future behavour.



Calculating TES usage...

Adding metadata

Swan provides functionality to perform tests and plotting on the basis of metadata categories. Add metadata by calling the SwanGraph.add_metadata() function, or use the SwanGraph.add_adata() function to add both expression information and metadata at the same time.

sg = swan.read('data/swan.p')
sg.add_metadata(meta)
Read in graph from data/swan.p


/Users/fairliereese/miniconda3/envs/scanpy_2/lib/python3.7/site-packages/anndata/_core/anndata.py:798: UserWarning:
AnnData expects .obs.index to contain strings, but got values like:
    [0, 1, 2, 3, 4]

    Inferred to be: integer

  value_idx = self._prep_dim_index(value.index, attr)
sg.adata.obs

index    cell_line  replicate  dataset  total_counts  description
hepg2_1  hepg2      1          hepg2_1  499647.0      liver
hepg2_2  hepg2      2          hepg2_2  848447.0      liver
hffc6_1  hffc6      1          hffc6_1  761493.0      fibroblast
hffc6_2  hffc6      2          hffc6_2  787967.0      fibroblast
hffc6_3  hffc6      3          hffc6_3  614921.0      fibroblast
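
Metadata can only be attached to datasets that it actually names, so a quick sanity check before calling add_metadata() can save some confusion. The sketch below assumes the metadata file is tab-separated and has a `dataset` column whose values match the dataset names in the abundance data, as the table above suggests. `check_metadata` is a hypothetical helper, the inline text stands in for a real file, and `k562_1` is a made-up dataset name used only to show a failed lookup.

```python
import csv
import io


def check_metadata(tsv_text, dataset_names, key='dataset'):
    """Return the dataset names that have no row in the metadata table.

    Hypothetical helper; assumes a tab-separated table with a `key` column.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    if key not in (reader.fieldnames or []):
        raise ValueError(f"metadata has no '{key}' column")
    seen = {row[key] for row in reader}
    return sorted(set(dataset_names) - seen)


# Inline stand-in for a metadata TSV, using values from the table above.
rows = [
    ['dataset', 'cell_line', 'replicate', 'description'],
    ['hepg2_1', 'hepg2', '1', 'liver'],
    ['hepg2_2', 'hepg2', '2', 'liver'],
    ['hffc6_1', 'hffc6', '1', 'fibroblast'],
    ['hffc6_2', 'hffc6', '2', 'fibroblast'],
    ['hffc6_3', 'hffc6', '3', 'fibroblast'],
]
meta_text = '\n'.join('\t'.join(row) for row in rows) + '\n'

print(check_metadata(meta_text, ['hepg2_1', 'hffc6_3']))  # []
print(check_metadata(meta_text, ['hepg2_1', 'k562_1']))   # ['k562_1']
```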

Behavior with Cerberus

  • Swan will use the TSS / TES assignments as dictated by Cerberus to define unique entries in SwanGraph.tss_adata and SwanGraph.tes_adata. For instance, if the same vertex is used as a TSS or TES in more than one gene, it will still be treated as a separate entry for each gene in the TSS / TES AnnDatas.

  • Swan will automatically pull intron chain information from the transcript triplet in Cerberus and use it to generate an AnnData tracking the expression of intron chains separately from the transcripts they come from in SwanGraph.ic_adata. This can also be used to perform isoform switching tests.

  • Currently, Swan does not parse Cerberus novelty categories. We are hoping to support this in a future release.

sg = swan.read('data/swan_modelad.p')
sg.ic_adata.var.tail()
Read in graph from data/swan_modelad.p

ic_id                 gid                 gname    ic_name    n_cells
ENSMUSG00000118369_2  ENSMUSG00000118369  Gm30541  Gm30541_2  14
ENSMUSG00000118380_3  ENSMUSG00000118380  Gm36037  Gm36037_3  1
ENSMUSG00000118382_1  ENSMUSG00000118382  Gm8373   Gm8373_1   2
ENSMUSG00000118383_1  ENSMUSG00000118383  Gm50321  Gm50321_1  14
ENSMUSG00000118390_1  ENSMUSG00000118390  Gm50102  Gm50102_1  1
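
The ic_id values above combine a gene ID with an intron chain number. As an illustration of how such IDs might relate to a Cerberus-style transcript ID of the form <gene_id>[1,1,1], the sketch below splits a triplet into its parts. It assumes the three numbers are ordered TSS, intron chain, TES and that each feature ID is written as <gene_id>_<number>; both assumptions, the `parse_triplet` helper, and the example transcript ID are hypothetical and are not taken from Swan's code.

```python
import re

# <gene_id>[tss,ic,tes]; the order of the three numbers is an assumption.
TRIPLET_RE = re.compile(
    r'^(?P<gid>[^\[\]]+)\[(?P<tss>\d+),(?P<ic>\d+),(?P<tes>\d+)\]$'
)


def parse_triplet(transcript_id):
    """Split a Cerberus-style transcript ID into gene, TSS, IC and TES IDs."""
    match = TRIPLET_RE.match(transcript_id)
    if match is None:
        raise ValueError(f'not a Cerberus-style transcript ID: {transcript_id!r}')
    gid = match.group('gid')
    return {
        'gid': gid,
        'tss_id': f"{gid}_{match.group('tss')}",
        'ic_id': f"{gid}_{match.group('ic')}",
        'tes_id': f"{gid}_{match.group('tes')}",
    }


# Hypothetical transcript ID built from a gene ID shown in the table above.
parsed = parse_triplet('ENSMUSG00000118369[1,2,1]')
print(parsed['ic_id'])  # ENSMUSG00000118369_2
```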

For information on the file formats needed to use Swan, please read the file format specifications FAQ.

The example data is the data used in the Swan publication.

To add datasets to the SwanGraph from a TSV, you can use an abundance matrix with one column for each desired dataset. The file format is specified here.

When you add abundance information from either an AnnData or a TSV file, Swan also automatically calculates the counts and TPM for each TSS, TES, and intron or exon. If you previously used add_transcriptome() to add a GTF that was generated by Cerberus or that uses Cerberus-style transcript IDs (i.e. <gene_id>[1,1,1]), Swan will also calculate intron chain counts and TPM automatically.

You can also store gene expression in the SwanGraph. This can be done either from a TALON abundance TSV that contains transcript-level counts, in which case the counts for each transcript are summed across the gene, or from a gene-level counts matrix where the first column is the gene ID rather than the transcript ID but which otherwise follows the input abundance TSV format. In both cases, call add_abundance() with how='gene'.

Swan is also directly compatible with TALON databases and can pull transcript models directly from them. You can also optionally pass in a list of isoforms from talon_filter_transcripts to filter your input transcript models.

When you use a Cerberus GTF in SwanGraph.add_annotation() or SwanGraph.add_transcriptome(), keep in mind the points listed under "Behavior with Cerberus".