The rest of the code in this tutorial should be run in Python.
Initialize an empty SwanGraph and add the transcriptome annotation to it.
import swan_vis as swan

# initialize a new SwanGraph
sg = swan.SwanGraph()
Note: to initialize a SwanGraph in single-cell mode (which avoids calculating percent isoform use [pi] numbers for each cell), pass sc=True when creating it: sg = swan.SwanGraph(sc=True)
# add an annotation transcriptome
sg.add_annotation(annot_gtf)
Adding annotation to the SwanGraph
Adding transcript models from a GTF
Add all filtered transcript models to the SwanGraph.
# add a dataset's transcriptome to the SwanGraph
sg.add_transcriptome(data_gtf)
Adding transcriptome to the SwanGraph
Adding abundance information
Adding abundance from a TSV
You can use an abundance matrix with columns for each desired dataset to add datasets to the SwanGraph. The file format is specified here.
# add each dataset's abundance information to the SwanGraph
sg.add_abundance(ab_file)
Adding abundance for datasets hepg2_1, hepg2_2, hffc6_1, hffc6_2, hffc6_3 to SwanGraph.
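As a rough illustration of the kind of matrix described above, the sketch below writes a small tab-separated file with one row per transcript and one count column per dataset. The dataset names come from the output in this tutorial; the transcript IDs, counts, file name, and the `transcript_id` header are made up for illustration, so check the file format specification for the exact column names Swan expects.

```python
import csv
import os
import tempfile

# Dataset names as they appear in this tutorial's output.
datasets = ["hepg2_1", "hepg2_2", "hffc6_1", "hffc6_2", "hffc6_3"]

# Hypothetical transcript IDs and counts, for illustration only.
counts = {
    "tx_A": [10, 12, 0, 1, 2],
    "tx_B": [0, 3, 25, 30, 28],
}

# Hypothetical output path; the ID column name is an assumption.
toy_ab_file = os.path.join(tempfile.mkdtemp(), "toy_abundance.tsv")
with open(toy_ab_file, "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t")
    writer.writerow(["transcript_id"] + datasets)
    for tid, row in counts.items():
        writer.writerow([tid] + row)

# Read the file back to confirm it has one column per dataset.
with open(toy_ab_file, newline="") as handle:
    rows = list(csv.reader(handle, delimiter="\t"))
print(rows[0])
```

A file laid out this way keeps every dataset in the same table, which is convenient for a handful of bulk samples but becomes dense and large for single-cell data.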
Adding abundance from an AnnData
If you have abundance information and metadata information in AnnData format, you can use this as direct input into Swan. This will help circumvent the dense matrix representation of the TSV in the case of very large datasets or single-cell data.
# add abundance for each dataset from the AnnData into the SwanGraph
sg = swan.SwanGraph()
sg.add_annotation(annot_gtf)
sg.add_transcriptome(data_gtf)
sg.add_adata(adata_file)
Adding annotation to the SwanGraph
Adding transcriptome to the SwanGraph
Adding abundance for datasets hepg2_1, hepg2_2, hffc6_1, hffc6_2, hffc6_3 to SwanGraph.
Calculating TPM...
Calculating PI...
Calculating edge usage...
Calculating TSS usage...
Calculating TES usage...
When you add abundance information from either an AnnData or a TSV file, Swan also automatically calculates the counts and TPM for each TSS, TES, and intron or exon. If you previously used add_transcriptome() to add a GTF that was generated by Cerberus or that uses Cerberus-style transcript IDs (i.e., <gene_id>[1,1,1]), Swan also calculates intron chain counts and TPM automatically.
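To make these quantities concrete, here is a minimal sketch of how TPM and percent isoform use (pi) could be computed from raw counts. It assumes that TPM is counts scaled to one million per dataset with no length normalization, and that pi is a transcript's share of its gene's total counts expressed as a percentage. Swan's internal implementation may differ, and all IDs and counts below are hypothetical.

```python
# Hypothetical counts for one dataset: transcript ID -> (gene ID, counts).
counts = {
    "tx_A": ("gene_1", 30),
    "tx_B": ("gene_1", 10),
    "tx_C": ("gene_2", 60),
}

def compute_tpm(counts):
    """Scale counts so they sum to one million within the dataset."""
    total = sum(c for _, c in counts.values())
    return {tid: c / total * 1e6 for tid, (_, c) in counts.items()}

def compute_pi(counts):
    """Express each transcript as a percentage of its gene's total counts."""
    gene_totals = {}
    for gid, c in counts.values():
        gene_totals[gid] = gene_totals.get(gid, 0) + c
    return {
        tid: (100 * c / gene_totals[gid]) if gene_totals[gid] else 0.0
        for tid, (gid, c) in counts.items()
    }

tpm = compute_tpm(counts)
pi = compute_pi(counts)
print(tpm)
print(pi)
```

The same scaling applies to any feature with its own counts (TSS, TES, edge, or intron chain); only the rows being summed change.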
Adding gene-level abundance
You can also store gene expression in the SwanGraph. This can be done from a TALON abundance TSV that contains transcript-level counts, in which case the counts for each transcript are summed across the gene. Alternatively, supply this function with a gene-level counts matrix whose first column is the gene ID rather than the transcript ID but which otherwise follows the input abundance TSV format.
# add gene-level abundance to the SwanGraph
sg.add_abundance(ab_file, how='gene')
Adding abundance for datasets hepg2_1, hepg2_2, hffc6_1, hffc6_2, hffc6_3 to SwanGraph.
Calculating TPM...
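Summing transcript-level counts across each gene, as described above for the TALON abundance TSV, amounts to a group-by-and-sum. The following is a minimal sketch with hypothetical IDs and counts, not Swan's actual code:

```python
from collections import defaultdict

# Hypothetical transcript-level rows: (gene ID, transcript ID, counts per dataset).
datasets = ["hepg2_1", "hffc6_1"]
transcript_rows = [
    ("gene_1", "tx_A", [10, 0]),
    ("gene_1", "tx_B", [5, 7]),
    ("gene_2", "tx_C", [2, 3]),
]

# Sum the counts of every transcript belonging to the same gene, per dataset.
gene_counts = defaultdict(lambda: [0] * len(datasets))
for gene_id, _tid, row in transcript_rows:
    for i, value in enumerate(row):
        gene_counts[gene_id][i] += value

for gene_id, row in gene_counts.items():
    print(gene_id, dict(zip(datasets, row)))
```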
Saving and loading your SwanGraph
Following this, you can save your SwanGraph so you can easily work with it again without re-adding all the data.
# save the SwanGraph as a Python pickle file
sg.save_graph('data/swan')
Saving graph as data/swan.p
You can then reload the graph.
# load up a saved SwanGraph from a pickle file
sg = swan.read('data/swan.p')
Read in graph from data/swan.p
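The `.p` extension in the output above reflects that the graph is stored as a Python pickle. As a generic illustration of that round trip, the sketch below pickles a plain dictionary standing in for a SwanGraph. The `save_object` and `load_object` helpers are hypothetical and only mimic the behavior of appending `.p` to a prefix; they are not part of Swan.

```python
import os
import pickle
import tempfile

def save_object(obj, prefix):
    """Pickle obj to '<prefix>.p' and return the path (hypothetical helper)."""
    path = prefix + ".p"
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path

def load_object(path):
    """Load a pickled object back from disk (hypothetical helper)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

# A plain dict standing in for a much richer graph object.
toy_graph = {"datasets": ["hepg2_1", "hffc6_1"], "n_transcripts": 3}

prefix = os.path.join(tempfile.mkdtemp(), "swan")
saved_path = save_object(toy_graph, prefix)
restored = load_object(saved_path)
print(saved_path.endswith("swan.p"), restored == toy_graph)
```

Because unpickling can execute arbitrary code, only load pickle files that come from a source you trust.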
Adding transcript models from a TALON DB
Swan is also directly compatible with TALON databases and can pull transcript models from them. You can optionally pass in a list of isoforms from talon_filter_transcripts to filter your input transcript models.
# for this new example, create a new empty SwanGraph
sg = swan.SwanGraph()

# and add the annotation transcriptome to it
sg.add_annotation(annot_gtf)

# add transcriptome from TALON db
sg.add_transcriptome(talon_db, pass_list=pass_list)

# add each dataset's abundance information to the SwanGraph
sg.add_abundance(ab_file)
Adding annotation to the SwanGraph
Adding transcriptome to the SwanGraph
Adding abundance for datasets hepg2_1, hepg2_2, hffc6_1, hffc6_2, hffc6_3 to SwanGraph.
Calculating TPM...
Calculating PI...
Calculating edge usage...
Calculating TSS usage...
Calculating TES usage...
Adding metadata
Swan provides functionality to perform tests and plotting on the basis of metadata categories. Add metadata by calling the SwanGraph.add_metadata() function, or use the SwanGraph.add_adata() function to add both expression information and metadata at the same time.
Read in graph from data/swan.p
sg.adata.obs
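As a sketch of what a metadata file could look like, the example below assumes a tab-separated file with a `dataset` column whose values match the dataset names in the abundance file, plus any descriptive columns you want to group by. The `cell_line` and `replicate` columns and the file name are hypothetical and are derived here from the dataset names seen in this tutorial's output.

```python
import csv
import os
import tempfile

datasets = ["hepg2_1", "hepg2_2", "hffc6_1", "hffc6_2", "hffc6_3"]

# Derive hypothetical metadata columns from the dataset names.
records = []
for name in datasets:
    cell_line, replicate = name.rsplit("_", 1)
    records.append({"dataset": name, "cell_line": cell_line, "replicate": replicate})

# Hypothetical file name; written to a temporary directory.
meta_file = os.path.join(tempfile.mkdtemp(), "toy_metadata.tsv")
with open(meta_file, "w", newline="") as handle:
    writer = csv.DictWriter(
        handle, fieldnames=["dataset", "cell_line", "replicate"], delimiter="\t"
    )
    writer.writeheader()
    writer.writerows(records)

# With Swan installed and a populated SwanGraph `sg`, the file would then be
# passed along the lines of: sg.add_metadata(meta_file)
print(records[0])
```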
Behavior with Cerberus
When you use a Cerberus GTF in SwanGraph.add_annotation() or SwanGraph.add_transcriptome(), keep in mind the following:
Swan will use the TSS / TES assignments as dictated by Cerberus to define unique entries in SwanGraph.tss_adata and SwanGraph.tes_adata. For instance, if the same vertex is used in more than one gene, it will still be treated as a separate vertex for each gene in the TSS / TES AnnDatas.
Swan will automatically pull intron chain information from the transcript triplet in Cerberus and use it to generate an AnnData tracking the expression of intron chains separately from the transcripts they come from in SwanGraph.ic_adata. This can also be used to perform isoform switching tests.
Currently, Swan does not parse Cerberus novelty categories. We are hoping to support this in a future release.
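To illustrate the Cerberus-style transcript IDs and triplets mentioned above, the sketch below splits an ID of the form `<gene_id>[1,1,1]` into its parts. It assumes the triplet is ordered as TSS, intron chain, TES. The `parse_cerberus_id` helper and the example IDs are hypothetical and not part of Swan's API.

```python
import re

# Matches "<gene_id>[<n>,<n>,<n>]"; the triplet order is an assumption.
TRIPLET_RE = re.compile(r"^(?P<gene_id>.+)\[(?P<tss>\d+),(?P<ic>\d+),(?P<tes>\d+)\]$")

def parse_cerberus_id(transcript_id):
    """Split a Cerberus-style transcript ID into gene ID, TSS, IC, and TES."""
    match = TRIPLET_RE.match(transcript_id)
    if match is None:
        raise ValueError(f"not a Cerberus-style transcript ID: {transcript_id!r}")
    return {
        "gene_id": match.group("gene_id"),
        "tss": int(match.group("tss")),
        "ic": int(match.group("ic")),
        "tes": int(match.group("tes")),
    }

parsed = parse_cerberus_id("ENSG00000000001[1,2,3]")
print(parsed)

# Two transcripts sharing an intron chain but differing in TSS.
a = parse_cerberus_id("ENSG00000000001[1,2,3]")
b = parse_cerberus_id("ENSG00000000001[2,2,3]")
print(a["ic"] == b["ic"], a["tss"] == b["tss"])
```

Grouping transcripts by the gene ID and intron chain number is what lets intron chain expression be tracked separately from the transcripts that carry it.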