Aplysia RNA-seq Assembly utilities

The Broad Institute sequenced at least 50 million Illumina reads from RNA from each of 10 Aplysia tissues, provided by Leonid Moroz of Whitney Labs, as part of a collaborative project with Columbia University. These reads were assembled into full- and partial-length transcripts using Trinity, a transcriptome assembler recently developed by the Broad Institute and Hebrew University.

The website for this Aplysia transcriptome was developed by the Institute for Genome Sciences (IGS) at the University of Maryland, Baltimore.

These transcriptome assemblies have not yet gone through NCBI contamination screens and could contain vector or other contamination.

The sequencing, data analysis and website are supported by NIH and NSF.

Web-based BLAST form (2014 assemblies)
Transcript lookup utility (2014 assemblies)
Transcript relative abundance (2014 assemblies)
PacBio pilot BLAST form
2014 orientation-corrected assemblies

Early Broad and IGS pilot assemblies

For any questions or feedback please email Aplysia.RNAseq@gmail.com

Protocol for BLASTs and interpreting results

This is a wonderful new resource assembled from massive, high-throughput Illumina sequencing of the transcriptome (the RNAs).

The BLAST website installed at IGS differs slightly from similar sites at NCBI. To retrieve a sequence of interest, return to the main page (or open it in a second window), go to the Transcript lookup utility, and paste in the transcript ID and the sequence length. For example, for ">comp16460_c0_seq1 len=7225" you would use:

comp16460_c0_seq1 and 7225

(The sequence length is needed because the assemblies were done, and the IDs generated, separately for each tissue, so a transcript ID may not be unique across tissues.)
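
If you are collecting many hits, a short script can pull the transcript ID and length out of each header for you. The Python sketch below is only an illustration: it assumes that headers follow the pattern of the single example shown above (">compN_cN_seqN len=N"), and the names parse_header and TranscriptHeader are hypothetical, not part of this website or of Trinity.

```python
import re
from typing import NamedTuple


class TranscriptHeader(NamedTuple):
    transcript_id: str  # e.g. "comp16460_c0_seq1"
    component: str      # the shared "comp #" prefix, e.g. "comp16460"
    length: int         # sequence length taken from "len="


# Assumed header layout, based only on the example on this page.
# Anything after the len= field is ignored.
HEADER_RE = re.compile(
    r"^>?(?P<id>(?P<comp>comp\d+)_c\d+_seq\d+)\s+len=(?P<len>\d+)"
)


def parse_header(line: str) -> TranscriptHeader:
    """Split a hit header into the two values the lookup utility asks for."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised transcript header: {line!r}")
    return TranscriptHeader(match["id"], match["comp"], int(match["len"]))


if __name__ == "__main__":
    hit = parse_header(">comp16460_c0_seq1 len=7225")
    # The ID and the length are what you paste into the lookup form.
    print(hit.transcript_id, hit.length)
```

Because an ID may recur in another tissue's assembly, keep the ID and length together as a pair rather than storing the ID alone.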

We recommend initially searching the combined transcripts from all 10 tissues. Many sequences are only partially represented in individual tissues because of low expression levels and incomplete representation in the reads. Thus, transcripts of interest in neurons may be most complete in the data from another tissue (e.g. Chemosensory_tissue or heart).

An explanation of multiple transcript IDs with the same comp #:

Some confusing things that we have observed in our initial BLAST results are summarized below:

  1. There are multiple entries for many mRNAs. There are a number of reasons for this, some interesting and others technical.

    Interesting reasons:

    • Multiple start sites and stop sites for mRNAs are treated as distinct transcripts
    • Alternative splicing will lead to distinct transcripts

    Technical reasons:

    • If there is a gap in the reads for a single transcript, the regions of the mRNA on either side of the gap will end up in separate assemblies.
    • The reads from different tissues have been assembled separately, so the same transcript can give hits in several tissues. You may get information on expression of a specific transcript by searching different tissues, but this information is not reliable if the transcript is rare. Moreover, if you don't see full-length hits, this may be because the transcript is rare and not fully assembled.
  2. Some assemblies are reverse sequences. We do not know why these are so abundant, but the strand-specific sequencing is not foolproof. For reverse sequences, you will have to use the reverse complement of the sequence to see the alignment.
  3. Some of the assemblies you examine may be incorrect or incomplete. These are still preliminary data, so you will need to look carefully for anomalies. You may see frameshifts, where the reading frame changes partway through a sequence; these are probably sequencing errors.
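
For the reverse sequences mentioned in item 2, the reverse complement can be computed with a few lines of Python. This is a minimal sketch, not a tool provided by this website: the function name reverse_complement is hypothetical, and it assumes a plain nucleotide string containing only A, C, G, T, U or N (other characters, such as IUPAC ambiguity codes, are passed through uncomplemented).

```python
# Complement table for upper- and lower-case bases.
# U (RNA) is complemented to A; N stays N.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence.

    Whitespace and line breaks (e.g. from a pasted FASTA record body)
    are removed before the sequence is complemented and reversed.
    """
    cleaned = "".join(seq.split())
    return cleaned.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```

Paste only the sequence lines into the function, not the ">" header line, since the header would be treated as sequence.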

Wayne Sossin
Tom Abrams