Tissue 1 (anterior gills) - FR
Tissue 1 (anterior gills) - RF
Tissue 2 (posterior gills) - FR
Tissue 2 (posterior gills) - RF
Tissue 3 (female + male gonads) - FR
Tissue 3 (female + male gonads) - RF
Tissue 4 (eye stalk + muscle) - FR
Tissue 4 (eye stalk + muscle) - RF
Tissue 5 (1st + 2nd Zoea stage) - FR
Tissue 5 (1st + 2nd Zoea stage) - RF
Tissue 6 (3rd Zoea stage) - FR
Tissue 6 (3rd Zoea stage) - RF


Trinity is a platform for de novo transcriptome assembly from RNA-seq data in the absence of a reference genome. It partitions the RNA-seq data into many independent de Bruijn graphs (ideally one graph per expressed gene) and uses parallel computing to reconstruct full-length transcripts, including alternatively spliced isoforms, from these graphs. The Trinity assembly pipeline consists of three independent software modules applied in sequence: Inchworm, Chrysalis and Butterfly. Because Trinity can make use of strand-specific Illumina paired-end libraries, it was well suited to assembling our data.

We have generated two FASTA files for each tissue: one in the reverse/forward (RF) direction and the other in the forward/reverse (FR) direction. All 12 files are available for download. Each transcript in a Trinity output file has a FASTA header line formatted like this:

>c115_g5_i1 len=247 path=[31015:0-148 23018:149-246]

The accession encodes the Trinity gene and isoform information. In the example above, the accession c115_g5_i1 indicates Trinity read cluster c115, gene g5, and isoform i1. A given Trinity run involves many clusters of reads, each of which is assembled separately, and gene numbers are unique only within a single read cluster. The gene identifier should therefore be taken as the combination of the read cluster and the gene number, which in this case is c115_g5. In summary, the example above corresponds to gene ID c115_g5 and isoform ID c115_g5_i1.
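For readers who want to group isoforms by gene programmatically, the naming rule above can be sketched in a few lines of Python. This is only an illustration, not part of the Trinity distribution: the function name parse_trinity_header and the TrinityHeader type are hypothetical, and the regular expressions assume that every header follows the c<cluster>_g<gene>_i<isoform> pattern and carries a len= field, as in the example.

```python
import re
from typing import NamedTuple


class TrinityHeader(NamedTuple):
    """Fields recovered from one header line (hypothetical helper type)."""
    gene_id: str      # read cluster + gene number, e.g. "c115_g5"
    isoform_id: str   # full accession, e.g. "c115_g5_i1"
    length: int       # value of the len= field


# Assumed accession layout: c<cluster>_g<gene>_i<isoform>
_ACCESSION = re.compile(r"^(c\d+_g\d+)_i\d+$")
_LENGTH = re.compile(r"\blen=(\d+)")


def parse_trinity_header(line: str) -> TrinityHeader:
    """Split a FASTA header line into gene ID, isoform ID and length."""
    fields = line.strip().lstrip(">").split()
    if not fields:
        raise ValueError("empty header line")

    accession = fields[0]
    accession_match = _ACCESSION.match(accession)
    if accession_match is None:
        raise ValueError(f"unexpected accession format: {accession!r}")

    length_match = _LENGTH.search(line)
    if length_match is None:
        raise ValueError(f"no len= field in header: {line!r}")

    return TrinityHeader(
        gene_id=accession_match.group(1),
        isoform_id=accession,
        length=int(length_match.group(1)),
    )


header = parse_trinity_header(
    ">c115_g5_i1 len=247 path=[31015:0-148 23018:149-246]"
)
print(header.gene_id, header.isoform_id, header.length)
```

Applied to every header in one of the FASTA files, the gene_id field can serve as a grouping key to collect all isoforms that belong to the same Trinity gene.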

