Quick Installation


Full documentation for OCTUPUS can be found here <link>. For an example, see below. Either extract the OCTUPUS files to where your sequence and quality files are, or extract them to a linked directory.

Step 1 : Data Trimming

Coordinates for data trimming are identified using LUCY. We provide a script that trims both your sequence and quality files based on the coordinates LUCY reports.

perl <fasta File> <quality File>

fasta File - 454 fasta file, cannot contain blank lines
quality File - 454 quality file, cannot contain blank lines    

This generates two files: lucy.seq (trimmed fasta file) and lucy.qul (trimmed quality file).
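
Because neither input file may contain blank lines, it can help to check them before trimming. The following is a minimal Python sketch and is not part of OCTUPUS; the function names are hypothetical.

```python
def find_blank_lines(lines):
    """Return 1-based line numbers of lines that are empty or whitespace-only."""
    return [i for i, line in enumerate(lines, start=1) if not line.strip()]


def strip_blank_lines(in_path, out_path):
    """Copy in_path to out_path, dropping blank lines; return how many were dropped."""
    dropped = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(line)
            else:
                dropped += 1
    return dropped
```

Calling find_blank_lines on an open file handle lists the offending line numbers; strip_blank_lines writes a cleaned copy and leaves the original file untouched.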

Step 2 : Sequence Tagging

OCTUPUS provides a script to find the MID tags in your trimmed fasta sequences. First create a "primer list" which contains all of your MID tags, one per line (see provided primer file).

perl <primer list> <trimmed fasta sequence> <output file> <stats file> <primer start num>

primer list - primer list containing newline delimited primers
trimmed fasta sequence - trimmed fasta sequences from Step 1 (lucy.seq)
output file - fasta sequences with the tagging information in the header 
	The output file can now be used for octu.pl.
stats file - displays each sequence and what number tag it is associated with 
primer start num - primer starting number (default is 1) 
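
To illustrate the primer list format and how tag numbers relate to the primer start number, here is a small Python sketch. It is not the OCTUPUS tagging script: matching tags by sequence prefix and numbering them in file order from the start number are assumptions made for this example, and the function names are hypothetical. The three tags are example values only.

```python
def parse_primer_list(text, start_num=1):
    """Map each primer (one per line) to a tag number, counting from start_num."""
    primers = [line.strip().upper() for line in text.splitlines() if line.strip()]
    return {primer: start_num + i for i, primer in enumerate(primers)}


def find_tag(sequence, primer_numbers):
    """Return the tag number whose primer the sequence starts with, or None.

    Prefix matching is an assumption made for this sketch.
    """
    seq = sequence.upper()
    for primer, number in primer_numbers.items():
        if seq.startswith(primer):
            return number
    return None


# Example tags, for illustration only.
primer_text = "ACGAGTGCGT\nACGCTCGACA\nAGACGCACTC\n"
tags = parse_primer_list(primer_text, start_num=1)
print(find_tag("ACGCTCGACATTGGA", tags))
```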

Step 3 : Cluster OCTUs

perl <tagged fasta file> <minimumID> <minimumSeqLen>

tagged fasta file - tagged fasta sequences generated in Step 2
minimumID - ID similarity to cluster (eg : 95 = 95% similar sequences in cluster) 
minimumSeqLen - minimum fasta sequence length to use for clustering (eg 200 = >199bp)
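
Before clustering, it can be useful to see how many sequences would pass a given minimumSeqLen. The Python sketch below reads the "200 = >199bp" example as "keep sequences of 200 bp or longer"; it is an illustration with hypothetical function names, not the filter OCTUPUS applies internally.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_usable(lines, min_seq_len):
    """Return (records with length >= min_seq_len, total records)."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    return sum(1 for n in lengths if n >= min_seq_len), len(lengths)
```

Passing an open file handle for the tagged fasta file and a length of 200 returns the number of sequences that would be clustered alongside the total.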

Clustering produces three output files:

octulist - consensus sequence for each OCTU.
octuall.seq - fasta sequences contained in each OCTU.
div - sequence counts and nucleotide diversity for each OCTU.

At this point you have sequence clusters. If you want automated taxonomy assignment, continue with Step 4.

Step 4 : Taxonomy assignment

To compare your generated OCTUs against NCBI, you must first download the nucleotide databases from NCBI. This will take some time. If you are using your own databases, you must format them using the command below. (Note : if formatdb and megablast are not linked, type the full path to where they are located instead, eg: /usr/local/blast-2.2.22/bin/formatdb).

formatdb -o T -p F -i <fasta file> -n <database name>
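
If you are unsure whether formatdb and megablast are linked, a short Python check can tell you which path will be used. This is a sketch with a hypothetical function name; the fallback directory is the example path from the note above and should be changed to match your installation.

```python
import os
import shutil


def resolve_tool(name, fallback_dir):
    """Return the tool's path if it is on PATH, else fallback_dir/name."""
    found = shutil.which(name)
    return found if found else os.path.join(fallback_dir, name)


for tool in ("formatdb", "megablast"):
    print(tool, "->", resolve_tool(tool, "/usr/local/blast-2.2.22/bin"))
```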

From here, blast the octulist against the desired database.

megablast -d <database loc> -D 2 -p <cutoff %> -a <processor #> -b 1 -v 1 -i octulist -F F > <octulist.blast>

database loc - location of the database
cutoff % - percent identity cutoff for a match against an NCBI entry (90 is a good starting value)
processor # - number of processing threads your computer has
octulist.blast - blast file output name    
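
When the search is run from a script, the command above can be assembled programmatically. The Python sketch below uses only the flags shown in the megablast line; the function names and the default of using every available processor are assumptions made for this example.

```python
import os
import subprocess


def megablast_command(database, cutoff=90, threads=None, query="octulist"):
    """Build the megablast argument list using the flags shown above."""
    if threads is None:
        threads = os.cpu_count() or 1  # assumption: use every available processor
    return [
        "megablast",
        "-d", database,
        "-D", "2",
        "-p", str(cutoff),
        "-a", str(threads),
        "-b", "1",
        "-v", "1",
        "-i", query,
        "-F", "F",
    ]


def run_megablast(database, out_path, **options):
    """Run megablast and write its output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(megablast_command(database, **options), stdout=out, check=True)
```

For example, run_megablast("/data/nt", "octulist.blast", cutoff=90) would write the blast output to octulist.blast, where /data/nt stands in for your own database location.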

The resulting octulist.blast is then filtered into a tabulated format using:

perl <octulist.blast> <octulist.blastFilter>

octulist.blast - blast file generated from megablast
octulist.blastFilter - filtered blast file output name

The blastFilter file can be used to fetch taxonomy from NCBI using GI/EMBL numbers.

perl <octulist.blastFilter> <octulist.taxonomy>

octulist.blastFilter - filtered blast file generated by the blast filtering command above
octulist.taxonomy - taxonomy file output name    

To combine all of this information into a tabulated Excel file that you can open in Windows :

perl <octulist.blastFilter> <div> <octulist.taxonomy> <octulist.blastFinal>

octulist.blastFilter - filtered blast file generated by the blast filtering command above
div - nucleotide diversity file generated in Step 3
octulist.taxonomy - taxonomy file generated by the taxonomy command above
octulist.blastFinal - excel file containing combined information

In order to extract the primer MID tags :

perl <octuall.seq> <octulist.blastFinal> <octulist.primerExcel> <numPrimers>

octuall.seq - sequences in each OCTU, generated in Step 3
octulist.blastFinal - combined Excel file generated by the previous command
octulist.primerExcel - output file name
numPrimers - the total number of primers (entries in primer file) 

Step 5 : Check for chimeric sequences

To run a chimera check on your OCTU sequences :

perl <octulist> <div> <chimera> <identity>

octulist - consensus sequences generated in Step 3
div - nucleotide diversity file generated in Step 3
chimera - chimera output file name
identity - the percent sequence identity necessary to find chimeras. This number 
	should be higher than the clustering identity (minimumID) used in Step 3 (eg : 95 = 5% diff).
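
A quick sanity check on the identity value before running the chimera check can be sketched in Python as follows. It assumes the value being compared against is the minimumID used for clustering in Step 3; the function names are hypothetical.

```python
def percent_difference(identity):
    """Convert a percent identity (eg 95) to the allowed percent difference (5)."""
    if not 0 <= identity <= 100:
        raise ValueError("identity must be between 0 and 100")
    return 100 - identity


def check_chimera_identity(chimera_identity, cluster_identity):
    """Raise ValueError unless the chimera identity exceeds the clustering identity."""
    if chimera_identity <= cluster_identity:
        raise ValueError(
            "chimera identity (%s) should be higher than the clustering identity (%s)"
            % (chimera_identity, cluster_identity)
        )
    return percent_difference(chimera_identity)
```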