Manual

Full documentation for OCTUPUS can be found here <link>. For a worked example, see below. Extract the OCTUPUS files either into the directory that holds your sequence and quality files, or into a linked directory.

Step 1 : Data Trimming

Coordinates for data trimming are identified using LUCY, and we provide a script that trims both your sequence file and your quality file based on the information LUCY reports.

perl lucyTrim.pl <fasta File> <quality File>

fasta File - 454 fasta file, cannot contain blank lines
quality File - 454 quality file, cannot contain blank lines

This generates two files: lucy.seq (trimmed fasta file) and lucy.qul (trimmed quality file).
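
Because lucyTrim.pl does not accept blank lines in either input, it can help to check for them before trimming. The following is a minimal shell sketch and is not part of OCTUPUS; the function names and the file names in the usage comments are hypothetical, and it assumes a POSIX awk is available.

```shell
# Print how many blank (empty or whitespace-only) lines a file contains.
count_blank_lines() {
    awk '!NF { n++ } END { print n + 0 }' "$1"
}

# Write a copy of the file "$1" to "$2" with blank lines removed.
strip_blank_lines() {
    awk 'NF' "$1" > "$2"
}

# Usage (hypothetical file names):
#   count_blank_lines reads.fna
#   strip_blank_lines reads.fna reads.clean.fna
#   strip_blank_lines reads.qual reads.clean.qual
#   perl lucyTrim.pl reads.clean.fna reads.clean.qual
```

Writing the cleaned copy to a new file leaves the original sequencer output untouched.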

Step 2 : Sequence Tagging

OCTUPUS provides a script to find the MID tags in your trimmed fasta sequences. First create a "primer list" that contains all of the MID tags separated by newlines (see the provided primer file).

perl searchAdapter.pl <primer list> <trimmed fasta sequence> <output file> <stats file> <primer start num>

primer list - primer list containing newline delimited primers
trimmed fasta sequence - fasta sequences from lucyTrim.pl
output file - fasta sequences with the tagging information in the header
	The output file can now be used for octu.pl.
stats file - lists each sequence and the number of the tag it is associated with
primer start num - primer starting number (default is 1)
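
A primer list is plain text with one MID tag per line. The sketch below shows two quick checks on such a file before tagging. The helper functions are hypothetical, and the file name and tag sequences in the usage comments are illustrative placeholders, not the primers shipped with OCTUPUS.

```shell
# Count the primers (non-blank lines) in a newline-delimited primer list.
count_primers() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Print any primer that appears more than once (no output means all are unique).
duplicate_primers() {
    awk 'NF' "$1" | sort | uniq -d
}

# Usage (placeholder tags, hypothetical file name):
#   printf '%s\n' ACGTACGTAC TGCATGCATG GATCGATCGA > primers.txt
#   count_primers primers.txt       # also the <numPrimers> value needed in Step 4
#   duplicate_primers primers.txt
```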

Step 3 : Cluster OCTUs

perl octu.pl <tagged fasta file> <minimumID> <minimumSeqLen>

tagged fasta file - fasta sequences generated from searchAdapter.pl
minimumID - percent identity required to cluster (e.g. 95 = sequences in a cluster are 95% similar)
minimumSeqLen - minimum fasta sequence length to use for clustering (e.g. 200 = sequences longer than 199 bp)
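
To see how many reads a given minimumSeqLen would keep before running octu.pl, the sequences at or above the cutoff can be counted directly. This is a rough sketch with a hypothetical function name; it assumes the cutoff is inclusive, as the example above suggests, and it may not match exactly how octu.pl measures length.

```shell
# Count FASTA records whose sequence length is at least the given minimum.
# Arguments: <fasta file> <minimum length>
count_long_enough() {
    awk -v min="$2" '
        /^>/ {                       # a header line starts a new record
            if (seen && len >= min) n++
            seen = 1; len = 0
            next
        }
        {                            # sequence line: add its length, ignoring whitespace
            gsub(/[ \t\r]/, "")
            len += length($0)
        }
        END {                        # account for the last record in the file
            if (seen && len >= min) n++
            print n + 0
        }
    ' "$1"
}

# Usage (hypothetical file name):
#   count_long_enough tagged.fasta 200
```

Sequences that wrap over several lines are summed per record, so line wrapping does not affect the count.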

There are three output files from octu.pl:

octulist - consensus sequence for each OCTU.
octuall.seq - fasta sequences contained in each OCTU.
div - sequence counts and nucleotide diversity for each OCTU.
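
A quick way to see how many OCTUs were produced is to count the records in octulist. This sketch assumes octulist is fasta-formatted with one header line per OCTU, which the manual does not state explicitly; the function name is hypothetical.

```shell
# Count FASTA records (header lines starting with ">") in a file.
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Usage:
#   count_fasta_records octulist      # number of OCTU consensus sequences
```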

At this point you have sequence clusters. If you want automated taxonomy assignment, see below.

Step 4 : Taxonomy assignment

To compare your generated OCTUs against NCBI, you must first download the nucleotide databases from NCBI. This will take some time. If you are using your own databases, you must format them using the command below. (Note: if formatdb and megablast are not linked, type the full path to where they are located instead, e.g. /usr/local/blast-2.2.22/bin/formatdb.)

formatdb -o T -p F -i <fasta file> -n <database name>

From here, blast the octulist against the desired database.

megablast -d <database loc> -D 2 -p <cutoff %> -a <processor #> -b 1 -v 1 -i octulist -F F > <octulist.blast>

database loc - location of the database
cutoff % - percent identity cutoff for a match against an NCBI entry (90 is a reasonable starting value)
processor # - number of processor threads to use
octulist.blast - blast file output name
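
The megablast line has several fixed options and four values to fill in, so it can be convenient to print the finished command and review it before running it. The sketch below only prints the command shown above and never runs megablast. The function name and the database path in the usage comment are hypothetical, and for simplicity it accepts only whole-number cutoffs.

```shell
# Print (do not run) the megablast command for the given settings.
# Arguments: <database loc> <cutoff %> <processor #> <output file>
megablast_cmd() {
    mb_db=$1; mb_cutoff=$2; mb_threads=$3; mb_out=$4
    # Reject a cutoff that is not a whole number between 0 and 100.
    case $mb_cutoff in
        ''|*[!0-9]*)
            echo "cutoff must be a whole number from 0 to 100" >&2
            return 1
            ;;
    esac
    if [ "$mb_cutoff" -gt 100 ]; then
        echo "cutoff must be a whole number from 0 to 100" >&2
        return 1
    fi
    printf 'megablast -d %s -D 2 -p %s -a %s -b 1 -v 1 -i octulist -F F > %s\n' \
        "$mb_db" "$mb_cutoff" "$mb_threads" "$mb_out"
}

# Usage (hypothetical database path):
#   megablast_cmd /data/blast/nt 90 4 octulist.blast
```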

The resulting octulist.blast is now filtered into tabulated format using blastFilter.pl.

perl blastFilter.pl <octulist.blast> <octulist.blastFilter>

octulist.blast - blast file generated from megablast
octulist.blastFilter - filtered blast file output name

The blastFilter file can be used to fetch taxonomy from NCBI using GI/EMBL numbers.

perl getTaxonomy.pl <octulist.blastFilter> <octulist.taxonomy>

octulist.blastFilter - filtered blast file generated from blastFilter.pl
octulist.taxonomy - taxonomy file output name

To combine all the information into a tabulated Excel file that you can open in Windows:

perl blastFinal.pl <octulist.blastFilter> <div> <octulist.taxonomy> <octulist.blastFinal>

octulist.blastFilter - filtered blast file generated from blastFilter.pl
div - nucleotide diversity file generated from octu.pl
octulist.taxonomy - taxonomy file generated from getTaxonomy.pl
octulist.blastFinal - excel file containing combined information

In order to extract the primer MID tags :

perl primerExcel.pl <octuall.seq> <octulist.blastFinal> <octulist.primerExcel> <numPrimers>

octuall.seq - sequences in each OCTU generated from octu.pl
octulist.blastFinal - combined excel file generated from blastFinal.pl
octulist.primerExcel - output file name
numPrimers - the total number of primers (entries in primer file)
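
Each script in this step reads the output of the one before it, so the order matters. As a summary, the sketch below prints the four post-blast commands in sequence. It assumes the placeholder names above are used literally as file names and that all files sit in the current directory; the function name is hypothetical, and nothing is executed.

```shell
# Print the Step 4 post-processing commands in the order they must run.
# Argument: <numPrimers>, the number of entries in the primer file.
step4_commands() {
    s4_num_primers=$1
    cat <<EOF
perl blastFilter.pl octulist.blast octulist.blastFilter
perl getTaxonomy.pl octulist.blastFilter octulist.taxonomy
perl blastFinal.pl octulist.blastFilter div octulist.taxonomy octulist.blastFinal
perl primerExcel.pl octuall.seq octulist.blastFinal octulist.primerExcel $s4_num_primers
EOF
}

# Usage:
#   step4_commands 12
```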

Step 5 : Check for chimeric sequences

To run a chimera check on your OCTU sequences :

perl chimera.pl <octulist> <div> <chimera> <identity>

octulist - consensus sequences generated from octu.pl
div - nucleotide diversity file generated from octu.pl
chimera - chimera output file name
identity - the percent sequence identity necessary to find chimeras. This number 
	should be higher than the identity in octu.pl (eg : 95 = 5% diff).
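
Since the chimera identity should be higher than the clustering identity, a small guard can catch a mismatched pair of values before chimera.pl is run. This is a sketch with a hypothetical function name, and it assumes both values are whole numbers.

```shell
# Succeed only if the chimera.pl identity is higher than the octu.pl identity.
# Arguments: <octu.pl minimumID> <chimera.pl identity>, both whole numbers.
check_chimera_identity() {
    if [ "$2" -gt "$1" ]; then
        echo "ok: chimera identity $2 is higher than clustering identity $1"
    else
        echo "chimera identity $2 should be higher than clustering identity $1" >&2
        return 1
    fi
}

# Usage:
#   check_chimera_identity 95 97 && perl chimera.pl octulist div chimera 97
```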

OCTUPUS : Operationally Clustered Taxonomic Units for Parallel-tagged Ultra Sequencing.
