This utility takes multiple sequences and produces a consensus sequence that uses IUPAC ambiguity codes at polymorphic sites. In the IUPAC system shown here, each combination of nucleotides that can occur at a site has its own code. For instance, a locus with an "A" in some specimens and a "T" in others will be coded as "W". This coding is compatible with search tools such as BLAST and the search in BOLD.
An "N" indicates either that all four possible nucleotides have been found or, more commonly, that the base call was ambiguous during sequencing. For this reason, I have treated "N"s in the source sequences the same as "-", that is, as carrying no information. Other such programs might instead treat an "N" in any source sequence as meaning the consensus should also be an "N" at that site.
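The coding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind this utility: the names IUPAC_CODES and consensus are hypothetical, any character other than A, C, G or T (including ambiguity codes already present in a source sequence) is treated as carrying no information, and a column with no information at all is written as "-".

```python
# Map each set of observed bases to its IUPAC nucleotide code.
IUPAC_CODES = {
    frozenset("A"): "A",
    frozenset("C"): "C",
    frozenset("G"): "G",
    frozenset("T"): "T",
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
    frozenset("CGT"): "B",
    frozenset("AGT"): "D",
    frozenset("ACT"): "H",
    frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(sequences):
    """Return an IUPAC-coded consensus of already aligned sequences."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        # Equal length is necessary for an alignment, but does not prove one.
        raise ValueError("sequences differ in length; align them first")

    result = []
    for column in zip(*sequences):
        # Assumption: only A, C, G and T count as information;
        # "N", "-" and everything else are skipped.
        bases = frozenset(c.upper() for c in column if c.upper() in "ACGT")
        # Assumption: a column with no information is written as "-".
        result.append(IUPAC_CODES[bases] if bases else "-")
    return "".join(result)


print(consensus(["ACGT", "TCGN", "ACA-"]))  # prints WCRT
```

In the example, the first column holds "A" and "T" and so becomes "W", the third holds "G" and "A" and becomes "R", and the "N" and "-" in the last column are ignored, leaving "T".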
If some sequences are longer than the standard barcoding region, the consensus will be longer than the expected 658 bp. I suggest users trim their sequences using a verified 658 bp sequence as a reference. Similarly, if all sequences are shorter than 658 bp, the user could manually add the hyphens needed to indicate the missing data.
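For users who prefer to script this step, the trimming and padding might look something like the sketch below. The function fit_to_length and its parameters are hypothetical; it assumes the start position of the barcoding region is already known from comparison with a verified reference, and it pads only at the end of the sequence. If the missing data is at the beginning, the hyphens would need to be added there instead.

```python
BARCODE_LENGTH = 658  # expected length of the standard barcoding region


def fit_to_length(sequence, start=0, length=BARCODE_LENGTH):
    """Trim a sequence to the barcode window and pad it with hyphens.

    start is the position where the barcode region begins (assumed to be
    known from a verified reference); padding is added on the right only.
    """
    if start < 0:
        raise ValueError("start must not be negative")
    window = sequence[start:start + length]
    # ljust adds "-" until the window reaches the requested length.
    return window.ljust(length, "-")


print(fit_to_length("ACGT", length=6))                 # prints ACGT--
print(fit_to_length("AACGTACGTT", start=2, length=4))  # prints CGTA
```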
Most importantly, sequences must be aligned before submitting them! Otherwise, incorrect (and bizarre) consensus barcodes will result.