This article uses typical raw sequence data from a Sanger sequencing run to show how to edit and align chromatograms for downstream Biologics annotator analysis. It covers assembling Sanger sequences with traces, building consensus sequences, and calling heterozygous bases.
Note: The "Set and Merge Paired Reads" preprocessing operation also allows users to merge sequences together. This operation uses BBMerge and can be less accurate than batch assembly; however, it can be run on very large datasets and is thus more suitable for high-throughput or NGS use.
Select all the raw sequences and select Batch Assemble Sanger Sequences in the dropdown.
You can assemble the sequences based on sequence names. Enter an appropriate separator, then choose the part of the name that corresponds to a unique identifier so that sequences sharing that identifier are assembled together. In this example, that means selecting 4th in the Name part dropdown and entering "_" (underscore) as the Name separator to assemble the selected sequences by matching well IDs. Check the example: the sequences should match on the 4th part of the sequence name when the name is split on the underscore.
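The grouping logic described above can be sketched in a few lines of Python. This is only an illustration of splitting names on a separator and matching on one part; it is not the implementation used by Batch Assemble Sanger Sequences. The function name group_by_name_part and the sample read names are hypothetical.

```python
from collections import defaultdict


def group_by_name_part(names, separator="_", part=4):
    """Group sequence names by the 1-based `part` obtained after
    splitting each name on `separator`.

    Names with fewer than `part` fields are collected under the key None.
    """
    groups = defaultdict(list)
    for name in names:
        fields = name.split(separator)
        if len(fields) < part:
            groups[None].append(name)
            continue
        groups[fields[part - 1]].append(name)
    return dict(groups)


# Hypothetical read names: the 4th underscore-separated part is the well ID.
example_names = [
    "exp_p1_VH_A01_fwd",
    "exp_p1_VH_A01_rev",
    "exp_p1_VH_B02_fwd",
    "exp_p1_VH_B02_rev",
]

for well, reads in group_by_name_part(example_names).items():
    print(well, reads)
```

With these sample names, the forward and reverse reads for each well fall into the same group, which is the set of reads that would be assembled into one contig.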
Click Run to start the operation. This operation with the above settings should produce 6 Contigs, 6 consensus sequences and an assembly report.
To call heterozygous bases, select Consensus: call Sanger heterozygotes > in the Batch Assembly options and enter 50%. To learn more about heterozygous base calling and annotation, please refer to this article.
Select an output Assembly contig to check whether heterozygous bases are present in the assembled sequences.
To quickly identify potential heterozygous bases, click Zoom out to full view in the Sequence Viewer and, in the sidebar, ensure that Annotations>trimmed, Highlightings, Consensus and Graphs>Pairwise identity are selected. Look for regions of the Identity graph with low peaks, then select the region around position 900-950 bp and zoom in. A heterozygous base, M, is called at position 923 bp because the second peak is at least 50% of the height of the first peak.
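The 50% rule can be illustrated with a minimal Python sketch. It assumes the four trace peak heights at a single position are already available as numbers; the function name call_base and the peak values are hypothetical, and the actual caller in the software may apply additional criteria. The two-base ambiguity codes are the standard IUPAC ones, where M stands for A or C.

```python
# Standard IUPAC ambiguity codes for two-base combinations.
IUPAC_TWO_BASE = {
    frozenset("AC"): "M",
    frozenset("AG"): "R",
    frozenset("AT"): "W",
    frozenset("CG"): "S",
    frozenset("CT"): "Y",
    frozenset("GT"): "K",
}


def call_base(peaks, threshold=0.5):
    """Return a base call for one position.

    `peaks` maps each of A, C, G, T to its trace peak height.
    If the second-highest peak is at least `threshold` times the
    highest peak, return the IUPAC ambiguity code for the two bases;
    otherwise return the base with the highest peak.
    """
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    (top_base, top_height), (second_base, second_height) = ranked[0], ranked[1]
    if top_height > 0 and second_height >= threshold * top_height:
        return IUPAC_TWO_BASE[frozenset((top_base, second_base))]
    return top_base


# Hypothetical peak heights: C is 62% of A, so the position is called M.
print(call_base({"A": 1000, "C": 620, "G": 30, "T": 10}))
# Here C is only 40% of A, so the position is called A.
print(call_base({"A": 1000, "C": 400, "G": 30, "T": 10}))
```

Raising the threshold makes the call more conservative, so fewer positions are reported as heterozygous.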
Viewing consensus sequences
To view the consensus sequences of all of the assembled sequences, select the Assembly Consensus Sequences document output. This is the sequence list that you will use as input in downstream operations such as Antibody Annotator.
You can also view the Assembly Report for a summary of what did and didn't assemble successfully.