This article uses typical raw sequence data produced by a Sanger sequencing run to show how to edit and align chromatograms for downstream Biologics annotator analysis. It covers assembly of Sanger sequences with traces, consensus sequence building, and heterozygous base calling.
Note: The "Set and Merge Paired Reads" preprocessing operation also lets users merge sequences together. This operation uses BBMerge and can be less accurate than batch assembly; however, it can be run on very large datasets and is therefore more suitable for high-throughput or NGS use.
Select all the raw sequences and select Batch Assemble Sanger Sequences in the dropdown.
Assemble by Sequence Name
You can assemble the sequences based on sequence names. Enter an appropriate separator and then choose the name part that corresponds to a unique identifier, so that sequences sharing that identifier are assembled together. In the example below, this corresponds to selecting 4th in the Name part dropdown and entering "_" (underscore) as the Name separator, which assembles the selected sequences by matching well IDs. In the example, the sequences are matched on the 4th part of the sequence name when the name is split at each underscore.
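The grouping rule described above can be sketched in Python. This is only an illustration of the idea, not the implementation used by the operation; the function name `group_by_name_part` and the sample file names are hypothetical.

```python
from collections import defaultdict


def group_by_name_part(names, separator="_", part=4):
    """Group sequence names by the 1-based `part` found after splitting on `separator`."""
    groups = defaultdict(list)
    for name in names:
        fields = name.split(separator)
        if len(fields) < part:
            # Assumption: names with too few parts are collected under None
            # instead of being assembled with anything else.
            groups[None].append(name)
            continue
        groups[fields[part - 1]].append(name)
    return dict(groups)


# Hypothetical read names; the 4th part (A01, A02) plays the role of the well ID.
names = [
    "Plate1_Sample_HC_A01_F.ab1",
    "Plate1_Sample_HC_A01_R.ab1",
    "Plate1_Sample_HC_A02_F.ab1",
    "Plate1_Sample_HC_A02_R.ab1",
]
groups = group_by_name_part(names, separator="_", part=4)
for well_id, members in groups.items():
    print(well_id, members)
```

Each resulting group corresponds to one set of reads that would be assembled together under these settings.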
Assemble by Name Scheme
Sequences can be assembled by matching common parts of the sequence names to identify which sequences to assemble together. This involves using a name scheme defined by an administrator of your organization. The name scheme will be applied to the sequence names to extract fields defined in the name scheme, such as the Common Identifier name, chain and sequencing direction. Assembly will be carried out on a combination of the Common Identifier and Chain fields (if present). For more information about name schemes, see What Is a Name Scheme and Why Is It Useful?
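As a rough illustration of how a name scheme could drive the grouping, the sketch below uses a regular expression as a stand-in for a name scheme. The pattern, the field names `common_id`, `chain` and `direction`, and the sample names are all hypothetical; real name schemes are defined by your administrator and may look quite different.

```python
import re
from collections import defaultdict

# Hypothetical name scheme: <common identifier>_<chain>_<direction>, e.g. "mAb01_HC_F".
NAME_SCHEME = re.compile(
    r"^(?P<common_id>[^_]+)_(?P<chain>[^_]+)_(?P<direction>[FR])$"
)


def group_by_scheme(names, scheme=NAME_SCHEME):
    """Group names on (common identifier, chain); return groups and unmatched names."""
    groups = defaultdict(list)
    unmatched = []
    for name in names:
        match = scheme.match(name)
        if match is None:
            unmatched.append(name)  # name does not fit the scheme
            continue
        key = (match.group("common_id"), match.group("chain"))
        groups[key].append(name)
    return dict(groups), unmatched


scheme_groups, scheme_unmatched = group_by_scheme(
    ["mAb01_HC_F", "mAb01_HC_R", "mAb01_LC_F", "mAb01_LC_R", "control"]
)
print(scheme_groups)
print(scheme_unmatched)
```

Here the forward and reverse reads of each chain end up in the same group because the sequencing direction is extracted but not used in the grouping key.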
Assemble by List
Assembling sequences by list is useful if your sequences are interlaced and have been grouped into a list. The assembly will be carried out on each pair of sequences, starting from the first, e.g. sequences 1 and 2 will be assembled together, sequences 3 and 4 will be assembled together, and so forth.
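The pairing order can be sketched as follows. This is a minimal illustration; the function name `pair_consecutive` and the read names are hypothetical, and raising an error for an odd number of sequences is an assumption of this sketch rather than documented behaviour of the operation.

```python
def pair_consecutive(sequences):
    """Pair items as (1st, 2nd), (3rd, 4th), and so on, in list order."""
    if len(sequences) % 2 != 0:
        # Assumption: an unpaired trailing sequence is treated as an input error.
        raise ValueError("expected an even number of interlaced sequences")
    return [
        (sequences[i], sequences[i + 1]) for i in range(0, len(sequences), 2)
    ]


# Hypothetical interlaced list: forward and reverse reads alternate.
interlaced = ["read1_F", "read1_R", "read2_F", "read2_R"]
pairs = pair_consecutive(interlaced)
print(pairs)
```

Because pairing depends only on position, the list must already be interlaced correctly before the operation is run.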
Click Run to start the operation. This operation with the above settings should produce 6 Contigs, 6 consensus sequences and an assembly report.
To call heterozygous bases, select Consensus: call Sanger heterozygotes > in the Batch Assembly options and enter 50%. To learn more about heterozygous base calling and annotation, please refer to this article.
Select an output Assembly contig to check whether heterozygous bases are present in the assembled sequences.
To quickly identify potential heterozygous bases, click Zoom out to full View in the Sequence Viewer and, in the sidebar, ensure that Annotations > trimmed, Highlightings, Consensus and Graphs > Pairwise identity are selected. Look for regions where the Identity graph dips, select the region around position 900-950 bp and zoom in. A heterozygous base, M, is called at position 923 bp because the second peak is at least 50% of the height of the first peak.
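The 50% rule can be illustrated with a small Python sketch. It assumes the peak heights of the four bases at a single trace position are available as a dictionary; the function name `call_base` and the peak values are hypothetical, and the actual operation may apply further criteria. The two-base ambiguity codes are the standard IUPAC ones (M stands for A or C).

```python
# Standard IUPAC two-base ambiguity codes.
IUPAC_TWO_BASE = {
    frozenset("AC"): "M",
    frozenset("AG"): "R",
    frozenset("AT"): "W",
    frozenset("CG"): "S",
    frozenset("CT"): "Y",
    frozenset("GT"): "K",
}


def call_base(peaks, threshold=0.5):
    """Return an ambiguity code if the second-highest peak reaches
    `threshold` times the height of the highest peak, else the top base.

    `peaks` maps at least two bases (A, C, G, T) to their peak heights.
    """
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    (first_base, first_height), (second_base, second_height) = ranked[0], ranked[1]
    if first_height > 0 and second_height >= threshold * first_height:
        return IUPAC_TWO_BASE[frozenset((first_base, second_base))]
    return first_base


# Hypothetical peak heights at one position.
het_call = call_base({"A": 1000, "C": 600, "G": 20, "T": 10})
hom_call = call_base({"A": 1000, "C": 400, "G": 20, "T": 10})
print(het_call, hom_call)
```

In the first case the C peak is 60% of the A peak, so the position is called as M; in the second it is 40%, so the position is called as A.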
Viewing consensus sequences
To view the consensus sequences of all assembled sequences, select the Assembly Consensus Sequences document output. This is the sequence list that you will use as input for downstream operations such as Antibody Annotator.
You can also view the Assembly Report for a summary of what did and didn't assemble successfully.