This article is an introduction to Name Schemes. For information on how to create a Name Scheme see the How to Create a Name Scheme article. For information on how to use existing Name Schemes when performing analyses, see the How to Use a Name Scheme article.
A Name Scheme allows admins to define common sequence name structures. Once created, both admins and regular users can use Name Schemes to automatically extract important information such as the sample name, chain and sequencing direction from each sequence name. This information can then be used automatically when performing analyses and for annotating results. Name Schemes are usually only useful for Sanger sequences and similar data, because NGS sequences (such as those from MiSeq) rarely have biologically meaningful names.
Common uses for Name Schemes include:
- Creating custom columns in analysis results from sequence names, so that those columns can be used for matching when adding Assay Data.
- Defining a unique identifier from sequence names to use for pairing separate Heavy and Light (Lambda or Kappa) chain sequences.
- Defining a unique identifier and optional chain for use when assembling Sanger reads.
Let’s look at an example to illustrate. Suppose an organization's sequences are named in the following way:
VH_31023_5;B5
VH_31023_3;B6
VK_31023_5;H11
VK_31023_3;H12
There are four pieces of information in each sequence name: the sequenced chain, the sample identifier, the sequencing direction and the well ID. These are separated by either an underscore (_) or a semicolon (;).
Based on this, we are able to create standard rules for how to read all sequence names from the organization. These rules would look something like this:
1) Split sequence names by either underscore or semicolon.
2) Classify the first split text piece as the chain, the second as the sample name, the third as the sequencing direction and the fourth as the well ID.
In this way, we have reduced each sequence name to just the information we are interested in, and discarded what we don't need (the delimiters and any other extraneous information).
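The two rules above can be sketched in a few lines of Python. This is only an illustration of the idea and not how Geneious Biologics implements Name Schemes; the function name `parse_sequence_name` and the field names are hypothetical, chosen for this example.

```python
import re

# Hypothetical field labels, in the order the pieces appear in the name.
FIELDS = ("chain", "sample", "direction", "well")


def parse_sequence_name(name):
    """Split a name on underscore or semicolon and label each piece."""
    pieces = re.split(r"[_;]", name)
    if len(pieces) != len(FIELDS):
        # The name does not follow the expected four-part structure.
        raise ValueError(f"Unexpected name format: {name!r}")
    return dict(zip(FIELDS, pieces))


names = ["VH_31023_5;B5", "VH_31023_3;B6", "VK_31023_5;H11", "VK_31023_3;H12"]
parsed = [parse_sequence_name(n) for n in names]
print(parsed[0])
# {'chain': 'VH', 'sample': '31023', 'direction': '5', 'well': 'B5'}
```

Once the names are parsed this way, the `sample` value could serve as the shared identifier for pairing the VH and VK sequences, and `direction` could distinguish the two reads of each chain.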
A Name Scheme simply consists of these standard rules, which define how to interpret sequence names and capture the parts of interest.
Note: To utilise the power of Name Schemes in Geneious Biologics, it is important that sequence names can be described by a relatively consistent format or formats. It is easiest to simply establish consistent sequence name formats in your organisation. However, if this is not possible, one solution is to Batch Rename sequences before applying a Name Scheme.
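One way to check whether a set of names follows a consistent format before relying on a Name Scheme is to test each name against a pattern. The sketch below assumes the example format used in this article (letters for the chain, digits for the sample, 5 or 3 for the direction, and a plate well such as B5); the pattern and the function name `find_nonconforming` are hypothetical and are not part of Geneious Biologics.

```python
import re

# Assumed pattern for the example format: chain_sample_direction;well
# (well rows A-H are an assumption based on a 96-well plate).
NAME_PATTERN = re.compile(r"[A-Za-z]+_\d+_[35];[A-H]\d{1,2}")


def find_nonconforming(names):
    """Return the names that do not match the assumed format."""
    return [n for n in names if NAME_PATTERN.fullmatch(n) is None]


names = ["VH_31023_5;B5", "VK-31023-3-H12", "VK_31023_3;H12"]
print(find_nonconforming(names))
# ['VK-31023-3-H12']
```

Any names reported here would be candidates for renaming before a Name Scheme is applied.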