Repertoire comparison by next-generation sequencing (NGS) gives insight into the breadth of the antibody repertoire used in an immune response. Antibody repertoire comparisons can be performed using the Compare operation. To compare NGS results, you will need to:
- Run the Antibody annotator pipeline
- Select at least 2 NGS Result files for comparison
To run an NGS analysis, select your file and then select Antibody annotator in the Annotation dropdown. To learn more about NGS analysis configuration, please refer to the following article.
To run a comparison, select at least 2 Antibody annotator output files (File type: Biologics Annotator Result) and click Compare in the Post-processing dropdown.
In order to compare samples, normalization or scaling is often performed to remove variation between samples that prevents direct comparison of data. To perform normalization, select a normalization method (see below for more details on each method) from the dropdown.
Depending on your input, some sequences may not be fully annotated and some may contain stop codons. These sequences can influence the comparison because they contribute to the normalization calculations. To compare only sequences that are fully annotated, in frame, and without stop codons, select the 'Only use sequences that are fully annotated, in frame and, without stop codons' option.
To start the NGS comparison operation, click Run. This operation will produce an NGS Comparison Result document.
Total count normalization is a simple way to compare samples: the raw frequency counts are scaled by the total number of sequences in each sample relative to the other sample. In effect, it compares the raw frequency percentages of each cluster. However, this is prone to problems. For example, if one sample has a single new cluster that makes up 50% of its sequences, then all other clusters will appear to have half their usual frequency, even though they are not actually any less frequent.
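The distortion described above can be reproduced in a few lines of Python. This is a minimal sketch with made-up cluster counts. The function name total_count_frequencies and the cluster labels are illustrative assumptions, not part of the product.

```python
def total_count_frequencies(counts):
    """Scale raw cluster counts by the total number of sequences in the sample."""
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


# Hypothetical samples: B is identical to A except for one new cluster
# that makes up 50% of its sequences.
sample_a = {"c1": 100, "c2": 200, "c3": 700}
sample_b = {"c1": 100, "c2": 200, "c3": 700, "new": 1000}

freq_a = total_count_frequencies(sample_a)
freq_b = total_count_frequencies(sample_b)

# Ratio of normalized frequencies (B relative to A) for the shared clusters.
ratios = {cluster: freq_b[cluster] / freq_a[cluster] for cluster in sample_a}

for cluster, ratio in ratios.items():
    print(cluster, round(ratio, 3))
```

Every shared cluster has the same raw count in both samples, yet each one appears at about half its frequency in sample B because the new cluster doubles the total.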
Median of frequency ratios
To reduce the above bias, this method uses the DESeq2 approach to sample normalization, which was developed for differential gene expression analysis, with an additional heuristic that excludes clusters with very low frequencies when calculating the normalization ratio. This is because the median frequencies may often be only 1 or 2 sequences, which can lead to inaccurate normalization ratios. For example, if the median cluster has 1 sequence in one sample and 2 sequences in the other, this would produce a normalization ratio of 2. Instead, we take the median ratio of those clusters where both samples have at least as many sequences as specified by the 'of frequencies at least' setting.
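The effect of the minimum frequency filter can be illustrated with a simplified two-sample sketch. It is not the DESeq2 implementation or the product's code. The function median_ratio and its min_frequency parameter (standing in for the 'of frequencies at least' setting) are hypothetical names, and the counts are invented.

```python
from statistics import median


def median_ratio(sample_a, sample_b, min_frequency=1):
    """Median of per-cluster count ratios (B / A), using only clusters
    where both samples have at least min_frequency sequences."""
    ratios = [
        sample_b[cluster] / sample_a[cluster]
        for cluster in sample_a
        if cluster in sample_b
        and sample_a[cluster] >= min_frequency
        and sample_b[cluster] >= min_frequency
    ]
    if not ratios:
        raise ValueError("no cluster meets the minimum frequency in both samples")
    return median(ratios)


# Hypothetical counts: many tiny clusters (1 vs 2 sequences) and two large ones.
sample_a = {"c1": 1, "c2": 1, "c3": 1, "c4": 100, "c5": 200}
sample_b = {"c1": 2, "c2": 2, "c3": 2, "c4": 110, "c5": 210}

unfiltered = median_ratio(sample_a, sample_b, min_frequency=1)
filtered = median_ratio(sample_a, sample_b, min_frequency=10)

print("without filter:", unfiltered)
print("with filter:", round(filtered, 3))
```

Without the filter, the tiny clusters dominate and the median ratio is 2. With the filter, only the well-populated clusters contribute and the ratio falls close to 1.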
Total count excluding upper quartile (recommended)
This is another approach used for normalization in differential gene expression analysis, and it is usually better suited to immune system data comparison than the DESeq2 normalization method. This is because there are often only a few regions in common between samples that have significant numbers of sequences, and the median ratio of these is quite sensitive to a change in only a few sequence frequencies.
The choice of normalization method affects both the normalized ratio of each cluster and its P-value.
P-values indicate whether the difference in cluster size between the two samples is statistically significant. When comparing samples, we may not be interested in cases where the frequencies differ only by a ratio of 2, for example, even though the calculated P-values could still indicate a statistically significant difference. The minimum ratio setting addresses this: it assumes that any ratio smaller than the setting is not significant, and it reduces the significance of larger ratios accordingly when calculating P-values. For example, if this setting is 2 and a region has a ratio of 2.5, the calculated P-value would be similar to that of a case where this setting is 1 and the region ratio is 1.5. The minimum ratio is applied equally in both directions, so if this setting is 2, then a ratio of 0.5 would not be considered significant either.
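The text does not give the formula used to adjust P-values, so the following is only a sketch of one adjustment that is consistent with the worked example above. The helper effective_ratio and its subtraction rule are assumptions chosen because they reproduce the numbers in the text. The actual calculation may differ.

```python
def effective_ratio(ratio, min_ratio=1.0):
    """Hypothetical helper: shrink a ratio toward 1 by the minimum ratio.

    Assumption (not from the product): the excess over the minimum ratio
    is kept, and decreases are treated symmetrically via the reciprocal.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if min_ratio < 1:
        raise ValueError("min_ratio must be at least 1")

    # Work with the fold change so increases and decreases are handled alike.
    fold = ratio if ratio >= 1 else 1 / ratio
    if fold <= min_ratio:
        return 1.0  # within the minimum ratio: treated as no difference

    shrunk = fold - (min_ratio - 1)
    return shrunk if ratio >= 1 else 1 / shrunk


print(effective_ratio(2.5, min_ratio=2))  # same as a ratio of 1.5 with setting 1
print(effective_ratio(1.5, min_ratio=1))
print(effective_ratio(0.5, min_ratio=2))  # within the minimum ratio
```

Under this assumption, a ratio of 2.5 with a setting of 2 is treated like a ratio of 1.5 with a setting of 1, and a ratio of 0.5 with a setting of 2 is treated as no difference.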
Note that the Median of frequency ratios normalization method is not supported for comparisons of more than 2 NGS result documents.