Repertoire comparison by next-generation sequencing gives insight into the breadth of the antibody repertoire used in an immune response.
Antibody repertoire comparison
Antibody repertoire comparisons can be performed with the NGS comparison operation. To compare NGS results, you will need to:
- Run the NGS analysis v2 pipeline
- Select 2 NGS analysis v2 pipeline output files for comparison
To run an NGS analysis, select your file within the experiment folder and select NGS analysis v2 in the Pipelines dropdown. To learn more about NGS analysis configuration, please refer to the following article.
To run NGS comparisons, select 2 NGS analysis pipeline output files (e.g. File type: NGS Result) and click NGS Comparison in the Post-processing dropdown.
To compare the relative abundance of clusters between samples, normalization or scaling is often performed to remove between-sample variation that prevents direct comparison of the data. For example, if one sample has twice as many sequences in a particular cluster as the other sample, this does not necessarily mean that the cluster is more abundant in the first sample; it could be due to a higher sequencing depth. The choice of normalization method affects both the normalized ratio of each cluster and the P-Value. The following normalization methods are available for comparing antibody repertoires:
Total count normalization compares the raw frequency of each cluster based on the total number of identified regions in each sample (i.e. the frequency percentage without normalization). This method is less robust and not as effective at aligning the distribution of cluster frequencies across samples. For example, Sample A has a greater proportion of counts associated with a particular cluster (10/1,000,000) than Sample B (10/1,500,000) even though the cluster count values are the same. Therefore, direct comparison of the raw counts for this cluster (or any other cluster) between Sample A and Sample B is not advisable, because the total number of counts differs between the samples.
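The arithmetic behind the Sample A and Sample B example can be sketched in a few lines of Python. This is only an illustration of dividing a cluster count by the sample total; the function name `total_count_frequency` is made up for this sketch and is not part of the product.

```python
def total_count_frequency(cluster_count, total_count):
    """Return a cluster's share of all identified regions in one sample."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return cluster_count / total_count


# Figures taken from the example in the text: the same cluster count (10)
# in two samples with different total counts.
freq_a = total_count_frequency(10, 1_000_000)
freq_b = total_count_frequency(10, 1_500_000)

# Ratio of the two frequencies (Sample A relative to Sample B).
ratio = freq_a / freq_b
print(f"Sample A: {freq_a:.2e}, Sample B: {freq_b:.2e}, ratio A/B: {ratio:.2f}")
```

With these figures the cluster is 1.5 times more frequent in Sample A, even though both samples contain 10 sequences in that cluster.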
Median of frequency ratios
This method of normalization uses the more robust DESeq2 approach, in which the median of all the cluster count ratios is used to calculate the normalization factor. DESeq2 uses the median of ratios as the normalization factor under the assumption that most genes (here, clusters) are not differentially expressed. This method is robust to imbalance in the number of sequences within each cluster. Clusters with very low frequencies are excluded because they can lead to inaccurate normalization ratios. For example, if the median cluster has 1 sequence in one sample and 2 sequences in the other, this would produce a normalization ratio of 2. Instead, the median ratio is taken over only those clusters where both samples have at least a minimum number of sequences.
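A minimal two-sample sketch of the median-of-ratios idea follows. It is a simplification for illustration, not DESeq2's implementation or the product's: the function name `median_ratio_factor`, the dictionary layout, and the threshold `min_count=5` are all assumptions made for this example.

```python
from statistics import median


def median_ratio_factor(counts_a, counts_b, min_count=5):
    """Median of per-cluster count ratios (B / A).

    Only clusters present in both samples with at least `min_count`
    sequences in each are used. `min_count` is a hypothetical threshold.
    """
    ratios = [
        counts_b[cluster] / counts_a[cluster]
        for cluster in counts_a
        if cluster in counts_b
        and counts_a[cluster] >= min_count
        and counts_b[cluster] >= min_count
    ]
    if not ratios:
        raise ValueError("no shared clusters meet the minimum count")
    return median(ratios)


# Made-up counts: cluster name -> number of sequences.
counts_a = {"c1": 100, "c2": 40, "c3": 10, "c4": 1}
counts_b = {"c1": 210, "c2": 80, "c3": 18, "c4": 2, "c5": 7}

factor = median_ratio_factor(counts_a, counts_b)

# Scale Sample B's counts down so they are comparable with Sample A's.
normalized_b = {cluster: n / factor for cluster, n in counts_b.items()}
print(factor, normalized_b["c1"])
```

Here cluster `c4` (1 and 2 sequences) is ignored because of the threshold, and `c5` is ignored because it is absent from Sample A. The remaining ratios are 2.1, 2.0 and 1.8, so the factor is 2.0.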
Total count excluding upper quartile
This is another approach used for normalization in differential gene expression analysis, and it is usually better suited to immune system data comparison than the DESeq2 normalization method. This is because there are often only a few regions in common between samples that have significant numbers of sequences, and the median ratio of these is quite sensitive to a change in only a few sequence frequencies.
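The text does not spell out how this method is calculated, so the following is only one possible reading of its name: sum each sample's cluster counts after dropping the largest quarter of clusters, then compare those trimmed totals. The function name `trimmed_total` and the rank-based definition of the upper quartile are assumptions made for this sketch.

```python
def trimmed_total(counts):
    """Sum cluster counts after dropping the largest quarter of clusters.

    The upper quartile is taken by rank here: with n clusters, the
    n // 4 largest counts are left out. This is an assumed definition.
    """
    ordered = sorted(counts)
    keep = len(ordered) - len(ordered) // 4
    return sum(ordered[:keep])


# Made-up counts: each sample is dominated by one very large cluster.
counts_a = [500, 20, 10, 8, 6, 4, 3, 1]
counts_b = [900, 45, 20, 16, 12, 8, 6, 2]

total_a = trimmed_total(counts_a)  # the two largest counts (500, 20) are dropped
total_b = trimmed_total(counts_b)  # the two largest counts (900, 45) are dropped

# Ratio of trimmed totals (B relative to A), usable as a scaling factor.
factor = total_b / total_a
print(total_a, total_b, factor)
```

Under this reading, a handful of very large clusters cannot dominate the scaling factor, because they are left out of the totals.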
P-Values indicate whether the difference in size between the two clusters is statistically significant. When comparing samples, we may not be interested in cases where the frequencies differ only by a ratio of 2, for example. However, the P-Values calculated in such a case could still indicate a statistically significant difference. The minimum ratio setting assumes that any ratio less than the setting is not significant and reduces the significance of larger ratios accordingly when calculating P-Values. For example, if this setting is 2 and a region has a ratio of 2.5, the calculated P-Value would be similar to that of a case where this setting is 1 and the region ratio is 1.5. The minimum ratio is applied equally in both directions, so if this setting is 2, a ratio of 0.5 would not be considered significant either.
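The symmetric behaviour of the minimum ratio can be illustrated with a small check. This sketch does not calculate P-Values. It only shows which ratios fall outside the band that the setting treats as not significant. The function name `exceeds_minimum_ratio` is hypothetical.

```python
def exceeds_minimum_ratio(ratio, min_ratio):
    """Return True if `ratio` lies outside the band [1 / min_ratio, min_ratio].

    Ratios inside the band (including its edges) are treated as not
    significant, in both directions.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if min_ratio < 1:
        raise ValueError("min_ratio must be at least 1")
    return ratio > min_ratio or ratio < 1 / min_ratio


# With a minimum ratio of 2, the band is 0.5 to 2.
for r in (2.5, 2.0, 0.5, 0.4):
    print(r, exceeds_minimum_ratio(r, 2))
```

With a minimum ratio of 2, ratios of 2.0 and 0.5 stay inside the band, while 2.5 and 0.4 fall outside it and remain candidates for a significant P-Value.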
Executing NGS comparisons
Depending on your input sequences, some sequences may not be fully annotated and some may contain stop codons. These sequences can influence the comparison analysis because they contribute to the normalization calculations, so it is advisable to exclude them from the analysis.
To compare only sequences that are fully annotated, in frame, and without stop codons, select the Only use sequences that are fully annotated, in frame and, without stop codons option.
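As a rough illustration of what this option implies, the filter can be thought of as keeping only records that pass all three checks. The `SequenceRecord` class and its field names below are hypothetical and do not reflect the product's data model.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    # Hypothetical fields, used only for this illustration.
    name: str
    fully_annotated: bool
    in_frame: bool
    has_stop_codon: bool


def is_usable(record):
    """Keep a record only if it is fully annotated, in frame, and has no stop codon."""
    return record.fully_annotated and record.in_frame and not record.has_stop_codon


records = [
    SequenceRecord("seq1", True, True, False),
    SequenceRecord("seq2", True, True, True),    # contains a stop codon
    SequenceRecord("seq3", False, True, False),  # not fully annotated
    SequenceRecord("seq4", True, False, False),  # out of frame
]

kept = [r.name for r in records if is_usable(r)]
print(kept)
```

Only `seq1` passes here, so only it would count towards the totals used in the normalization calculations.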
To start the NGS comparison operation, click Run. This operation will produce an NGS Comparison Result document.
Note that we currently only support comparisons of 2 NGS results.