How our genetic analysis technology works
The science behind 83 million variants, explained
At Advanced Health Genetics, we believe that transparency in science builds trust. This document explains the methodology behind our genetic analysis technology, the data sources we use, and how we validate accuracy claims.
Compliance Notice: Our technology provides genetic insights for educational purposes. It does not diagnose, treat, or prevent disease.
1. Genetic Imputation Explained
What is Genetic Imputation?
Genetic imputation is a statistical method used to infer the values of unmeasured genetic variants (SNPs—single nucleotide polymorphisms) based on measured ones and reference data from large population studies.
When you take a genetic test, the laboratory typically genotypes approximately 750,000 SNPs using a microarray chip. These directly measured variants are called raw SNPs. However, the human genome contains over 600 million potential variants. Imputation fills the gaps.
How Imputation Works
The imputation process uses two key inputs:
Your genotype data: The ~750,000 SNPs directly measured from your genetic sample
Reference haplotype panels: Large datasets containing genetic information from thousands of individuals across diverse populations
The imputation algorithm compares your measured SNPs to those in the reference panel. When your measured SNPs match a specific pattern in the reference data, the algorithm infers which unmeasured SNPs you likely have based on patterns of linkage disequilibrium (LD)—the tendency for nearby variants to be inherited together.
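The matching idea described above can be illustrated with a deliberately simplified sketch. Production imputation tools typically rely on hidden Markov models over reference haplotypes; the nearest-match rule below is a toy stand-in, not our pipeline, and the function name, panel, and sample values are all hypothetical.

```python
from typing import List, Optional, Sequence


def impute_haplotype(
    observed: Sequence[Optional[int]],
    reference_panel: Sequence[Sequence[int]],
) -> List[float]:
    """Fill unmeasured sites (None) using the best-matching reference haplotypes.

    Toy rule (an assumption for illustration): pick the reference haplotypes
    with the fewest mismatches at the measured sites, then report the mean
    allele of those haplotypes at each unmeasured site.
    """
    measured = [i for i, allele in enumerate(observed) if allele is not None]

    def mismatches(ref: Sequence[int]) -> int:
        # Number of measured sites where the reference disagrees with the sample
        return sum(ref[i] != observed[i] for i in measured)

    scores = [mismatches(ref) for ref in reference_panel]
    best = min(scores)
    matches = [ref for ref, s in zip(reference_panel, scores) if s == best]

    return [
        float(allele)
        if allele is not None
        else sum(ref[i] for ref in matches) / len(matches)
        for i, allele in enumerate(observed)
    ]


# Hypothetical 5-site panel; sites 1 and 3 are unmeasured in the sample.
panel = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
]
observed = [0, None, 1, None, 0]
imputed = impute_haplotype(observed, panel)
print(imputed)
```

Three of the four reference haplotypes agree with the sample at every measured site, so the unmeasured sites receive fractional values that reflect how those three haplotypes split. This is the same intuition behind the dosage values reported by real imputation software.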
From 750,000 to 83 Million Variants
Our imputation process expands from approximately 750,000 raw SNPs to 83 million variants. This expansion is achieved through:
- Multi-population reference panels: We use the combined 1000 Genomes Project and Haplotype Reference Consortium data
- Computational inference: Advanced statistical methods infer variants not directly measured
- Ancestry-specific imputation: We adjust imputations for your individual ancestry, improving accuracy
The 83 million variants include common variants (MAF > 5%), low-frequency variants (MAF 0.5%-5%), and rare variants (MAF < 0.5%), providing a comprehensive picture of your genetic architecture.
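The three frequency classes can be expressed as a small helper. This is a minimal sketch using the thresholds stated above; the function names are illustrative, and treating exactly 5% and exactly 0.5% as low-frequency is an assumption about the boundary cases.

```python
def minor_allele_frequency(alt_allele_frequency: float) -> float:
    """Return the frequency of the less common allele at a biallelic site."""
    if not 0.0 <= alt_allele_frequency <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return min(alt_allele_frequency, 1.0 - alt_allele_frequency)


def classify_by_maf(maf: float) -> str:
    """Classify a variant using the MAF thresholds given in the text."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must be between 0 and 0.5")
    if maf > 0.05:
        return "common"
    if maf >= 0.005:  # assumption: boundary values count as low-frequency
        return "low-frequency"
    return "rare"


for frequency in (0.20, 0.01, 0.001):
    print(frequency, classify_by_maf(frequency))
```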
Reference Data Sources
Our imputation relies on publicly available, peer-reviewed reference panels:
1000 Genomes Project: Whole genome sequences from 2,504 individuals across 26 global populations
Haplotype Reference Consortium (HRC): 64,976 haplotypes, predominantly of European ancestry, providing dense coverage for those populations
CAAPA: Consortium on Asthma among African-ancestry Populations in the Americas
UK Biobank: Approximately 500,000 participants, used for validation and accuracy testing
2. Bayesian Deep Learning Methodology
What is Bayesian Machine Learning?
Bayesian machine learning is an approach that combines machine learning with probability theory. Unlike standard neural networks that learn fixed point estimates, Bayesian models learn probability distributions over parameters. This allows the model to express uncertainty in its predictions.
In Bayesian frameworks, we compute:
Prior distribution: Our initial belief about model parameters before observing data
Likelihood: How well the model explains the observed genetic data
Posterior distribution: Our updated belief after combining the prior with observed data
This probabilistic approach is essential in genetics, where uncertainty is inherent. A genetic risk score is not a definitive prediction; it is a probabilistic estimate reported together with its level of confidence.
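The prior, likelihood, and posterior can be made concrete with the simplest possible case: a conjugate Beta-Binomial update. This is far simpler than a Bayesian deep learning model and is shown only to illustrate the three terms; the class name and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) distribution over an unknown probability."""

    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total ** 2 * (total + 1))


def update(prior: Beta, carriers: int, non_carriers: int) -> Beta:
    """Combine a Beta prior with binomial observations to get the posterior."""
    return Beta(prior.alpha + carriers, prior.beta + non_carriers)


# Prior: no strong belief about the carrier frequency (uniform on [0, 1]).
prior = Beta(1.0, 1.0)
# Likelihood: hypothetical data, 3 carriers observed among 10 individuals.
posterior = update(prior, carriers=3, non_carriers=7)

print(f"prior mean={prior.mean:.3f}, variance={prior.variance:.4f}")
print(f"posterior mean={posterior.mean:.3f}, variance={posterior.variance:.4f}")
```

The posterior variance is smaller than the prior variance, which is the sense in which observing data narrows the uncertainty of the estimate.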
Why Bayesian Methods for Genomics?
We chose Bayesian deep learning for several scientific reasons:
- Uncertainty quantification: Genomics is inherently uncertain. Bayesian models provide confidence intervals around predictions, not just point estimates
- Handling missing data: Genetic datasets often have missing values; Bayesian approaches naturally incorporate these uncertainties
- Population stratification: Bayesian hierarchical models can account for genetic variation across ancestry groups
- Generalization: These models generalize better to new populations and reduce overfitting
Discriminative AI vs. Generative AI
CRITICAL DISTINCTION: Our system uses discriminative AI, not generative AI. This is a fundamental difference that affects how our models work and what they can do.
| Feature | Discriminative AI (Our Approach) |
| --- | --- |
| Purpose | Identify patterns in actual data |
| Output | Classification or probability scores |
| Hallucination Risk | Cannot generate false information |
| Validation | Directly testable against known outcomes |
3. Ancestry-Adjusted Polygenic Risk Scores
Polygenic Risk Scores (PRS) aggregate the effects of many genetic variants to estimate disease risk. Traditional PRS have a significant limitation: they were primarily developed on European populations and perform poorly for other ancestries.
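In its basic form, a PRS is a weighted sum: each variant's effect size multiplied by the number of effect alleles a person carries. The sketch below assumes that standard additive form; the variant identifiers and effect sizes are invented for illustration and do not come from any real GWAS.

```python
from typing import Mapping


def polygenic_risk_score(
    dosages: Mapping[str, float],
    effect_sizes: Mapping[str, float],
) -> float:
    """Sum effect_size * dosage over variants present in both inputs.

    dosages: effect-allele count per variant (0 to 2; fractional if imputed).
    effect_sizes: per-allele weight per variant (for example, a log odds ratio).
    """
    return sum(
        beta * dosages[variant]
        for variant, beta in effect_sizes.items()
        if variant in dosages
    )


# Hypothetical variants and weights, for illustration only.
effect_sizes = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}
dosages = {"variant_a": 2.0, "variant_b": 1.0, "variant_c": 0.0}

score = polygenic_risk_score(dosages, effect_sizes)
print(f"raw PRS = {score:.2f}")
```

A raw score like this only becomes meaningful once it is compared against the score distribution of a suitable reference population.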
The Ancestry Bias Problem
Studies have shown that PRS developed on European populations have substantially reduced predictive accuracy when applied to African, Asian, or admixed populations. This creates health equity concerns, as genetic insights become less useful for non-European individuals.
Our Solution: Ancestry-Adjusted Algorithms
Our platform implements ancestry-adjusted PRS that:
- Detect your genetic ancestry composition using reference panels
- Apply ancestry-specific effect sizes from multi-ethnic GWAS studies
- Weight variants differently based on their frequency in your ancestral populations
- Provide confidence intervals that reflect ancestry-related uncertainty
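One way the first two steps could fit together is to blend ancestry-specific effect sizes by a person's estimated ancestry proportions. The sketch below is a hypothetical scheme chosen for illustration, not a description of our production algorithm; the population labels, variants, and weights are all invented.

```python
from typing import Mapping


def ancestry_weighted_prs(
    dosages: Mapping[str, float],
    effects_by_ancestry: Mapping[str, Mapping[str, float]],
    ancestry_proportions: Mapping[str, float],
) -> float:
    """Blend per-ancestry effect sizes by ancestry proportion, then sum.

    Assumption: the blended weight for a variant is the proportion-weighted
    average of its ancestry-specific effect sizes (missing effects count as 0).
    """
    total = sum(ancestry_proportions.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("ancestry proportions must sum to 1")

    score = 0.0
    for variant, dosage in dosages.items():
        blended_effect = sum(
            proportion * effects_by_ancestry[ancestry].get(variant, 0.0)
            for ancestry, proportion in ancestry_proportions.items()
        )
        score += blended_effect * dosage
    return score


# Hypothetical inputs: two reference populations and two variants.
proportions = {"population_a": 0.75, "population_b": 0.25}
effects = {
    "population_a": {"variant_1": 0.2, "variant_2": 0.1},
    "population_b": {"variant_1": 0.4, "variant_2": -0.1},
}
sample_dosages = {"variant_1": 1.0, "variant_2": 2.0}

adjusted = ancestry_weighted_prs(sample_dosages, effects, proportions)
print(f"ancestry-weighted PRS = {adjusted:.2f}")
```

Linear blending is only one design choice; approaches that account for local ancestry along each chromosome or for allele frequency differences would require more than this sketch shows.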
4. Accuracy Claims with Evidence
When we claim "99.7% accuracy for common variants," this is a specific, measurable claim with defined parameters:
What we measured: Concordance between imputed genotypes and directly measured genotypes from whole genome sequencing
Sample size: Validated against 10,000+ samples with both imputed and WGS data
Variant type: Common variants (MAF > 5%). "Common" is defined by minor allele frequency in reference populations
Metric used: R² (squared correlation) between imputed and true genotypes, a standard measure in genomics
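Both quantities named above, genotype concordance and R², can be computed from paired imputed and sequenced genotypes. The sketch below shows one straightforward way to do so on invented data; the function names and sample values are illustrative, and it is not our validation pipeline.

```python
from typing import Sequence


def squared_correlation(imputed: Sequence[float], truth: Sequence[float]) -> float:
    """Squared Pearson correlation between imputed dosages and true genotypes."""
    n = len(imputed)
    if n != len(truth) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    var_x = sum((x - mean_x) ** 2 for x in imputed)
    var_y = sum((y - mean_y) ** 2 for y in truth)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when either input is constant")
    return cov * cov / (var_x * var_y)


def concordance(imputed: Sequence[float], truth: Sequence[int]) -> float:
    """Fraction of samples whose best-guess genotype equals the true genotype."""
    # Round each dosage to the nearest genotype (0, 1, or 2), halves rounding up
    calls = [min(2, max(0, int(d + 0.5))) for d in imputed]
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)


# Hypothetical data for one variant across six samples.
true_genotypes = [0, 1, 2, 1, 0, 2]
imputed_dosages = [0.1, 0.9, 1.8, 1.2, 0.0, 2.0]

r2 = squared_correlation(imputed_dosages, true_genotypes)
agreement = concordance(imputed_dosages, true_genotypes)
print(f"R^2 = {r2:.3f}, concordance = {agreement:.3f}")
```

The two numbers answer different questions: concordance counts exact genotype matches, while R² also penalizes dosages that are close to, but not exactly, the true genotype.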
Accuracy varies by variant type:
| Variant Type | Typical Accuracy |
| --- | --- |
| Common (MAF > 5%) | 99.7% (R² > 0.997) |
| Low-frequency (MAF 0.5%-5%) | ~95% (R² > 0.95) |
| Rare (MAF < 0.5%) | Variable, confidence flagged |
5. Data Sources & Academic References
Our methodology is built on peer-reviewed research and publicly available datasets:
- 1000 Genomes Project Consortium (2015). "A global reference for human genetic variation." Nature 526, 68-74.
- McCarthy et al. (2016). "A reference panel of 64,976 haplotypes for genotype imputation." Nature Genetics 48, 1279-1283.
- Martin et al. (2019). "Clinical use of current polygenic risk scores may exacerbate health disparities." Nature Genetics 51, 584-591.
- Khera et al. (2018). "Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations." Nature Genetics 50, 1219-1224.
6. Technology Partnership & Medical Oversight
Powered by Omics Edge
Advanced Health Genetics' genetic analysis platform is built on Omics Edge, a B2B bioinformatics platform specializing in population genetics and personalized medicine. This partnership ensures our technology benefits from:
- Cutting-edge computational infrastructure for large-scale genetic analysis
- Rigorous quality control and data validation pipelines
- Access to diverse populations and ancestries for training and validation
Research & Development Team
Our technology is developed and maintained by a team of 70+ scientists, medical doctors, PhDs, and engineers with expertise in:
- Population genetics and statistical genomics
- Machine learning and Bayesian inference
- Cardiology, oncology, endocrinology, and other medical specialties
- Bioinformatics and high-performance computing
Investment in Technology
We have invested more than $20 million in the development, validation, and ongoing improvement of our genetic analysis platform, reflecting our commitment to scientific rigor and accuracy.
Medical Oversight
While our technology is powered by advanced algorithms and the Omics Edge scientific team, we maintain medical oversight to ensure clinical appropriateness of our reports and recommendations.
Medical Advisor: Dr. Arif Ali, MBBS, RMP
Dr. Ali provides medical oversight for AHG, ensuring that:
- Report methodologies align with medical best practices
- Health recommendations are clinically appropriate
- Reports include proper disclaimers and guidance for healthcare provider consultation
- Supplement and lifestyle recommendations are reviewed for safety
Medical License: 781874-05-M
7. What This Means For You
Scientific transparency: You now understand that our '99.7% accuracy' claim is grounded in specific computational methods applied to validated benchmarks. This is not marketing language—it is a precise technical claim with documented methodology.
More complete genetic picture: By expanding your 750,000 raw SNPs to 83 million variants, we provide a comprehensive view of genetic architecture relevant to health risks.
Personalized to your ancestry: Unlike generic PRS developed on European populations, our ancestry-adjusted scores account for your specific genetic background.
Medical oversight: Your reports are developed under the medical guidance of Dr. Arif Ali, MBBS, ensuring clinical appropriateness.
Confidence in the science: Every claim in your Advanced Health Genetics report is traceable to peer-reviewed research. We cite our sources and continuously validate our predictions.
Educational context: Genetic insights are tools for understanding your health risks and making informed decisions with your healthcare provider. They are not diagnoses.
Important Disclaimer
Advanced Health Genetics' technology provides genetic insights for educational and informational purposes only. This technology is not intended to diagnose, treat, cure, or prevent any disease. Genetic risk scores represent statistical associations, not certainties. Always consult with qualified healthcare providers before making health decisions based on genetic information.