How our genetic analysis technology works

The science behind 83 million variants, explained

At Advanced Health Genetics, we believe that transparency in science builds trust. This document explains the methodology behind our genetic analysis technology, the data sources we use, and how we validate accuracy claims.

Compliance Notice: Our technology provides genetic insights for educational purposes. It does not diagnose, treat, or prevent disease.

1. Genetic Imputation Explained

What is Genetic Imputation?

Genetic imputation is a statistical method used to infer genotypes at unmeasured genetic variants, such as single nucleotide polymorphisms (SNPs), from the variants that were measured, using reference data from large population studies.

When you take a genetic test, the laboratory typically genotypes approximately 750,000 SNPs using a microarray chip. These directly measured variants are called raw SNPs. However, the human genome contains over 600 million potential variants. Imputation fills the gaps.

How Imputation Works

The imputation process uses two key inputs:

Your genotype data: The ~750,000 SNPs directly measured from your genetic sample

Reference haplotype panels: Large datasets containing genetic information from thousands of individuals across diverse populations

The imputation algorithm compares your measured SNPs to those in the reference panel. When your measured SNPs match a specific pattern in the reference data, the algorithm infers which unmeasured SNPs you likely have based on patterns of linkage disequilibrium (LD)—the tendency for nearby variants to be inherited together.
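The following deliberately simplified Python sketch makes the linkage-disequilibrium idea concrete. The reference panel, the function names, and the nearest-haplotype rule are all hypothetical. Production imputation tools typically model reference haplotypes with hidden Markov models rather than a simple nearest-match average, and they work on diploid genotypes rather than on a single haplotype.

```python
# Toy illustration of LD-based imputation (hypothetical; not a production algorithm).
# Each reference haplotype lists alleles at five variant positions: 0 = reference, 1 = alternate.
REFERENCE_HAPLOTYPES = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 1, 1],
]


def mismatches(haplotype, observed):
    """Count how many measured (typed) alleles disagree with a reference haplotype."""
    return sum(haplotype[pos] != allele for pos, allele in observed.items())


def impute_haplotype(observed, reference, k=2):
    """Return an alternate-allele probability for every position.

    Measured positions keep their observed allele; unmeasured positions are
    averaged over the k reference haplotypes that best match the measured ones.
    """
    nearest = sorted(reference, key=lambda hap: mismatches(hap, observed))[:k]
    result = []
    for pos in range(len(reference[0])):
        if pos in observed:
            result.append(float(observed[pos]))
        else:
            result.append(sum(hap[pos] for hap in nearest) / len(nearest))
    return result


# Positions 0, 2 and 4 were measured on the chip; positions 1 and 3 were not.
print(impute_haplotype({0: 1, 2: 0, 4: 1}, REFERENCE_HAPLOTYPES))
# → [1.0, 1.0, 0.0, 0.5, 1.0]
```

The two best-matching reference haplotypes agree at position 1, so that allele is inferred with full confidence. They disagree at position 3, so the sketch reports a 0.5 probability instead of forcing a guess.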

From 750,000 to 83 Million Variants

Our imputation process expands from approximately 750,000 raw SNPs to 83 million variants. This expansion is achieved through:

Multi-population reference panels: We use the combined 1000 Genomes Project and Haplotype Reference Consortium data
Computational inference: Advanced statistical methods infer variants not directly measured
Ancestry-specific imputation: We adjust imputations for your individual ancestry, improving accuracy

The 83 million variants span three minor allele frequency (MAF) classes: common variants (MAF > 5%), low-frequency variants (MAF 0.5%-5%), and rare variants (MAF < 0.5%), providing a comprehensive picture of your genetic architecture.
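The following minimal Python sketch shows how a variant could be assigned to one of these classes. The function names are hypothetical. MAF is the frequency of whichever allele is less common, so a variant whose alternate allele has a frequency of 97% still has a MAF of 3%. Because the text above defines common as "> 5%", the sketch places a MAF of exactly 5% in the low-frequency class. That boundary handling is our reading of the thresholds.

```python
def minor_allele_frequency(alt_allele_freq: float) -> float:
    """MAF is the frequency of the less common allele, so it is at most 0.5."""
    if not 0.0 <= alt_allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return min(alt_allele_freq, 1.0 - alt_allele_freq)


def frequency_class(alt_allele_freq: float) -> str:
    """Bin a variant using the stated thresholds: >5%, 0.5%-5%, <0.5%."""
    maf = minor_allele_frequency(alt_allele_freq)
    if maf > 0.05:
        return "common"
    if maf >= 0.005:
        return "low-frequency"
    return "rare"


print(frequency_class(0.30))   # common
print(frequency_class(0.97))   # low-frequency (MAF is about 0.03)
print(frequency_class(0.001))  # rare
```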

Reference Data Sources

Our imputation relies on publicly available, peer-reviewed reference panels:

1000 Genomes Project: Whole genome sequences from 2,504 individuals across 26 global populations

Haplotype Reference Consortium (HRC): Over 64,000 haplotypes providing dense coverage of European ancestry

CAAPA: Consortium on Asthma among African-ancestry Populations in the Americas, providing reference data for individuals of African ancestry

UK Biobank: 500,000 individuals for validation and accuracy testing

2. Bayesian Deep Learning Methodology

What is Bayesian Machine Learning?

Bayesian machine learning is an approach that combines machine learning with probability theory. Unlike standard neural networks that learn fixed point estimates, Bayesian models learn probability distributions over parameters. This allows the model to express uncertainty in its predictions.

In Bayesian frameworks, we compute:

Prior distribution: Our initial belief about model parameters before observing data

Likelihood: How well the model explains the observed genetic data

Posterior distribution: Our updated belief after combining the prior with observed data
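Bayes' theorem relates these three quantities. In the notation below, θ stands for the model parameters and D for the observed genetic data. This notation is ours and is not taken from a specific model.

```latex
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
\qquad
p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta
```

Here p(θ) is the prior, p(D | θ) the likelihood, and p(θ | D) the posterior. The denominator p(D) normalizes the posterior. For deep models this integral is usually intractable, which is why Bayesian deep learning typically relies on approximations such as variational inference or sampling.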

This probabilistic approach is essential in genetics, where uncertainty is inherent. A genetic risk score is not a definitive prediction; it is a probability-based estimate that appropriately conveys confidence levels.

Why Bayesian Methods for Genomics?

We chose Bayesian deep learning for several scientific reasons:

Uncertainty quantification: Genomics is inherently uncertain. Bayesian models provide credible intervals (the Bayesian counterpart of confidence intervals) around predictions, not just point estimates
Handling missing data: Genetic datasets often have missing values; Bayesian approaches naturally incorporate these uncertainties
Population stratification: Bayesian hierarchical models can account for genetic variation across ancestry groups
Generalization: Priors act as a form of regularization, which can reduce overfitting and help models generalize to new populations
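A conjugate Beta-Binomial model is the simplest case of uncertainty quantification. The Python sketch below is not a deep learning model and not our production pipeline. It is a hypothetical example that estimates an allele frequency from a sample of chromosomes under a uniform Beta(1, 1) prior. The two samples have the same observed frequency (15%), but their posteriors have very different credible intervals. A point estimate alone would hide that difference.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def credible_interval(self, level=0.95, draws=20_000, seed=0):
        """Equal-tailed credible interval estimated from posterior samples."""
        rng = random.Random(seed)
        samples = sorted(rng.betavariate(self.alpha, self.beta) for _ in range(draws))
        tail = (1.0 - level) / 2.0
        return samples[round(tail * draws)], samples[round((1.0 - tail) * draws) - 1]


def update(prior_alpha, prior_beta, alt_count, n_chromosomes):
    """Conjugate update: a Beta prior times a Binomial likelihood gives a Beta posterior."""
    return BetaPosterior(prior_alpha + alt_count, prior_beta + n_chromosomes - alt_count)


# Same observed frequency (15%), very different amounts of data.
small = update(1, 1, alt_count=3, n_chromosomes=20)
large = update(1, 1, alt_count=300, n_chromosomes=2000)
print(small.mean, small.credible_interval())
print(large.mean, large.credible_interval())
```

The small sample produces a wide interval and the large sample a narrow one. This is the information a Bayesian model carries forward into its predictions.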

Discriminative AI vs. Generative AI

CRITICAL DISTINCTION: Our system uses discriminative AI, not generative AI. This is a fundamental difference that affects how our models work and what they can do.

Feature	Discriminative AI (Our Approach)
Purpose	Identify patterns in actual data
Output	Classification or probability scores
Hallucination Risk	Does not generate free-form text or data, so it cannot fabricate content; individual predictions can still be wrong
Validation	Directly testable against known outcomes
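The following hypothetical Python sketch illustrates the "Output" and "Validation" rows. It uses logistic regression, one of the simplest discriminative models, as a stand-in; it is not our production model. The model maps input features to a probability score and never produces new data. Its predictions can be scored against known outcomes, here with the Brier score. The function names and values are illustrative.

```python
import math


def predict_probability(features, weights, bias):
    """Logistic regression, a discriminative model: it outputs p(y = 1 | x) directly."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    logit = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


def brier_score(predicted, observed):
    """Mean squared error between predicted probabilities and observed 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(observed)


print(predict_probability([2.0], [1.0], -2.0))  # 0.5
print(brier_score([0.9, 0.2], [1, 0]))          # approximately 0.025
```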

3. Ancestry-Adjusted Polygenic Risk Scores

Polygenic Risk Scores (PRS) aggregate the effects of many genetic variants to estimate disease risk. Traditional PRS have a significant limitation: they were primarily developed on European populations and perform poorly for other ancestries.
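At its core, a PRS is a weighted sum: each variant's per-allele effect size from a GWAS is multiplied by the number of effect alleles a person carries (the dosage, which can be fractional for imputed variants). The following minimal Python sketch shows this sum. The variant IDs and weights are hypothetical, and real pipelines add steps such as variant selection and effect-size shrinkage.

```python
def polygenic_score(dosages, effect_sizes):
    """Sum of effect size x effect-allele dosage over variants present in both inputs.

    dosages: variant ID -> expected count of the effect allele (0 to 2; imputed
    dosages can be fractional). effect_sizes: variant ID -> per-allele effect
    (for example, a log odds ratio) from a GWAS.
    """
    shared = sorted(dosages.keys() & effect_sizes.keys())
    return sum(effect_sizes[v] * dosages[v] for v in shared)


# Hypothetical variant IDs and weights, for illustration only.
dosages = {"var1": 2.0, "var2": 0.5, "var3": 1.0}
effect_sizes = {"var1": 0.10, "var2": -0.20, "var4": 0.30}
print(polygenic_score(dosages, effect_sizes))  # 0.1
```

Only var1 and var2 contribute because var3 and var4 each appear in just one input.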

The Ancestry Bias Problem

Studies have shown that PRS developed on European populations have substantially reduced predictive accuracy when applied to African, Asian, or admixed populations (Martin et al., 2019). This creates health equity concerns, as genetic insights become less useful for non-European individuals.

Our Solution: Ancestry-Adjusted Algorithms

Our platform implements ancestry-adjusted PRS that:

Detect your genetic ancestry composition using reference panels
Apply ancestry-specific effect sizes from multi-ancestry genome-wide association studies (GWAS)
Weight variants differently based on their frequency in your ancestral populations
Provide confidence intervals that reflect ancestry-related uncertainty
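One common way to make scores comparable across ancestries is to standardize a raw score against the score distribution of a reference cohort with matching ancestry. The following Python sketch illustrates that step under stated assumptions. The reference values, group names, and normal approximation are all hypothetical. The sketch covers only this standardization step, not ancestry-specific effect sizes or frequency-based variant weighting.

```python
from statistics import NormalDist, mean, stdev


def ancestry_adjusted_z(raw_score, reference_scores, ancestry):
    """Standardize a raw PRS against a reference cohort of matching ancestry."""
    ref = reference_scores[ancestry]
    return (raw_score - mean(ref)) / stdev(ref)


def to_percentile(z):
    """Percentile under a standard normal approximation of the reference distribution."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical reference-cohort scores for two ancestry groups.
reference_scores = {
    "group_a": [-0.2, -0.1, 0.0, 0.1, 0.2],
    "group_b": [0.3, 0.4, 0.5, 0.6, 0.7],
}
for ancestry in reference_scores:
    z = ancestry_adjusted_z(0.0, reference_scores, ancestry)
    print(ancestry, round(z, 2), round(to_percentile(z), 1))
```

The same raw score of 0.0 sits at the median of group_a but far below the typical score in group_b. Comparing a person's score against the wrong reference group would therefore misstate their relative risk.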

4. Accuracy Claims with Evidence

When we claim "99.7% accuracy for common variants," this is a specific, measurable claim with defined parameters:

What we measured: Agreement between imputed genotypes and genotypes directly measured by whole genome sequencing (WGS)

Sample size: Validated against 10,000+ samples with both imputed and WGS data

Variant type: Common variants, defined as those with MAF > 5% in the reference populations

Metric used: R², the squared Pearson correlation between imputed genotype dosages and WGS genotypes, a standard imputation-quality measure in genomics

Accuracy varies by variant type:

Variant Type	Typical Accuracy
Common (MAF > 5%)	99.7% (R² > 0.997)
Low-frequency (MAF 0.5%-5%)	~95% (R² > 0.95)
Rare (MAF < 0.5%)	Variable; low-confidence results are flagged
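The table does not specify how rare variants are flagged. One plausible approach, sketched here as an assumption, is to compare each variant's estimated imputation quality against a cutoff; imputation software commonly reports a per-variant, R²-style quality estimate. The 0.8 threshold, the class, and the variant data below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImputedVariant:
    variant_id: str
    estimated_r2: float  # the imputation tool's own quality estimate for this variant


def flag_low_confidence(variants, min_r2=0.8):
    """Return IDs of variants whose estimated imputation quality falls below min_r2."""
    return [v.variant_id for v in variants if v.estimated_r2 < min_r2]


variants = [
    ImputedVariant("var1", estimated_r2=0.99),
    ImputedVariant("var2", estimated_r2=0.93),
    ImputedVariant("var3", estimated_r2=0.41),
]
print(flag_low_confidence(variants))  # ['var3']
```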

5. Data Sources & Academic References

Our methodology is built on peer-reviewed research and publicly available datasets:

1000 Genomes Project Consortium (2015). "A global reference for human genetic variation." Nature 526, 68-74.
McCarthy et al. (2016). "A reference panel of 64,976 haplotypes for genotype imputation." Nature Genetics 48, 1279-1283.
Martin et al. (2019). "Clinical use of current polygenic risk scores may exacerbate health disparities." Nature Genetics 51, 584-591.
Khera et al. (2018). "Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations." Nature Genetics 50, 1219-1224.

6. Technology Partnership & Medical Oversight

Powered by Omics Edge

Advanced Health Genetics' genetic analysis platform is built on Omics Edge, a B2B bioinformatics platform specializing in population genetics and personalized medicine. This partnership ensures our technology benefits from:

Cutting-edge computational infrastructure for large-scale genetic analysis
Rigorous quality control and data validation pipelines
Access to diverse populations and ancestries for training and validation

Research & Development Team

Our technology is developed and maintained by a team of 70+ scientists, physicians, and engineers, including PhD-level researchers, with expertise in:

Population genetics and statistical genomics
Machine learning and Bayesian inference
Cardiology, oncology, endocrinology, and other medical specialties
Bioinformatics and high-performance computing

Investment in Technology

We have invested more than $20 million in the development, validation, and continuous improvement of our genetic analysis platform. This investment reflects our commitment to scientific rigor and accuracy.

Medical Oversight

While our technology is powered by advanced algorithms and the Omics Edge scientific team, we maintain medical oversight to ensure clinical appropriateness of our reports and recommendations.

Medical Advisor: Dr. Arif Ali, MBBS, RMP

Dr. Ali provides medical oversight for AHG, ensuring that:

Report methodologies align with medical best practices
Health recommendations are clinically appropriate
Reports include proper disclaimers and guidance for healthcare provider consultation
Supplement and lifestyle recommendations are reviewed for safety

Medical License: 781874-05-M

7. What This Means For You

Scientific transparency: You now understand that our '99.7% accuracy' claim refers specifically to imputation quality (R²) for common variants (MAF > 5%), measured against whole genome sequencing. It is a precise technical claim with documented methodology, not marketing language.

More complete genetic picture: By expanding your 750,000 raw SNPs to 83 million variants, we provide a broader view of the genetic architecture relevant to health risks. Imputation accuracy is highest for common variants and lower for rare ones.

Personalized to your ancestry: Unlike generic PRS developed on European populations, our ancestry-adjusted scores account for your specific genetic background.

Medical oversight: Your reports are developed under the medical guidance of Dr. Arif Ali, MBBS, ensuring clinical appropriateness.

Confidence in the science: Every claim in your Advanced Health Genetics report is traceable to peer-reviewed research. We cite our sources and continuously validate our predictions.

Educational context: Genetic insights are tools for understanding your health risks and making informed decisions with your healthcare provider. They are not diagnoses.

Important Disclaimer

Advanced Health Genetics' technology provides genetic insights for educational and informational purposes only. This technology is not intended to diagnose, treat, cure, or prevent any disease. Genetic risk scores represent statistical associations, not certainties. Always consult with qualified healthcare providers before making health decisions based on genetic information.