The polygenic score paradox: navigating the hope, hype, and hurdles

In the quest for a more personalised and predictive form of medicine, few technologies have generated as much excitement and as much controversy as polygenic risk scores. The concept is compelling: a single number, calculated from hundreds, thousands or millions of small variations in your DNA, that estimates your lifelong genetic predisposition to common diseases like heart disease, cancer, and diabetes. This vision of “predictive prevention” is driving a powerful push toward the clinic, fuelled by the hope that we can finally identify at-risk individuals and intervene long before disease sets in.
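For readers who want a concrete sense of what that single number is, a polygenic score is, at its core, a weighted sum: each genetic variant a person carries is counted and multiplied by an effect size estimated in large association studies. The short Python sketch below illustrates the arithmetic; the variant identifiers, weights, and genotypes are invented for illustration, and real scores aggregate thousands to millions of variants.

    # Minimal, illustrative sketch of a polygenic score as a weighted sum.
    # The variant IDs, effect weights, and genotypes are invented for
    # illustration; real scores aggregate thousands to millions of variants.

    # Per-variant effect weights, as estimated by association studies.
    effect_weights = {
        "rs0000001": 0.12,
        "rs0000002": -0.05,
        "rs0000003": 0.08,
    }

    # One person's genotypes: number of risk alleles (0, 1, or 2) carried.
    genotypes = {
        "rs0000001": 2,
        "rs0000002": 1,
        "rs0000003": 0,
    }

    # The score is simply the sum of allele counts times their weights.
    prs = sum(effect_weights[v] * genotypes[v] for v in effect_weights)
    print(f"Polygenic score: {prs:.2f}")  # 0.12*2 - 0.05*1 + 0.08*0 = 0.19

In practice, the raw sum is then standardised against a reference population so it can be reported as a percentile of genetic risk, a detail that matters later in this piece.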

Clinical promise of polygenic scores

The optimism is not unfounded. In specific, well-defined clinical scenarios, polygenic scores are beginning to demonstrate tangible utility. For cardiovascular disease, a 10-year follow-up of the MI-GENES trial found that disclosing a polygenic score to patients at intermediate risk was associated with higher rates of statin initiation and, ultimately, a significantly lower incidence of heart attacks and strokes. In oncology, the large-scale WISDOM trial is using polygenic scores, alongside other variables, to personalise breast cancer screening for women, moving beyond age-based mammograms to a more nuanced, risk-stratified approach. Furthermore, in complex diagnostic cases, such as differentiating between subtypes of paediatric diabetes, a polygenic score can provide evidence to guide treatment.

These successes are not trivial; they represent real progress and are the reason why researchers, clinicians, and even commercial companies are forging ahead. And yet, for every promising headline, a wave of cautionary evidence is building, urging a more measured and critical perspective. 

Looking behind the evidence

The MI-GENES results, for example, have drawn critique from public health researchers, some of whom point out that the benefit may not be driven by the genetic data itself but by the intensive counselling and clinical attention that accompanied it. Critics argue that a similar, or even greater, benefit could be achieved more equitably, and at a fraction of the cost, by simply recommending statins to the entire intermediate-risk group.

The diagnostic utility in paediatric diabetes, while a genuine clinical advance, highlights a different problem: the magnification of a niche application. It represents a highly specific, specialist-level diagnostic aid for complex cases, not evidence for the feasibility of a population-level screening tool.

These critiques reveal a consistent pattern: promising results in narrow, controlled, or homogeneous settings often mask profound, unresolved challenges in real-world scalability, cost-effectiveness, and, most critically, health equity.

Reliability crisis

Other concerns have been raised. A comprehensive analysis of over 900 polygenic scores for 310 diseases presents a stark assessment: the scores performed poorly for general population screening. When compared with other standard tests, for example, polygenic scores identified only about 10% of future cases of coronary artery disease and 12% of future cases of breast cancer.

More unsettling, a study in JAMA revealed that different, equally “valid” polygenic scores for heart disease can produce wildly contradictory results for the same person: one in five individuals had at least one score placing them in the top 5% of genetic risk and another placing them in the bottom 5%.

This lack of individual-level reliability raises well-justified questions about using such a variable metric to guide life-altering medical decisions.
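The mechanism behind this discordance is easy to demonstrate. Each published score blends a genuinely shared genetic signal with score-specific choices (different variant sets, weights, and reference data), so two defensible scores behave like correlated but distinct measurements, and percentile ranks at the extremes become unstable. The simulation sketch below illustrates the effect; the number of scores, the sample size, and the assumed correlation between scores are illustrative assumptions, not parameters taken from the JAMA study, so the printed fraction will differ from the published one-in-five figure.

    # Illustrative simulation: several correlated but distinct risk scores
    # can place the same person at opposite extremes of the distribution.
    # All parameters below are assumptions chosen for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    n_people, n_scores, rho = 100_000, 10, 0.2  # rho = pairwise correlation

    # Each score = shared genetic signal + score-specific noise, scaled so
    # that any two scores correlate at roughly rho.
    shared = rng.standard_normal((n_people, 1))
    noise = rng.standard_normal((n_people, n_scores))
    scores = np.sqrt(rho) * shared + np.sqrt(1 - rho) * noise

    # Convert each score to a percentile rank (0-100) in this population.
    percentiles = scores.argsort(axis=0).argsort(axis=0) / n_people * 100

    # People placed in the top 5% by at least one score and in the
    # bottom 5% by at least one other.
    discordant = (percentiles.max(axis=1) >= 95) & (percentiles.min(axis=1) <= 5)
    print(f"Discordant at the extremes: {discordant.mean():.1%}")

Even with a modest number of scores and only moderate disagreement between them, a visible fraction of people land in both tails at once; the real-world figure depends on how strongly the published scores agree with one another.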

Scaling up challenges

These scientific hurdles are compounded by a foundational ethical issue: the lack of diversity in the genetic data used to build the scores. The vast majority of this data comes from people of European ancestry, meaning the resulting scores are far less accurate for everyone else. One major study, for instance, found that a polygenic score for body mass index explained 17.6% of the variation in Europeans but only 2.2% in rural Ugandans. Deploying these biased tools at scale risks creating a new form of genomic medicine that widens, rather than narrows, existing health disparities.

Emerging evidence shows that building scores from more diverse, multi-ancestry datasets improves their accuracy. This approach makes the scores more transferable to underrepresented populations while also improving their predictive power for European-ancestry individuals, because more diverse data provides a more complete picture of the full spectrum of human genetic variation linked to a disease. By incorporating this richer information, the resulting scores become more robust and accurate for everyone. This scientific solution, however, only underscores the scale of the initial problem, highlighting the urgent need for a global effort to diversify genomic research.

Shifting the conversation

All of this brings us to the central paradox of polygenic scores: they are simultaneously a promising tool in specific contexts and a flawed instrument for general use. The path forward is not to rush into widespread practice, nor is it to abandon the technology. Instead, it requires a fundamental shift in the conversation, from simply developing more scores to rigorously evaluating their real-world utility in defined, appropriate use cases.

The critical task for the scientific and medical communities is to let thorough evaluation and robust ethical debate determine where these scores can be most useful. This means prioritising research that generates the evidence required for evaluations of clinical utility, and addressing the challenges of reliability and equity head-on. By proceeding with caution and scientific integrity, we can begin to identify the specific clinical contexts where this promising but still-developing tool can be used effectively, ethically, and for the benefit of all.