16 February 2017
Machine learning for beginners.
From tailoring our web searches and targeting advertisements to detecting spam and fraud and even enabling driverless cars, machine learning has, over the last decade or so, infiltrated many aspects of our lives, often in ways we have come to take for granted.
Now there is burgeoning excitement about its potential in medicine. Some paint a future in which machine learning will diagnose disease and displace the work of doctors; others question the extent and pace of any disruption in healthcare. So when it comes to actual impact on healthcare, where do we draw the line between hype and reality?
Let’s start with why. A vast amount of health-related data is now generated and collected both within and outside the health system, including medical records, medical images, and data from wearables and monitoring devices. The general consensus is that, if effectively integrated, these ‘big’ datasets can generate new knowledge to help inform and improve healthcare delivery. But doing so requires not only the ability to collect and maintain large datasets, but also to analyse and interpret them. Otherwise data remains data, rather than actionable information and knowledge. And this is where machine learning, a form of artificial intelligence (A.I.), comes in.
Broadly speaking, machine learning is a statistical approach to analysing ‘big’ datasets. In contrast to most computer algorithms, where a computer follows static, hand-crafted rules to perform a task, machine learning algorithms iteratively learn from data to discover their own rules, and in doing so can improve with experience: for example, learning our shopping preferences from data on previous purchases. A main objective of machine learning is to make predictions on previously ‘unseen’ data, based on what has been ‘learned’ from earlier data. To take one specific example, machine learning is being used to analyse large-scale electronic medical records to learn how physicians treat patients. The algorithms can then be used to predict, and raise an alert, if a medicine may have been prescribed in error.
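To make the idea of ‘learning rules from data’ concrete, here is a minimal sketch of one of the simplest possible learners, a nearest-centroid classifier. The shopping data, feature values, and labels below are invented purely for illustration; real machine learning systems use far richer models and datasets.

```python
# Toy illustration of learning from data: a nearest-centroid classifier.
# Training computes one average feature vector ("centroid") per label;
# prediction assigns an unseen example to the nearest centroid.

def train(examples):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Invented training data: (feature vector, label) pairs,
# e.g. [purchase frequency, basket size] for two shopper types.
training = [
    ([1.0, 0.9], "frequent buyer"),
    ([0.9, 1.0], "frequent buyer"),
    ([0.1, 0.2], "occasional buyer"),
    ([0.2, 0.1], "occasional buyer"),
]
model = train(training)
print(predict(model, [0.8, 0.8]))  # a previously 'unseen' shopper -> frequent buyer
```

The key point is that no rule such as “a frequent buyer has purchase frequency above 0.5” is ever written down by a programmer; the decision boundary falls out of the training data, and shifts if the data shifts.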
A plethora of machine learning applications for healthcare is under development, including risk prediction tools (for patient hospital readmission, among others) and automated analytics in medical imaging for diagnostic or prognostic assessment. DeepMind, Google’s A.I. division, is collaborating with the NHS to develop algorithms to help promote earlier detection of acute kidney injury and, in a separate project, eye disease. For now, the most promising applications of machine learning are generally emerging in narrow use cases involving relatively structured datasets, such as images. When it comes to harnessing insight from more unstructured, heterogeneous datasets, there is still some way to go. But arguably it is the use of machine learning on precisely these vast datasets that could be most transformative, with outcomes ranging from identifying new disease biomarkers and better predicting disease risk to improving diagnostic accuracy, establishing prognosis, and personalising care and prevention.
Realising the huge potential of machine learning in healthcare depends on data: its availability, quantity, quality, format, and integration with other sources. This is because data is fundamental to developing and optimising the algorithms in the first place. For example, in order to learn to detect spam, the algorithms underpinning spam filters need to have previously observed both spam and authentic emails, so that they can learn to discriminate between them. The same principle applies when it comes to predicting health outcomes. To function accurately, the algorithms need to be developed on datasets representative of the populations they are intended to serve; otherwise they may produce incorrect predictions (e.g. a wrong diagnosis) for individuals from population groups underrepresented in the datasets used to develop the algorithms.
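The spam example can itself be sketched in a few lines. Below is a stripped-down word-frequency (Naive Bayes style) classifier; the messages are invented, and a real spam filter would use far more data and more sophisticated features. Note that it needs labelled examples of both classes before it can discriminate at all, which is exactly the representativeness point above.

```python
# Sketch of a spam filter learned from labelled examples of both classes.
# Uses per-class word counts with add-one smoothing and log-likelihoods.
import math
from collections import Counter

def train(messages):
    """Count word frequencies per class from labelled (text, label) pairs."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = {"spam": 0, "ham": 0}
    for text, label in messages:
        for word in text.lower().split():
            counts[label][word] += 1
            totals[label] += 1
    return counts, totals

def classify(model, text):
    """Pick the class with the higher smoothed log-likelihood for the text."""
    counts, totals = model
    vocab = set(counts["spam"]) | set(counts["ham"])
    def score(label):
        return sum(
            math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
            for w in text.lower().split()
        )
    return max(("spam", "ham"), key=score)

# Invented training messages; "ham" is the conventional label for non-spam.
messages = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(messages)
print(classify(model, "free prize money"))  # -> spam
```

If the ‘ham’ examples were removed, every message would score highest as spam: the filter has no notion of what legitimate email looks like. The healthcare analogue is an algorithm trained on an unrepresentative population, which can only mispredict for the groups it has never seen.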
So in developing robust, accurate and powerful algorithms, all considerations about data come under the spotlight: what data are needed, and why? In what format, and to what standards, should they be captured? How can they feasibly be obtained? Who will be using them to develop or apply the algorithms?
When it comes to collating data generated within the health system, one of the major practical challenges is the need for health systems to adopt digital record keeping, so that data is available in a format amenable to machine learning. As things stand, NHS Trusts are at different stages of digital maturity, with some probably requiring until 2023 to fully adopt digital record keeping. Beyond the health setting, there is also a burgeoning volume of potentially relevant data being generated by mobile apps and self-monitoring devices, but progress in aggregating and integrating these datasets is slow. Some of this is down to structural and technical challenges, but social and regulatory factors also play a role. In particular, public trust in sharing health data is likely to influence the pace of developments. Currently, machine learning expertise and the advanced computing resources needed to process data typically reside in private corporations, arguably the same organisations about which the public have the greatest reservations when it comes to sharing their health data. However, without cross-discipline (healthcare and artificial intelligence) and cross-sector (public and private) collaboration, the full impact of machine learning on healthcare is unlikely to materialise.
From virtual nurses and drug discovery to remote patient monitoring, the diversity of machine learning applications under development is truly remarkable. But ultimately, getting the most out of this technology will continue to require a great deal of bridging, of both datasets and expertise.