Bruno Petrungaro, Senior Econometrician, and Andi Orlowski, Director of the Health Economics Unit, presented on Causal Inference in Population Health Management at the third NHS-R Community conference in November. Here, Bruno explains more about the technique he championed to delegates.
Find the video of our talk below:
The conference promotes the use of R, the free, open-source data science and statistics environment, in the NHS. It gives those of us using it the chance to share best-practice solutions to NHS problems.
Although R is not yet widely used within the NHS, the hope is that through the NHS-R Community we can raise its profile and highlight its potential to make a real difference to informed decision making and health outcomes.
The conference gave me the chance to talk about how to use R to build causal models, which are used less often than other data science techniques. There are many benefits to using causal models. Perhaps most importantly, they enable cause-and-effect reasoning, which models that focus on association between variables cannot do. They are also generalisable by nature: if we have found genuine cause-and-effect relationships, those relationships will hold under other circumstances.
Causation versus association
Causal models differ from traditional models used in machine learning. Most machine learning techniques focus on the association of variables to build models that accurately predict an outcome variable.
Causal models are powerful tools for population health management because, rather than telling you what is associated with a problem, they tell you what causes it. You can then use this to identify where to intervene to improve patient outcomes.
When a patient sees a doctor, the doctor builds cause-and-effect mechanisms in their mind to arrive at a diagnosis. As a species, we humans tend to think in cause-and-effect terms. We can potentially automate this cause-and-effect reasoning using Bayesian networks in R. The result would be called an expert system.
Building a Bayesian network with the PC algorithm
R has a dedicated package (bnlearn) for learning the graphical structure of Bayesian networks, estimating their parameters and performing inference. Once the structure of a Bayesian network is specified, the relationships between connected nodes are quantified by specifying a conditional probability distribution for each node given its parents. Both structure learning and the estimation of the conditional probabilities can be knowledge-based, data-driven or a combination of the two.
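As a minimal sketch of the "combination of both" case, the R code below fixes the structure from expert knowledge and estimates the conditional probabilities from data. The variables (Smoking, COPD, Breathlessness), the probabilities used to simulate the data and the helper `to_factor` are all hypothetical and chosen only for illustration; they are not taken from the talk. It assumes the bnlearn package is installed.

```r
library(bnlearn)

set.seed(42)
n <- 5000

# Simulated (hypothetical) patient data following Smoking -> COPD -> Breathlessness
smoking    <- rbinom(n, 1, 0.25)
copd       <- rbinom(n, 1, ifelse(smoking == 1, 0.30, 0.05))
breathless <- rbinom(n, 1, ifelse(copd == 1, 0.70, 0.10))

# bnlearn expects discrete variables to be stored as factors
to_factor <- function(x) factor(ifelse(x == 1, "yes", "no"), levels = c("no", "yes"))
patients <- data.frame(
  Smoking        = to_factor(smoking),
  COPD           = to_factor(copd),
  Breathlessness = to_factor(breathless)
)

# Knowledge-based step: the structure is written down by hand
dag <- model2network("[Smoking][COPD|Smoking][Breathlessness|COPD]")

# Data-driven step: one conditional probability table per node, estimated from the data
fitted <- bn.fit(dag, data = patients)

# Conditional probability table of COPD given Smoking
fitted$COPD
```

Swapping the hand-written structure for one learned from data, or the estimated tables for ones elicited from clinicians, changes only the corresponding step.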
One structure-learning algorithm we went through at the conference is the PC algorithm. It starts from a complete undirected graph, with an edge between every pair of variables. It then performs conditional independence tests and removes the edge between two nodes whenever they are found to be independent, either marginally or conditional on a subset of the other variables.
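The edge-removal idea can be tried out in bnlearn, as sketched below. This uses `learning.test`, a small synthetic dataset bundled with the package, rather than any NHS data, and `pc.stable()`, which is bnlearn's order-independent variant of the PC algorithm. The significance threshold of 0.05 is an assumption made for the example.

```r
library(bnlearn)

# Synthetic discrete dataset shipped with bnlearn (variables A to F)
data(learning.test)

# A single independence test of the kind PC runs many times:
# a small p-value means the edge between A and B is kept
test_ab <- ci.test("A", "B", data = learning.test)
test_ab$p.value

# Run the stable variant of PC over all variables
learned <- pc.stable(learning.test, alpha = 0.05)

# Arcs that survived the independence tests
# (an undirected arc is listed once in each direction)
arcs(learned)

# Arcs whose direction the algorithm could not determine from the data
undirected.arcs(learned)
```

Arcs that remain undirected are a natural place for expert knowledge to come in, since the data alone cannot say which way the relationship runs.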
What can Bayesian Networks do for the NHS?
The NHS Long Term Plan sets out prevention as a priority to help people stay healthy and moderate demand on the NHS. By spreading the use of causal models through the sector, we can aid understanding of what is causing a patient’s symptoms, not just what is associated with them. Once we have this knowledge, we can improve the way services are commissioned, which will lessen the pressure on NHS services.
Using Bayesian networks can support the Long Term Plan in several ways.
Risk modelling with Bayesian Networks has advantages over traditional regression-based approaches:
- Bayesian Networks allow for easy communication of the relationships between variables. This is not possible in traditional regression-based approaches, where the relationships are always defined with respect to the outcome variable (what we are trying to predict or understand)
- Bayesian Networks have the ability to estimate individual risk via Bayes’s theorem
- Bayesian Networks can be augmented to Bayesian Decision Networks to enable us to understand where to intervene to reduce patient risk
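To make the individual-risk point concrete, here is a small worked example of Bayes's theorem in base R. The two-node network (COPD causing Breathlessness) and every probability in it are hypothetical numbers chosen for illustration, not clinical estimates.

```r
# Hypothetical probabilities for a two-node network: COPD -> Breathlessness
p_copd                  <- 0.1125  # P(COPD = yes), the prior risk
p_breathless_given_copd <- 0.70    # P(Breathlessness = yes | COPD = yes)
p_breathless_given_none <- 0.10    # P(Breathlessness = yes | COPD = no)

# Law of total probability: overall chance of breathlessness
p_breathless <- p_breathless_given_copd * p_copd +
  p_breathless_given_none * (1 - p_copd)

# Bayes's theorem: risk of COPD for a patient who presents with breathlessness
p_copd_given_breathless <- p_breathless_given_copd * p_copd / p_breathless

round(p_copd_given_breathless, 3)  # 0.47
```

In this made-up example, observing the symptom moves the patient's risk from about 11% to about 47%. For larger networks, bnlearn's `cpquery()` can approximate the same kind of conditional probability by simulation from a fitted network.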
Bayesian Networks can incorporate expert knowledge and current diagnostic criteria to automate diagnostic decisions. This could be used in a primary care setting to minimise outpatient appointments.
We know that many long-term conditions manifest themselves together in some patients. Bayesian networks allow us to identify what drives co-morbid cases, by identifying symptoms that act as “bridges” between two long-term conditions.
All of these examples could have a tangible impact on the NHS’ strategy to ensure patients get the care they need, in a timely manner and in the most appropriate setting to help shape the NHS of the future.
You can watch a playback of the presentation below.
For more information on using R for causal inference contact Bruno at firstname.lastname@example.org