Health is Global Blog

The Challenge of the Exposome: How to Determine Cause-Effect Relationships Among Hundreds or Thousands of Data?

07.12.2022
Photo: Joshua Sortino / Unsplash

"Your assumptions are your windows on the world" (Isaac Asimov)

Apparently, there is a significant correlation between the number of people who drowned in a pool and the number of films starring Nicolas Cage. Maybe these people really did not enjoy his movies, although this is most likely just a spurious relationship. That is, the two events are indeed associated, but not in a causal way: watching a movie with Mr. Cage won’t make you drown. And there are many such relationships. In this case, luckily enough, whether we interpret it as causal or not will probably have no relevant consequences. Except, perhaps, for Cage's finances.
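One common way such coincidences arise is by searching through many unrelated series until two of them happen to line up. The minimal sketch below uses simulated random numbers, not the real drowning or film data: it correlates one short series against 1,000 independent ones and keeps the strongest match. The 11-point series length and the `pearson` helper are assumptions chosen for illustration.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(42)
years = 11  # hypothetical: one value per year over an 11-year window
# A simulated "target" series (think yearly pool drownings) and 1,000 series
# generated independently of it, so none of them is causally related to it.
target = [rng.gauss(0, 1) for _ in range(years)]
candidates = [[rng.gauss(0, 1) for _ in range(years)] for _ in range(1000)]

# Keep the unrelated series that happens to correlate most strongly with the target.
best = max(candidates, key=lambda s: abs(pearson(target, s)))
print(f"strongest |r| among 1000 unrelated series: {abs(pearson(target, best)):.2f}")
```

With short series and enough candidates, a strong-looking correlation is almost guaranteed to turn up even though every series was generated independently.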

But suppose instead that we want to research the effects of exposure to a chemical, like a possible carcinogen or a new drug. Now our interpretation of the results might have far more catastrophic consequences. A chemical deemed safe might actually cause serious health effects, or a drug that did not show any “statistically significant” results might in fact be beneficial. In both cases, there is a concrete possibility that a large proportion of the population will suffer. Establishing causality is not just recommended; it is imperative.

Interestingly, most epidemiologists are aware of the phrase “correlation does not imply causation”, nowadays recited almost as a mantra. Yet most of us are almost afraid to state the causal objective of our scientific efforts. Indeed, if you pick an epidemiological paper at random, there is a good chance that the authors conclude the manuscript with something along the lines of “Due to the observational nature of our study, we cannot establish causality.” Unfortunately, in some cases, the authors made little effort to establish causality in the first place. As Miguel A. Hernán writes: “Associational questions are easy to formulate and straightforward to answer when data are available”.

A common cause of spurious associations is what we call confounding. A confounder can be described as an event or a variable that is associated with both the exposure (e.g., our chemical) and the outcome (e.g., cancer). Confounders can distort our results, and in some cases they can even reverse the direction of an effect. The good news is that there exist methods to “control” for these variables, thus reducing their influence on the effect of interest. The bad news is that selecting the right confounders for our specific question, when data on them are available at all, is difficult, to say the least. There is no fancy method to identify them automatically; you need subject-specific knowledge. This requires time and money, two resources that are extremely valuable in academia.
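To make the idea concrete, here is a minimal simulation with invented parameters: a hypothetical binary confounder makes exposure more likely and lowers the outcome, while the exposure itself raises the outcome by 1. Comparing exposed and unexposed people overall (the crude association) gives a negative difference, so the direction of the effect flips. Comparing them within each level of the confounder, the simplest way of “controlling” for it, recovers a positive effect. All variable names and effect sizes are assumptions for illustration.

```python
import random
from statistics import mean


def simulate(n, seed=7):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0                   # hypothetical binary confounder
        x = 1 if rng.random() < (0.8 if c else 0.2) else 0   # confounder makes exposure more likely
        y = 1.0 * x - 3.0 * c + rng.gauss(0, 1)              # true exposure effect: +1
        rows.append((c, x, y))
    return rows


def difference(rows):
    """Mean outcome among the exposed minus mean outcome among the unexposed."""
    return mean(y for _, x, y in rows if x == 1) - mean(y for _, x, y in rows if x == 0)


rows = simulate(10_000)
crude = difference(rows)  # ignores the confounder
# Stratify: compare exposed vs unexposed separately within each level of the confounder.
by_stratum = {c: difference([r for r in rows if r[0] == c]) for c in (0, 1)}

print(f"crude difference: {crude:+.2f}")
for c, d in by_stratum.items():
    print(f"confounder = {c}: {d:+.2f}")
```

Stratification works here only because the confounder was simulated, measured and known to matter; in real data, deciding which variables to stratify or adjust for is exactly the step that needs subject-specific knowledge.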

Things get even more complicated, but surely more relevant and interesting, when we wish to study the health effects of perhaps hundreds or even thousands of exposures simultaneously. In fact, our health is determined by many aspects of our environment. The sum of all these non-genetic determinants of health is now known as the exposome, and interest in this field of research has exploded in recent years, with many actors around the world taking up this innovative concept. ISGlobal is currently one of the leading institutions in this field, as a partner in large European exposome projects such as ATHLETE, Equal-Life, EXPANSE and EPHOR.

These projects are collecting massive amounts of data to link chemical, social and urban exposures to molecular responses and health outcomes. Big data are necessary to answer these complex questions, but they are not sufficient to establish causality: big data cannot, and never will, replace careful thinking and domain-specific knowledge. And if things were complicated for one exposure and one outcome, we can only imagine the difficulties that will arise when we try to identify the confounders needed to establish causality for these complex effects. It is a daunting task, but also a necessary one.

Causal thinking has also gained popularity in environmental epidemiology. Indeed, causal questions are what ultimately drive interventions and policy change. Unfortunately, it is now clear that the traditional statistical methods researchers have been using for decades are not appropriate for high-dimensional data, as is the case in exposome research. Luckily enough, statisticians have developed some “smart” solutions. These modern statistical methods allow us to integrate, analyze and interpret large amounts of data, and to obtain precise estimates of the target quantities. Applied researchers working with non-experimental data can no longer claim that these questions cannot be answered. In the Exposome Data Challenge organized by ISGlobal, examples of causal inference methods included mediation analysis using omic data, g-computation methods and causal random forests. All these methods and the associated code are reported in a recent article (Maitre et al. 2022 https://www.sciencedirect.com/science/article/pii/S016041202200349X#s0100).
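As a rough illustration of the idea behind g-computation (a generic sketch, not the code from Maitre et al. 2022), the example below fits an outcome model with the exposure, a single measured confounder and their interaction, then predicts every participant's outcome twice, once as exposed and once as unexposed, and averages the difference. The simulated data, the linear outcome model and the helper names (`fit_ols`, `g_computation`, etc.) are all assumptions; real exposome analyses involve many correlated exposures and usually more flexible models.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        sol[i] = (m[i][n] - sum(m[i][k] * sol[k] for k in range(i + 1, n))) / m[i][i]
    return sol


def fit_ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def design(x, c):
    # Outcome model terms: intercept, exposure, confounder, exposure-by-confounder interaction.
    return [1.0, x, c, x * c]


def simulate(n, seed=1):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c = rng.gauss(0, 1)                        # hypothetical continuous confounder
        p_exposed = 1 / (1 + math.exp(-1.5 * c))  # higher confounder, more likely exposed
        x = 1.0 if rng.random() < p_exposed else 0.0
        y = 1 + 2 * x + 1.5 * c + 1.0 * x * c + rng.gauss(0, 1)  # average effect of x is 2
        data.append((x, c, y))
    return data


def g_computation(data):
    beta = fit_ols([design(x, c) for x, c, _ in data], [y for _, _, y in data])

    def predict(row):
        return sum(b * v for b, v in zip(beta, row))

    # Predict each participant's outcome as if exposed and as if unexposed, then average.
    effects = [predict(design(1.0, c)) - predict(design(0.0, c)) for _, c, _ in data]
    return sum(effects) / len(effects)


def naive_difference(data):
    exposed = [y for x, _, y in data if x == 1.0]
    unexposed = [y for x, _, y in data if x == 0.0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


data = simulate(5000)
print(f"naive difference: {naive_difference(data):.2f}")  # inflated by confounding
print(f"g-computation:    {g_computation(data):.2f}")     # close to the simulated effect of 2
```

The estimate is only as good as the outcome model and the assumption that all relevant confounders have been measured, which brings us back to careful thinking and domain-specific knowledge.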

Environmental Exposures During Pregnancy and Childhood Could Affect Blood Pressure in Children. Author: Warembourg, C. et al. J Am Coll Cardiol. 2019;74(10):1317–28.

And we can do more. It is fairly common nowadays to obtain data from multiple sources (e.g., through the systematic integration of toxicological and biological knowledge), often independent of one another. We can thus try to triangulate the evidence, in the hope of reducing bias and getting closer to the truth. For instance, within the OBERON project, of which ISGlobal is also a partner, we are trying to study the health impacts of a certain class of chemicals based on in vitro, in silico and epidemiological evidence.

To conclude, the output of these large exposome research projects has the potential to provide the understanding necessary to prevent the effects of a multitude of environmental hazards, starting from the earliest stages of life. You can learn more about the exposome and the leading role of ISGlobal in this field here.