The Challenge of the Exposome: How to Determine Cause-Effect Relationships Among Hundreds or Thousands of Data?

07.12.2022

Photo: Joshua Sortino / Unsplash

"Your assumptions are your windows on the world" (Isaac Asimov)

Apparently there is a significant correlation between the number of people who drowned in a pool and the number of films starring Nicolas Cage. Maybe these people really did not enjoy his movies, although most probably this is just a spurious relationship. That is, these two events are indeed associated, but not in a causal way. Watching a movie with Mr. Cage won’t make you drown. And there are many such relationships. In this case, luckily enough, whether we interpret it as causal or not, it will probably have no relevant consequences. Except, perhaps, for Cage's finances.

But suppose instead that we want to research the effects of exposure to a chemical, like a possible carcinogen or a new drug. Now our interpretation of the results might have far more catastrophic consequences. A chemical deemed safe might actually cause serious health effects, or a drug which did not show any “statistically significant” results might in fact be beneficial. In both cases, there is a concrete possibility that a large proportion of the population will suffer. Establishing causality is not just recommended, it’s imperative.

Interestingly, most epidemiologists are aware of the phrase “correlation does not imply causation”, nowadays recited almost as a mantra. Yet most of us are almost scared of stating the causal objective of our scientific efforts. Indeed, if you randomly pick an epidemiological paper, there’s a good chance that the authors concluded the manuscript by writing something along the lines of “Due to the observational nature of our study, we cannot establish causality.” And unfortunately, in some cases, the authors made little effort to establish it. As Miguel A. Hernán writes: “Associational questions are easy to formulate and straightforward to answer when data are available”.

A common cause of spurious associations is what we call confounding. A confounder can be described as an event or a variable that is associated with both the exposure (e.g., our chemical) and the outcome (e.g., cancer). Confounders can distort our results, and in some cases they can even change the direction of an effect. The good news is that there exist methods to “control” for these variables, thus reducing their influence on the effect of interest. The bad news is that selecting the right confounders for our specific question, assuming they have been measured at all, is difficult, to say the least. There is no fancy method to identify them automatically: you need subject-specific knowledge. This requires time and money, both of which are extremely valuable in academia.
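To make the idea of “controlling” for a confounder concrete, here is a minimal simulated sketch. Everything in it is an assumption made for illustration: the variable names, the effect sizes, and the choice of ordinary least squares as the adjustment method are invented and do not come from any real study. The exposure has no effect on the outcome by construction, yet a crude analysis suggests one.

```python
import numpy as np

def simulate(n=20000, seed=0):
    """Simulate a confounder that drives both exposure and outcome.

    The exposure has NO causal effect on the outcome here (illustrative
    assumption); any association between them is due to the confounder.
    """
    rng = np.random.default_rng(seed)
    confounder = rng.normal(size=n)
    exposure = 0.8 * confounder + rng.normal(size=n)
    outcome = 1.0 * confounder + rng.normal(size=n)
    return confounder, exposure, outcome

def ols_slope(y, *covariates):
    """Least-squares coefficient of the first covariate (with intercept)."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

confounder, exposure, outcome = simulate()

# Crude analysis: outcome regressed on exposure only.
crude = ols_slope(outcome, exposure)

# Adjusted analysis: the confounder is added to the model.
adjusted = ols_slope(outcome, exposure, confounder)

print(f"crude slope:    {crude:.3f}")
print(f"adjusted slope: {adjusted:.3f}")
```

The crude slope is clearly positive, while the adjusted slope is close to zero, which is the true effect in this simulation. Note that the adjustment only works because we knew which variable to adjust for: the code cannot tell us that, which is exactly where subject-specific knowledge comes in.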

Things get even more complicated, but surely more relevant and interesting, when we wish to study the health effects of perhaps hundreds or even thousands of exposures simultaneously. In fact, our health is determined by many aspects of our environment. The sum of all these non-genetic determinants of health is now known as the exposome. And the interest in this field of research has exploded in recent years. Many actors across the world have expressed or are expressing interest in this innovative concept. ISGlobal is currently one of the leading institutions in this field, being part of large European exposome projects like ATHLETE, Equal-Life, EXPANSE and EPHOR.

These projects are collecting massive amounts of data to link chemical, social and urban exposures to molecular responses and health outcomes. Big data are necessary to answer these complex questions, but they are not sufficient to establish causality. Big data cannot, and never will, replace careful thinking and domain-specific knowledge. And if things were complicated for one exposure and one outcome, we can only imagine the difficulties that will arise when we try to identify the necessary confounders to establish causality for these complex effects. It is a daunting task, but it is also a necessary one.

Causal thinking has also gained popularity in environmental epidemiology. Indeed, causal questions are what ultimately drive interventions and policy change. Unfortunately, it is now clear that the traditional statistical methods that researchers have been using for decades are not appropriate for high-dimensional data such as those in exposome research. Luckily enough, statisticians have developed some “smart” solutions. These modern statistical methods allow us to integrate, analyze, and interpret large amounts of data, and to obtain precise estimates of the target quantities. Applied researchers working with non-experimental data can no longer pretend that these questions cannot be answered. In the Exposome Data Challenge organized by ISGlobal, examples of causal inference methods included mediation analysis using omic data, g-computation methods, and causal random forests. All these methods and the associated code are reported in a recent article (Maitre et al. 2022 https://www.sciencedirect.com/science/article/pii/S016041202200349X#s0100).
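As a rough illustration of what a g-computation analysis involves, the sketch below applies it to a single binary exposure and a single measured confounder. It is not the code used in the Exposome Data Challenge: the data are simulated, the true effect of 2.0 is an arbitrary assumption, and the linear outcome model is a simplification chosen to keep the example short. The idea is to fit a model for the outcome, predict every individual's outcome under "exposed" and under "unexposed", and average the difference.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Simulated data (all values are illustrative assumptions).
L = rng.normal(size=n)                        # measured confounder
p_exposed = 1.0 / (1.0 + np.exp(-1.5 * L))    # exposure depends on L
A = rng.binomial(1, p_exposed)                # binary exposure
Y = 2.0 * A + 1.5 * L + rng.normal(size=n)    # true causal effect of A is 2.0

def design(a, l):
    """Design matrix: intercept, exposure, confounder, interaction."""
    return np.column_stack([np.ones(len(l)), a, l, a * l])

# Step 1: fit the outcome model on the observed data.
beta, *_ = np.linalg.lstsq(design(A, L), Y, rcond=None)

# Step 2: predict each person's outcome with exposure set to 1, then to 0.
y_if_exposed = design(np.ones(n), L) @ beta
y_if_unexposed = design(np.zeros(n), L) @ beta

# Step 3: average the individual differences over the whole population.
g_estimate = float(np.mean(y_if_exposed - y_if_unexposed))

# For comparison: the unadjusted difference in mean outcome.
naive = float(Y[A == 1].mean() - Y[A == 0].mean())

print(f"naive difference:         {naive:.2f}")
print(f"g-computation estimate:   {g_estimate:.2f}")
```

In this simulation the naive difference overstates the effect because exposed individuals also tend to have higher values of the confounder, whereas the g-computation estimate lands near the simulated value of 2.0. The estimate is only as good as the outcome model and the set of confounders fed into it.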

Environmental Exposures During Pregnancy and Childhood Could Affect Blood Pressure in Children. Author: Warembourg, C. et al. J Am Coll Cardiol. 2019;74(10):1317–28.

And we can do more. It is fairly common nowadays to get data from multiple sources (e.g., systematic toxicological and biological knowledge integration), often independent of one another. We can thus try to triangulate the evidence in the hope of reducing bias and getting closer to the truth. For instance, within the OBERON project, of which ISGlobal is also a partner, we are trying to study the health impacts of a certain class of chemicals based on in vitro, in silico, and epidemiological evidence.

To conclude, the output of these large exposome research projects has the potential to provide the understanding necessary to prevent the effects of a multitude of environmental hazards, starting from the earliest stages of life. You can learn more about the exposome and the leading role of ISGlobal in this field here.
