25 Years of Data at CISM: From a Few Megabytes to Many Gigabytes

08.10.2021
Photo: First CISM field workers using GPS technology, together with Francisco Saúte, who is currently the director of the center.

[This blog post is one of a series of articles commemorating the 25th anniversary of CISM.]

 

So much has changed since my first trip to Manhiça, Mozambique, more than 25 years ago, in January 1996. I travelled there with a state-of-the-art laptop weighing 4.5 kg, equipped with Windows 95 and a 520 MB hard drive—that’s right, half a gigabyte. Back in those days, it was a high-end machine. As a statistician, I was incredibly lucky to have the opportunity to participate in a study of maternal mortality in rural sub-Saharan Africa, focusing on a difficult-to-reach population of unknown size.

It was a challenge that, unbeknownst to me, harboured a great reward: I would discover that statistics is about more than simply analysing data. Data do not just fall from the sky; obtaining quality data can, in fact, be quite complicated. Moreover, the way in which your data are obtained, the way in which they are processed, and your knowledge of the environment are all essential to properly interpreting your findings.

Paul S. Levy’s Sampling of Populations—the bible of statistical sampling—had made the journey in my suitcase, and it would prove to be a great ally. The time had come to put into practice the lessons of that book, which I had read and re-read many times. I vividly remember meeting with the chefes de bairro at the district headquarters to find out how many agregados they were responsible for. Amid considerable confusão, we managed to sketch out all the neighbourhoods on a blackboard. Our only points of reference were two poorly drawn lines representing the Inkomati River and the railway track.

That blackboard would become our first map of the study area. Ephemeral, yes, but useful for our purposes. We developed a double-data-entry programme, hired and trained three data-entry clerks, and purchased two computers and a printer—the origin of what we now know as the data centre at the Manhiça Health Research Centre (CISM).
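For readers unfamiliar with double data entry: two clerks key in the same paper forms independently, and a program flags every field where the two versions disagree so that it can be checked against the original form. The sketch below is a minimal illustration in Python, not the programme we wrote at the time; the function name `compare_entries`, the `form_id` key and the example fields are all hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Mismatch = Tuple[str, str, str, str]  # (form id, field, first value, second value)


def compare_entries(first: List[Record], second: List[Record],
                    key: str = "form_id") -> List[Mismatch]:
    """List every field that differs between two independent entries.

    Assumes each form appears at most once per entry and that all
    values are stored as strings (hypothetical layout).
    """
    second_by_id = {rec[key]: rec for rec in second}
    first_ids = {rec[key] for rec in first}
    mismatches: List[Mismatch] = []

    for rec in first:
        other = second_by_id.get(rec[key])
        if other is None:
            # Form keyed in by the first clerk only.
            mismatches.append((rec[key], "<missing in second entry>", "", ""))
            continue
        for field in sorted(set(rec) | set(other)):
            a = rec.get(field, "").strip()
            b = other.get(field, "").strip()
            if field != key and a != b:
                mismatches.append((rec[key], field, a, b))

    for rec in second:
        if rec[key] not in first_ids:
            # Form keyed in by the second clerk only.
            mismatches.append((rec[key], "<missing in first entry>", "", ""))

    return mismatches


# Made-up example: the two clerks disagree on the age in form 002.
entry_a = [{"form_id": "001", "age": "27", "village": "Taninga"},
           {"form_id": "002", "age": "34", "village": "Cambeve"}]
entry_b = [{"form_id": "001", "age": "27", "village": "Taninga"},
           {"form_id": "002", "age": "43", "village": "Cambeve"}]

for mismatch in compare_entries(entry_a, entry_b):
    print(mismatch)
```

Each flagged field would then go back to the paper form for resolution, which is essentially what a data cleaning list is.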

First map of the study area drawn on a blackboard.

But that was just the beginning. A few months later, CISM started taking a census of the population, digitising aerial photos, mapping each and every house using GPS, and creating records of the hospital’s activity, which were practically non-existent at that time. A new life began with portuñol, a hybrid of Portuguese and Spanish, as the primary language of communication: a life filled with inquéritos, números de identificação permanente, números da amostra, inominados, and the much-feared and despised data cleaning lists.

Little could I imagine today’s hustle and bustle of researchers, technicians, field workers, motorcyclists, guards... all the laboratories, microscopes, freezers and, most notably, a data centre equipped with more than 20 computer terminals and numerous tablets, 20 data-entry clerks, and more than 200 databases totalling approximately 270 GB; more than 350,000 people registered with the census (including more than 29,000 deaths and nearly 114,000 emigrations); more than 1,000 homes mapped; and more than 1,400,000 consultations and 69,000 hospital admissions recorded.

First data-entry clerks to join the team, along with the first computers and a printer at CISM.

The world of information and communication has undergone a veritable technological revolution over the past 25 years, and CISM has taken advantage of it. Far from becoming complacent and lagging behind, the centre has evolved, undertaking improvements and expansions that have increased its capacity to generate, manage and share information. Remote access to data now allows us to collaborate more fluidly, more efficiently and without interruption. Moreover, electronic data capture systems enable better management and higher data quality, although they also require more complex preparation, which has forced us to adopt a different way of working, both at ISGlobal and at CISM. These advantages come with certain risks, such as threats to privacy, which have led us to create effective data protection and access mechanisms.

Anifa Filipe Nhabatanga—also known as 93c76ca0e411ff0b08791656ecd92c0f, thanks to data protection laws—is eight years old. She was born in Taninga on 6 June 2013 at 39 weeks gestation after a perfectly routine delivery, weighing 3 kg. We didn’t learn her given name until a few months later, so initially she was an inominada. She is the third of four siblings, the youngest of whom died of malaria at six months of age after two days in hospital. Anifa and her family used to live in a house built from reeds with no latrine. When she was 18 months old, the family moved to Cambeve, where they lived in a reed house with a tin roof and its own latrine, located less than 1 km from Manhiça hospital.

In the first years of her life, Anifa’s health was marked by respiratory problems, diarrhoea and the occasional bout of mild malaria; she was seen by a doctor at the hospital 20 times and was admitted twice, once for rotavirus diarrhoea at nine months of age and once for radiologically confirmed pneumonia at two years of age.

CISM, 2007 (Photo: BMGF)

This story is fiction, but it reflects reality and illustrates the power of combining multiple data sources for epidemiological surveillance and the contextualisation of information. The three platforms (demographic census, morbidity surveillance and geographic area) are the tools that enable us to reconstruct the story of Anifa and all the other crianças under 15 years of age living in the study area. Together, these platforms form the basis for much of CISM’s research work.
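To make the idea of combining the platforms concrete, here is a minimal Python sketch. It assumes that every platform stores its records under the same pseudonymous identifier; the record layouts, field names, dates and the helper `build_history` are hypothetical, and the values simply echo Anifa's fictional story rather than real data.

```python
from typing import Dict, List

ANIFA = "93c76ca0e411ff0b08791656ecd92c0f"  # pseudonymous ID from the story

# Demographic census: one record per person (hypothetical layout).
census: Dict[str, dict] = {
    ANIFA: {"birth_date": "2013-06-06", "household": "H-0001"},
}

# Geographic platform: one record per mapped household (made-up ID).
households: Dict[str, dict] = {
    "H-0001": {"neighbourhood": "Cambeve"},
}

# Morbidity surveillance: one record per hospital contact (made-up dates).
morbidity: List[dict] = [
    {"person": ANIFA, "date": "2015-06-20", "type": "admission",
     "diagnosis": "pneumonia"},
    {"person": ANIFA, "date": "2014-03-10", "type": "admission",
     "diagnosis": "rotavirus diarrhoea"},
]


def build_history(person_id: str) -> dict:
    """Join the three sources on the shared identifier."""
    person = census[person_id]
    home = households[person["household"]]
    # ISO dates sort correctly as plain strings.
    events = sorted((e for e in morbidity if e["person"] == person_id),
                    key=lambda e: e["date"])
    return {
        "birth_date": person["birth_date"],
        "neighbourhood": home["neighbourhood"],
        "events": events,
    }


history = build_history(ANIFA)
print(history["neighbourhood"], [e["diagnosis"] for e in history["events"]])
```

The point of the design is that no source needs to hold a name: the shared identifier is enough to reassemble a child's history when, and only when, a study requires it.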

The management and proper use of this information is a constant challenge that both tests and enriches us. We have learned to automate processes that facilitate data extraction and combination, but what really matters is what we do with the information. CISM is undoubtedly the driving force behind our interest in novel statistical methods that allow us to undertake ever more complex studies accurately and effectively, thereby improving our capacity to draw valid conclusions from these ever-growing sources of data.

Mass antimalarial drug administration campaign in 2016, carried out using 250 tablet computers.

Over the past 25 years, we have gone from calculating incidence rates and measures of association—with the utmost rigour, thanks to the platforms—to using cutting-edge statistical methodology to assess the impact of interventions such as vaccines against Hib, pneumococcus and rotavirus. At the same time, participation in numerous studies and clinical trials on various subjects—malaria, bacterial infections, maternal and child health, causes of death, etc.—has provided further motivation to learn new techniques for analysing and visualising information.
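For readers less familiar with these measures: an incidence rate is the number of new cases divided by the person-time at risk, and a rate ratio, one common measure of association, compares two such rates. The Python sketch below uses made-up numbers, not CISM data, and the function names are purely illustrative.

```python
def incidence_rate(cases: int, person_years: float, per: float = 1000.0) -> float:
    """New cases per `per` person-years at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return cases * per / person_years


def rate_ratio(cases_a: int, py_a: float, cases_b: int, py_b: float) -> float:
    """Incidence rate in group A divided by the rate in group B.

    Undefined when group B has no cases (raises ZeroDivisionError).
    """
    return incidence_rate(cases_a, py_a) / incidence_rate(cases_b, py_b)


# Hypothetical example: 30 cases in 2,000 child-years after an intervention
# versus 60 cases in 2,000 child-years before it.
rate_after = incidence_rate(30, 2000.0)
rate_before = incidence_rate(60, 2000.0)
ratio = rate_ratio(30, 2000.0, 60, 2000.0)

print(f"{rate_after:.1f} vs {rate_before:.1f} per 1,000 child-years; ratio {ratio:.2f}")
```

The denominators are where the demographic platform earns its keep: person-time at risk can only be computed if births, deaths and migrations are recorded reliably.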

Countless people and circumstances have left their mark on CISM and shaped the history of the centre: friendships, unforgettable moments and hard work, but also lots of laughter and immense gratification. This is no time to make a list, but it would be unfair not to express my thanks to Pedro L. Alonso and Clara Menéndez, who, with their courage, experience, wisdom and vision of the future, joined forces with the talent and innovative capacity of John J. Aponte and made it possible for us to celebrate, 25 years later, the existence of one of Africa’s most prestigious biomedical research centres.