Drawing the curve: Data visualization and COVID-19
A pandemic year in graphics
By Adrián Blanco and Javier Sauras
When Wuhan’s municipal health commission issued an urgent notice on December 30, 2019, asking local hospitals to report and track cases of “pneumonia of unknown cause,” few newspapers were paying attention. But in Hong Kong, where the news invoked memories of the 1997 avian influenza and the Severe Acute Respiratory Syndrome (SARS) of 2003, health authorities and media professionals reacted quickly. The city’s department of health alerted frontline medical staff and increased temperature screenings at every border checkpoint. Meanwhile, Hong Kong’s press began scanning social media for clues about the disease and reaching out to correspondents in mainland China. Soon, marquee international media companies with a heavy presence in China, such as The New York Times, BBC, The Guardian, Financial Times, and The Washington Post, followed suit.
As reporters learned more about the behavior and consequences of the virus, the data they tracked and the ways they published it changed. Because China was the epicenter of the pandemic, the first data pieces focused heavily on that country. South China Morning Post was one of the first media outlets to explain the pandemic visually.
Source: South China Morning Post Graphics, Jan. 21, 2020
Meanwhile, in the U.S., a small team at Johns Hopkins University was the first to start collecting global data on the pandemic. They took the lead in putting the numbers in context and quickly became the go-to source for reporters for data on cases and deaths. In the first months of the pandemic, almost all data trackers, graphics, and stories were fed by Johns Hopkins University data. But manual collection and reporting inconsistencies between countries led to errors and required extra verification, which made some newsrooms cautious about using the first few iterations of such publicly available data.
By the time the World Health Organization officially declared a global pandemic, the Western press was already publishing data trackers to show cases, deaths, and spread both on a global scale and at the national level. Data trackers flourished on the online front pages of newspapers and media outlets, helping readers understand the pandemic from a visual perspective. In the U.S., The New York Times was the first newsroom to create its own database down to the county level. There were also trackers that collected data not by location but by demographics. The Marshall Project gathered information and statistics on coronavirus cases in prisons to hold facilities accountable. As the multimedia editor of a large international media conglomerate told us, “the COVID-19 pandemic was, perhaps, the single biggest data journalism story of our generation.”
Data visualization journalism is an interdisciplinary practice that combines, at the very least, research, storytelling, statistical analysis, and design skills. Its main purpose is to communicate information clearly and effectively with graphics. And never had data visualization been so meaningful as it became in 2020. For the first time in history, millions of people around the globe tracked line graphs and column charts every day, pored over heat maps, and followed spikes and dips with unprecedented interest. Before the COVID-19 outbreak, concepts like “flattening the curve,” “new wave,” or “R number” were familiar only to a cadre of experts. Now, they have almost become colloquial.
In 2020 we learned that it’s easier to talk about flattening the curve if we have seen it represented before; to grasp the full meaning of the “R number” and the herd immunity threshold if they are paired with graphics. In a year when the evolution of a global virus monopolized the news cycle, many media outlets began to put data visualizations front and center of their daily operations.
To understand the outsized role of data visualizations in media and journalism over the past year, we interviewed over 50 journalists, developers, designers, editors, and scholars who were either covering the pandemic or studying that coverage. We consulted with media professionals working for 40 different organizations in 20 countries spanning six continents. We asked our interviewees about the primary data sources they relied upon; their inspiration and influences; the variables, measures, trackers, and graphics they used; their most pressing challenges and successes; and the lessons they would apply in the future. Our forthcoming report explores that evolution through the experiences of its protagonists. It also studies the various strategies, graphs, and trackers media professionals followed to report on the pandemic.
Key findings
Location: Physical proximity to a pandemic hot spot played a crucial role in our interviewees’ work. Although they mainly worked from home in 2020, the closer journalists were to the virus, the earlier they reported on it. And the earlier they started reporting on COVID-19, the more original their work was and the more it influenced other media professionals.
The rise of local data: Our interviewees reported that audiences preferred local and national stories over international features. Their news organizations’ primary sources were also usually national statistics agencies.
Uncertainty: COVID-19 was an unexpected news event with very little modern precedent for newsroom coverage. As a result, journalists commonly reported that, at the beginning, they lacked knowledge and understanding of what a global pandemic involved. Almost overnight, they had to adapt and learn about the virus while they were also covering it.
Scarcity and redundancy: From the beginning, scarcity and centralization of reliable sources caused almost every international, national, and even local media outlet to resort to the same data pool. Publishing the same data from the same sets of information led to redundancy and homogeneity, resulting in a kind of formulaic coverage.
Automation: Those who were able to automate their trackers and visualizations found more time to produce original reporting and pursue investigative stories.
Simplicity: Explaining the pandemic meant explaining something new for everyone. Most journalists interviewed for this report agreed on the importance of visualizing in a simple but effective way, such as with bar charts, weekly average lines, and bubble maps. These also became the most popular types of visualizations.
Experts: Many interviewees underscored the importance of having reliable, steady epidemiologists and statisticians on call before the pandemic. Experts were important in understanding the data and analyzing it, which gave journalists more time and freedom to push back against formulaic reporting.
Expressing error: The gaps, the caveats, and the limited availability of data required reporters to be cautious. Many reported that they made a point of including the possibility of error and variability within their visualizations. At the same time, not all media organizations and researchers presented the margin of error, often citing a cost-benefit analysis that favored providing vital information over perfectionism.
Professional growth: Many data journalists and visualizers reported feeling more valued today inside and outside their newsrooms than before the pandemic began. They believed their work was more important and better understood in 2020 than in prior years.
Findings on data sources
With the exception of those based in East Asia, most of the professionals we interviewed were following the evolution of the coronavirus spread through major U.S.- and U.K.-based media outlets. Johns Hopkins University (JHU), the European Centre for Disease Prevention and Control (ECDC), wire services, and Twitter were also preferred sources. The first data many newsrooms used came from the JHU dashboard, a regularly updated tracker that stood out because “it was well sourced and verifiable with sources on the ground,” said an editor based in Qatar. However, none of these sources was entirely reliable.
Some journalists quickly realized that even data from the most trustworthy sources, such as JHU and the ECDC, had issues with accuracy and timeliness, which limited their use. But many others didn’t recognize these problems or couldn’t find suitable alternatives.
Even as outlets and research organizations began to build their own trackers, hardly anyone approached primary sources like doctors and hospitals to produce original databases. Our interviewees cited time constraints and a lack of skills and resources as reasons why data sourcing was narrow in the early days of the pandemic.
Most coronavirus data trackers include case and death variables to show the spread of the virus. These types of visual data trackers became popular early on because they were simple to follow. But across the world, case counts (and therefore the charts and graphics built on them) were not always accurate because of variation in the availability of tests and vast underreporting in many countries. Counts of deaths caused by the virus were not always precise either, because of lags in reporting and because deaths at the beginning of the pandemic went unnoticed.
Trackers that followed cases and deaths were more popular than those that measured hospitalizations or incidence rates. By nature, hospitalization and incidence data are more complex to measure and explain, making them harder to represent visually. And despite hospitalizations being one of the best ways to explain the evolution of the pandemic and its risks, this data wasn't widely available until much later in the pandemic.
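To illustrate why weekly average lines and incidence rates take an extra step to produce and explain, here is a minimal sketch in Python. It is not drawn from any newsroom's tracker: the function names, the daily counts, and the population figure are all hypothetical, chosen only to show how a trailing seven-day average smooths reporting gaps and how a count is converted to a rate per 100,000 people.

```python
def rolling_mean(values, window=7):
    """Trailing mean over the last `window` values.

    Returns None for positions where fewer than `window` values are available.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def incidence_per_100k(new_cases, population):
    """Convert a case count into a rate per 100,000 residents."""
    return new_cases * 100_000 / population


# Hypothetical daily counts: two days with no reports, then a backlog of 210.
daily_cases = [70, 70, 0, 0, 210, 70, 70, 84]
population = 500_000  # hypothetical population of the area covered

smoothed = rolling_mean(daily_cases)
rates = [None if s is None else incidence_per_100k(s, population) for s in smoothed]

print(smoothed)
print(rates)
```

In this made-up series the raw counts swing between 0 and 210 because of the reporting gap, while the seven-day average stays near 70. Dividing by population is what allows areas of different sizes to be compared on the same chart.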
Covering the pandemic, a news event with almost no precedent in modern memory, was a tremendous challenge for data professionals all over the world. It was also an opportunity for data and visual journalism to populate the homepages of nearly every media outlet across the globe. As the demand and need for data visualizations of the pandemic grew, so too did this type of journalism’s influence. To explain the pandemic in accessible ways, reporters developed a very particular visual language summarizing the spread of the virus, while also explaining risks and distilling new terms such as “herd immunity” and “flattening the curve.” The recognition and importance that analysis and visualization have received during the pandemic in media organizations should become a powerful argument for telling other data-heavy stories in the future.
Editor’s Note: A full version of this report is set to be released through the Tow Center later this summer.