In March 2020, as hospitals were preparing for the first wave of COVID-19 patients, many were worried about possible bed shortages. To illustrate hospital capacity across the country, I worked with Sam Liss, a Healthcare Dive reporter, to analyze data on population and hospital bed counts in major U.S. metro areas.

Sam and I collaborated throughout the entire process — from inspecting the dataset and brainstorming story ideas to reporting and visualizing our findings. I’ll be going through each step of the process in this post, but first, take a moment to read the article.

Finding the data

Like most data stories, this one started with a dataset. Greg Linch, an Industry Dive software engineer, shared with us a dataset on U.S. hospitals, created by the Accountability Project. It included data on the number of available beds in each hospital, broken down by bed type (e.g. ICU, coronary care).

Soon after receiving the dataset, Sam and I hopped on a call to dig through the data and read the documentation together. I brought my data expertise, sharing possible analyses and metrics. Sam brought her expertise on the industry and Healthcare Dive’s audience, helping me understand what would resonate with readers.

As we discussed, lots of questions came up:

  • How was the data collected (and can we rely on it)?
  • Is this a dataset of all hospitals or is it restricted to certain types?
  • How were the bed types (e.g. ICU, coronary care) assigned, and do we agree with this categorization?
  • What is the best way to group these hospitals to be useful and impactful?
  • What is the best metric to compare population and number of available beds? Other news organizations have used “beds per 1,000 people”, but is there a better metric?

We needed to answer these questions before moving forward with a story. Sam would contact the Accountability Project to better understand their data collection process, and I would start data wrangling.

Transforming the data

The dataset included more than 4,000 hospitals across the U.S. For each hospital, it listed basic information, such as name and location, along with bed counts for each bed type.

Screenshot of the first few rows and columns of the original dataset from the Accountability Project.

Finding the right grouping

Because we wanted to identify at-risk regions, we needed to define how we would group the 4,000+ hospitals.

I initially considered grouping by city, but that felt too granular. Hospitals in adjacent cities may serve the same population; therefore, capacity for those hospitals should be calculated together. Grouping by state resulted in too much generalization: hospitals don’t usually serve an entire state’s population, so a state average would provide little value. Lastly, I considered grouping by county. It had the right level of specificity, but I worried that people wouldn’t recognize county names, which would make the results less meaningful.

In the end, I grouped by core-based statistical area (CBSA). A CBSA is defined by the U.S. Office of Management and Budget (OMB) as an area including an urban center of 10,000+ people, plus the surrounding counties that are socioeconomically tied to the urban center by commuting. I then subset the data to include only metropolitan statistical areas, the 392 largest CBSAs.

Metropolitan CBSAs, or metro areas, had the right level of granularity. They are also easily identifiable because most are centered around large cities with which people are familiar.
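
The grouping and filtering step can be sketched roughly as follows. This is a minimal illustration, not the code used for the story: the field names (`cbsa`, `cbsa_type`, `beds`) and all values are hypothetical placeholders, and it assumes each hospital record already carries a CBSA label.

```python
from collections import defaultdict

# Hypothetical hospital records. Field names and values are placeholders,
# not the Accountability Project's actual columns or data.
hospitals = [
    {"name": "Hospital A", "cbsa": "Metro One", "cbsa_type": "Metropolitan", "beds": 300},
    {"name": "Hospital B", "cbsa": "Metro One", "cbsa_type": "Metropolitan", "beds": 200},
    {"name": "Hospital C", "cbsa": "Micro Two", "cbsa_type": "Micropolitan", "beds": 50},
    {"name": "Hospital D", "cbsa": "Metro Three", "cbsa_type": "Metropolitan", "beds": 120},
]


def beds_by_metro_area(records):
    """Sum bed counts per CBSA, keeping only metropolitan statistical areas."""
    totals = defaultdict(int)
    for record in records:
        if record["cbsa_type"] == "Metropolitan":
            totals[record["cbsa"]] += record["beds"]
    return dict(totals)


metro_beds = beds_by_metro_area(hospitals)
print(metro_beds)
```

Summing at the CBSA level means hospitals in adjacent cities that serve the same population are counted together, which was the reason for not grouping by city.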

Screenshot of the first few rows and columns of the dataset after grouping and filtering.

Finding the right metric

Next, I needed to select a metric to describe hospital capacity relative to population. I could then rank the CBSAs by this metric to determine which were most at risk of bed shortages.

I evaluated three different metrics:

  • People per bed: The ratio of population to beds (i.e. population divided by the number of beds). For example, the New York City metro area has 405 people per bed.
  • Beds per person: The ratio of beds to population (i.e. the number of beds divided by population). For example, the New York City metro area has 0.0025 beds per person.
  • Beds per 1,000 people: The ratio of beds to 1,000 people (i.e. the number of beds divided by population, multiplied by 1,000). For example, the New York City metro area has 2.5 beds per 1,000 people.
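
As a quick check, the three metrics can be written as small functions. This is only a sketch: the function names are my own, and the population and bed counts are round placeholder figures chosen so that the ratio matches the 405 people per bed quoted above. They are not the actual New York figures.

```python
def people_per_bed(population, beds):
    """Ratio of population to beds."""
    return population / beds


def beds_per_person(population, beds):
    """Ratio of beds to population."""
    return beds / population


def beds_per_1000_people(population, beds):
    """Beds per person, scaled to a population of 1,000."""
    return beds / population * 1000


# Placeholder inputs (assumption): not real counts for any metro area.
population, beds = 405_000, 1_000

print(round(people_per_bed(population, beds)))
print(round(beds_per_person(population, beds), 4))
print(round(beds_per_1000_people(population, beds), 1))
```

With these inputs the three values round to 405, 0.0025 and 2.5, which shows that the metrics are the same information expressed on different scales.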

The first metric, people per bed, seemed the easiest to understand because it deals with whole numbers of people and beds. Moreover, the crux of the story was many coronavirus patients vying for a small number of hospital beds, and this metric conveyed just that.
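
Ranking metro areas by people per bed might look something like this. It is a sketch under stated assumptions: the metro area names and the (population, beds) pairs are made-up placeholders, and `rank_by_people_per_bed` is a hypothetical helper rather than anything from the original analysis.

```python
# Placeholder (population, beds) pairs; not real data.
metro_areas = {
    "Metro One": (900_000, 3_000),
    "Metro Two": (500_000, 1_000),
    "Metro Three": (1_200_000, 6_000),
}


def rank_by_people_per_bed(areas, top_n=10):
    """Return up to top_n (name, ratio) pairs, highest people-per-bed first."""
    ratios = {
        name: population / beds
        for name, (population, beds) in areas.items()
        if beds > 0  # skip areas with no reported beds to avoid dividing by zero
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)[:top_n]


ranking = rank_by_people_per_bed(metro_areas)
for name, ratio in ranking:
    print(f"{name}: {ratio:.0f} people per bed")
```

A higher ratio means more people competing for each bed, so the areas at the top of the list are the ones most at risk of shortages.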

Finding the story

Sam and I regrouped a few days later to share our findings. Sam confirmed the validity of key variables, alleviating any concerns we had about the dataset. I shared my exploration of groupings and metrics, suggesting CBSAs and the people-per-bed ratio.

With these data decisions finalized, we started planning the visualization and reporting.

For the visualizations, we decided on three main graphics:

  • A map showing the 10 metro areas with the highest people-per-bed ratios, because this was our main finding
  • A visualization of hospital capacity in the most populated metro areas, because these were the areas that had the most COVID-19 cases at the time
  • A table at the end of the story with data for all metro areas, because we anticipated readers wanting to know the capacity in their own metro area

On the reporting side, we decided on the following:

  • Most readers would be unfamiliar with CBSAs, so we would need to pull the definition from the U.S. Census Bureau. We also decided to use the term “CBSA” on first reference and in our definition, then switch to the colloquial term “metro area” for the rest of the story.
  • Similarly, we would need to craft a good explanation of the people-per-bed ratio, with the help of some graphics.
  • Sam would complement the data findings with interviews with healthcare professionals and experts in the areas with the highest people-per-bed ratios.

With a clear idea of the final story, I started finalizing the dataset and creating the graphics. Sam began interviewing and writing.

It was the first time Sam and I had collaborated so closely on a story, and in the end, we produced a great piece of data journalism. Little did we know that we would collaborate on several more data stories throughout 2020, including stories on how much CARES Act funding went to nonprofit and for-profit healthcare systems.