MIMIC Chest X-Ray Database to Provide Researchers Access to over 350,000 Patient Radiographs

A new database of images could pave the way for algorithmic models that support more accurate diagnoses of conditions like pneumonia.

Researchers have released a free repository of more than 350,000 detailed chest X-rays that is open to academic, clinical, and industrial investigators. Image courtesy of the researchers.

Computer vision, the field concerned with enabling machines to interpret and analyze images, has received increasing attention from researchers in recent years. In medicine, the term broadly covers the many ways images can be used to achieve clinical aims. Applications range from automatically scanning photos taken on mobile phones to creating 3-D renderings that aid in patient evaluations and developing algorithmic models for emergency room use in underserved areas.

Because larger image collections give researchers the volume of data needed to develop better and more robust algorithms, a curated set of images, scrubbed of patients’ identifying details and then highlighted in critical areas, holds great potential for researchers and radiologists who rely on imaging data in their work.

Last week, the MIT Laboratory for Computational Physiology, a part of the Institute for Medical Engineering and Science (IMES) led by Professor Roger Mark, launched a preview of its MIMIC-Chest X-Ray Database (MIMIC-CXR), a repository of more than 350,000 detailed chest X-rays gathered over five years at Beth Israel Deaconess Medical Center in Boston. Like the lab’s earlier database, MIMIC-III, which houses critical care patient data from over 40,000 intensive care unit stays, the project is free and open to academic, clinical, and industrial investigators via the research resource PhysioNet. It represents the largest collection of publicly available chest radiographs to date.

With access to MIMIC-CXR, whose development was funded by Philips Research, registered users and their collaborators can more easily develop algorithms for 14 of the most common findings on a chest X-ray, including pneumonia, cardiomegaly (enlarged heart), edema (excess fluid), and pneumothorax (a collapsed lung). By linking visual markers to specific diagnoses, such algorithms could help clinicians reach accurate conclusions faster and thus handle more cases in less time. These algorithms could prove especially beneficial for doctors working in underfunded and understaffed hospitals.

“Rural areas typically have no radiologists,” says Research Scientist Alistair E. W. Johnson, co-developer of the database along with Tom J. Pollard, Nathaniel R. Greenbaum, and Matthew P. Lungren; Seth J. Berkowitz, director of radiology informatics innovation; Chih-ying Deng of Harvard Medical School; and Steven Horng, associate director of emergency medicine informatics at Beth Israel. “If you have a room full of ill patients and no time to consult an expert radiologist, that’s somewhere where a model can help.”

In the future, the lab hopes to link the X-ray archive to MIMIC-III, forming a database that includes both patients’ ICU data and their images. More than 9,000 registered users currently access critical care data through MIMIC-III, and MIMIC-CXR would be a boon for those in critical care medicine looking to supplement clinical data with images.

Another asset of the database lies in its timing. In January, researchers at the Stanford Machine Learning Group and the Stanford Center for Artificial Intelligence in Medicine and Imaging released a similar dataset collected over 15 years at Stanford Hospital. The MIT Laboratory for Computational Physiology and the Stanford groups collaborated to ensure that interested researchers could use the two datasets together with minimal legwork.

“With single center studies, you’re never sure if what you’ve found is true of everyone, or a consequence of the type of patients the hospital sees, or the way it gives its care,” Johnson says. “That’s why multicenter trials are so powerful. By working with Stanford, we’ve essentially empowered researchers around the world to run their own multicenter trials without having to spend the millions of dollars that typically costs.”

As with MIMIC-III, researchers will be able to gain access to MIMIC-CXR by first completing a training course on research involving human subjects and then agreeing to cite the dataset in their published work.

“The next step is free text reports,” says Johnson. “We’re moving more towards having a complete history. When a radiologist is looking at a chest X-ray, they know who the person is and why they’re there. If we want to make radiologists’ lives easier, the models need to know who the person is, too.”