Electronic medical records contain a wealth of information that researchers can use to learn more about patients, such as how they respond to treatments, how they recover after medical procedures, or how various aspects of their care affect their stay in hospital.
However, medical institutions have strict regulations in place to protect patient privacy, so not everyone can access these records and start digging into the data. While this is a good thing overall, for obvious legal and ethical reasons, it can limit the ability of clinicians and researchers to tap into this valuable resource.
At the University of Chicago, a transformative collaboration among medical providers, data scientists, and information technology teams is building a unified data repository to provide faster access to data and propel advancements in healthcare delivery and research. This platform includes self-service, kiosk-style tools for working with “synthetic” data instead of real patient data, protecting patient privacy while giving researchers access to information they can use.
Synthetic data, real stories
Synthetic data is artificially generated data that mimics the characteristics and patterns of real data without containing any real patient information. Machine learning models and simulation tools learn patterns from the real data and create new records that reflect the same variables and trends.
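As a rough illustration of that idea, and not a description of how MDClone or any specific platform works, the Python sketch below fits a very simple statistical model (a two-variable Gaussian) to a handful of invented records and then samples new records from it. The variable names, the toy values, and the choice of model are all assumptions made for this example; real tools use far richer models and additional privacy safeguards.

```python
import math
import random
import statistics

# Invented toy records: (age in years, length of stay in days).
# These are NOT real patient data; they exist only to drive the example.
real_records = [
    (34, 2.0), (51, 3.5), (67, 6.0), (45, 3.0),
    (72, 7.5), (58, 4.0), (29, 1.5), (80, 9.0),
]

def fit_gaussian(records):
    """Learn means, standard deviations, and correlation from (x, y) pairs."""
    xs = [r[0] for r in records]
    ys = [r[1] for r in records]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in records) / (len(records) - 1)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    rho = max(-1.0, min(1.0, cov / (sd_x * sd_y)))
    return mean_x, mean_y, sd_x, sd_y, rho

def sample_synthetic(params, n, seed=0):
    """Draw n new (x, y) pairs from the fitted two-variable Gaussian."""
    mean_x, mean_y, sd_x, sd_y, rho = params
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean_x + sd_x * z1
        # Mixing z1 into y reproduces the learned correlation between x and y.
        y = mean_y + sd_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        synthetic.append((x, y))
    return synthetic

if __name__ == "__main__":
    params = fit_gaussian(real_records)
    for age, stay in sample_synthetic(params, 5):
        print(f"synthetic age={age:.0f}, length of stay={stay:.1f} days")
```

The sampled records are drawn from the fitted model rather than copied from any input row, yet in aggregate they follow the same averages and the same relationship between age and length of stay. A model this simple can also produce implausible values, such as a negative length of stay, which is one reason practical systems are considerably more sophisticated.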
“The power of synthetic data is that it tells the same story as the real thing, without sacrificing patient privacy and security,” said Julie Johnson, Executive Director of the Center for Research Informatics (CRI) at UChicago. The CRI has partnered with the UChicago Medicine Center for Healthcare Delivery Science and Innovation (HDSI) and UChicago Medicine Information Technology Services (UCM IT) to launch the ADAMS Center (Ask, Discover, Act, Measure, Share). This collaborative center is building a centralized data repository that integrates clinical, quality, claims, finance, and administrative data from across the health system, and provides training and support for users to perform their own queries and view data.
The repository is powered by MDClone, a software platform that helps build data repositories and provides multilayered privacy options, including synthetic data. This approach promotes access to data for more researchers while upholding the highest standards of privacy.
One approach to protecting patient data privacy is deidentification, or removing all personally identifiable information from records, such as names, addresses, Social Security Numbers, and dates of birth. Even after following the strictest deidentification processes, however, there is still a risk that these records can be linked to real individuals, especially in the case of rare diseases or specialized treatments where the population is relatively small in the first place. Synthetic data offers an alternative, without the added risk of working with even the bare bones of a real patient’s data.
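The small-population risk described above can be made concrete with a simple group-size check, in the spirit of the k-anonymity idea from the privacy literature. The sketch below is illustrative only: the field names, the toy records, and the threshold of five are assumptions, and it does not describe any check that the ADAMS Center or MDClone performs.

```python
from collections import Counter

# Invented, already "deidentified" toy records: no names or birth dates,
# only coarse attributes that could still narrow down who a row refers to.
records = (
    [{"age_band": "60-69", "region": "North", "diagnosis": "heart failure"}] * 6
    + [{"age_band": "30-39", "region": "South", "diagnosis": "rare disorder X"}]
)

def small_groups(rows, fields, k=5):
    """Return attribute combinations shared by fewer than k rows."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}

# Combinations that appear fewer than 5 times are easier to link to a person.
risky = small_groups(records, ["age_band", "region", "diagnosis"])
```

Here the six heart failure records blend into a group, while the single rare-disorder record stands alone and would be flagged, even though it contains no direct identifiers.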
Synthetic data can also be customized to fit certain conditions, making sure it includes the right demographic mix or balance of medical conditions. Johnson says this is where the real value comes in, because researchers can use the tools to fine-tune their queries and test different variables without needing access to the real data set.
“When you're playing in a synthetic environment, you can play with your cohort a little bit to make sure you're asking the right question. I think that's huge,” she said. This is particularly useful for researchers as they develop project proposals and grant applications. They can use the synthetic data environment to see if a given research question is feasible, or even just to verify a hunch, before requesting access to the real data, if they need it at all.
“If I have a feasibility question for a research study, I don't necessarily need to go into the medical record to determine if I have the right population and the right variables available to answer that question,” Johnson said. “Or, when you're writing a grant, you don't need to have the exact numbers yet. You just need the numbers to tell the story to secure the grant and make sure it's legitimate. So, synthetic data reduces risk there too.”
Improving quality of care
Clinicians and administrators across the health system have already used MDClone to optimize care processes, improve workflows, and identify high-risk patients earlier for preventive care. Financial gains can follow from these improvements, which reduce readmissions, reduce low-value care, and find patients in need of evidence-based procedures.
Andrew Davis, MD, MPH, a Clinical Professor of Medicine and early adopter of the MDClone tools, says the platform gives clinicians a rapid and powerful way to improve medical care by zeroing in on the right patients. “It really reduces our time to insights,” he said. “The synthetic data allows you to ask questions very quickly and explore hypotheses.”
For example, doctors looking to improve care for patients with heart failure could build a query to see how many patients (or their representative synthetic avatars) are taking the medications recommended by national guidelines, which are known to reduce hospital admissions and mortality. If they see a pattern that certain medications are underprescribed in primary care, or some patients are not being referred to a specialist, clinical leaders can then explore options for reaching out to real patients who match these criteria and see how they can help. Maybe the medications are too expensive, or a repeat referral needs to be made.
“Now you have the ability to understand who's receiving appropriate care, and if they're not, to look for ways to work with clinics and doctors to better educate patients and help keep them out of the hospital,” Davis said.
Davis, a former Associate Vice-Chair for Quality in the Department of Medicine, said these data tools offer an exciting opportunity to leverage promising new technologies while honoring clinical experience and insights.
“I’ve been doing quality improvement work for over 30 years, and this feels different,” he said. “As much as we would like to think that AI alone is going to revolutionize medicine, it needs to be done in coordination with people who understand clinical care and have a long experience working with patients to optimize their health.”