
UChicago data scientists update powerful cancer data sharing platform

The UChicago Center for Translational Data Science has completed an update to the National Cancer Institute’s Genomic Data Commons.

The University of Chicago Center for Translational Data Science (CTDS) has completed an update to the National Cancer Institute’s (NCI) Genomic Data Commons (GDC), a repository and computational platform for researchers who seek to understand cancer, its clinical progression and its response to therapy. The new version is known as GDC 2.0.

Designed by the University’s CTDS, the NCI’s GDC launched in 2016. The CTDS operates and updates the data commons via funding from a subcontract with Leidos Biomedical Research, Inc. (Leidos Biomed), operator of the Frederick National Laboratory for Cancer Research on behalf of the NCI. Funding is provided under Leidos Biomed’s Prime Contract 75N91019D00024.

Among the enhancements is the GDC Analysis Tool Software Development Kit (SDK). “The SDK makes it easy for the GDC to integrate the best tools developed by the cancer genomics research community,” said Robert Grossman, PhD, Frederick H. Rawson Distinguished Service Professor in Medicine and Computer Science, who leads the GDC program at the University’s CTDS. “By using the SDK, developers have a clear path to easily integrate their tools with the GDC, so they are available to the over 90,000 researchers who use it each month.”

Additional updates completed by the CTDS include:

  • Cohort-centric workflow: Build and save custom cohorts to use across GDC tools
  • New analysis and visualization tools (in addition to pre-existing tools) developed by a separate Leidos Biomed subcontractor:
    • Gene expression clustering: Visualize gene expression levels for a custom cohort and gene set
    • ProteinPaint: Visualize somatic mutations as they appear on a gene or polypeptide
    • Sequencing reads viewer: Display the reads within a GDC-harmonized alignment
    • Cohort-level MAF tool: Aggregate somatic mutations into one file for a custom cohort
  • Tool-based framework: Seamlessly move a cohort across analysis tools in GDC’s new analysis center

The GDC contains data from some of the largest and most comprehensive cancer genomic datasets generated by the NCI, including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET). The platform also accepts data from cancer programs and individual researchers. Submitted data are processed using standardized bioinformatics pipelines that align the data and generate higher-level data such as variant calls and expression quantification. As more researchers add clinical and genomic data to the GDC, it will become an even more powerful tool for discoveries about the molecular basis of cancer that may lead to better care for patients.

For more information, visit the GDC Data Portal 2.0.
