The ICGA Data Visualization Platform

The Indian Cancer Genome Atlas (ICGA) provides a comprehensive, clinically annotated, multi-omics data visualization platform that enables an integrative understanding of cancer in the Indian population.

Powered by the cBioPortal framework, the ICGA Portal lets researchers explore and analyze genomic, transcriptomic, proteomic, and clinical datasets through an interactive interface. The platform currently hosts Indian breast cancer cohorts and will expand to additional cancer types over time.

About the ICGA Portal

The ICGA Portal offers researchers a gateway to processed cancer genomic datasets curated by ICGA. It supports both exploratory and hypothesis-driven analyses while ensuring compliance with ethical, legal, and data governance frameworks.

The portal has been developed with philanthropic support from Strand Life Sciences Ltd. and reflects ICGA’s commitment to responsible data sharing in cancer research.

Introduction to ICGA Data Portal

Data Availability and Ethical Compliance

ICGA adheres to strict ethical and regulatory standards in data sharing.

All breast cancer datasets available through the portal are de-identified and limited to somatic mutations only.
Germline data is not analyzed or shared. This restriction is a result of ethical approvals that ensure patient privacy and compliance with responsible data-use guidelines.
Data is generated and processed using standardized pipelines designed to ensure quality and compliance.

The portal provides secondary-level processed datasets, including:

Somatic variant data
Gene expression matrices
Proteomics summaries
Associated de-identified clinical metadata

The portal visualizes highly processed, curated, and harmonized multidimensional cancer genomics data.

Study Title: ICGA Breast Cancer Cohort

The ICGA Breast Cancer Cohort represents a foundational dataset within the ICGA program.

Tumor and matched adjacent non-tumor control samples
Treatment-naïve patients, age range 18–90 years
Multi-omics profiling including genome sequencing, RNA sequencing and proteomics
Clinically annotated datasets collected across multiple centers in India

Metadata of the ICGA’s cohort on breast cancer patients

Data Access

ICGA follows a controlled access model governed by the Data Access Committee (DAC), in alignment with the ICGA Data Policy and DBT PRIDE guidelines.

The ICGA Foundation is committed to aligning its data governance framework with the Digital Personal Data Protection (DPDP) Act 2023 and the Digital Personal Data Protection Rules 2025, notified by the Ministry of Electronics and Information Technology (MeitY) in November 2025. The Rules provide for a phased implementation, with full substantive compliance required by May 2027. ICGA’s governance policies and data access framework are being updated accordingly during this period. Researchers with queries about data governance or compliance may write to suveera@icga.co.in.

How to Access ICGA Data

ICGA data is accessible through two routes. Both routes require a completed application and approval by the DAC before access is granted.

Route A — ICGA Data Portal (cBioPortal)

Interactive, browser-based access to processed and visualised datasets within the secure ICGA portal environment. Note that no raw data is available through this route. Researchers can explore somatic mutation profiles, gene expression patterns, proteomics summaries, and associated clinical metadata using the portal’s built-in analysis tools. Data remains within ICGA’s secure infrastructure at all times.

Route B — AWS Controlled Access

Programmatic access to processed data files — VCF files, MAF files, expression matrices, and processed proteomics outputs — for researchers requiring computational analysis beyond what the portal interface supports. Access is provided within ICGA-managed, India-based infrastructure. Applicants are responsible for all associated AWS infrastructure costs. Contact ICGA for details.

Both routes are subject to a single consolidated application reviewed by the DAC.

Note: Commercial and industry applications are subject to additional review including execution of a Commercial Data Licensing Agreement. Contact suveera@icga.co.in before submitting any data request.

Application Process (Single-Step)

Researchers seeking access must submit a single consolidated application that includes:

Research proposal and objectives
Details of investigators and institutional affiliation
Data requirements (type, scope, and access route requested)
Data management and security plan
Timelines and expected outcomes
Institutional ethics approvals
Conflict of interest disclosures
Details of any international or inter-institutional collaborations

All applications are reviewed by the ICGA Data Access Committee (DAC). Incomplete applications will not be considered. Full and final approval is followed by the signing of a Data User Agreement (DUA) with ICGA.

Processed Data Access (Conditional)

Where there is clear scientific justification, and subject to DAC approval, researchers may be granted access to processed data files within an ICGA-approved secure compute environment. Data does not leave ICGA’s governed infrastructure unless the DAC grants an exception. The modality of access will be determined by ICGA on a case-by-case basis following the DAC’s review.

Such requests must demonstrate:

A scientific need that cannot be met through portal-based analysis
Adequate institutional data security measures, including encryption at rest and in transit
Confirmation that ICGA datasets and their derivatives will be stored and processed in India-based infrastructure
Compliance with ICGA’s Data Policy and Data User Agreement

Eligible file types include processed somatic variant data (VCF/MAF), normalised expression matrices, and proteomics outputs.
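
As an illustration of the kind of analysis a processed MAF file supports, the following Python sketch counts how many distinct samples carry at least one somatic mutation in each gene. It assumes the file follows the widely used tab-delimited MAF layout with Hugo_Symbol and Tumor_Sample_Barcode columns; the exact columns in ICGA's files are not described here, and the function name and sample rows are invented for the example.

```python
import csv
import io


def count_mutated_samples_per_gene(maf_handle):
    """Map each gene to the number of distinct samples with a mutation in it."""
    # Many MAF files begin with comment lines such as "#version 2.4"; skip them.
    data_lines = (line for line in maf_handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    samples_by_gene = {}
    for record in reader:
        gene = record["Hugo_Symbol"]
        sample = record["Tumor_Sample_Barcode"]
        # A set ensures a sample with several mutations in a gene counts once.
        samples_by_gene.setdefault(gene, set()).add(sample)
    return {gene: len(samples) for gene, samples in samples_by_gene.items()}


# Made-up rows for illustration only; these are not ICGA data.
EXAMPLE_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\n"
    "TP53\tSAMPLE_01\tMissense_Mutation\n"
    "TP53\tSAMPLE_01\tNonsense_Mutation\n"
    "TP53\tSAMPLE_02\tMissense_Mutation\n"
    "PIK3CA\tSAMPLE_02\tMissense_Mutation\n"
)

counts = count_mutated_samples_per_gene(io.StringIO(EXAMPLE_MAF))
print(counts)
```

With a real file, the same function can be given an open file handle instead of the in-memory example.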

Data Not Available

The following are not available for external access at this stage:

Raw sequencing data (BAM, FASTQ)
Raw proteomics data
Germline variant data
Any data that may increase re-identification risk

Requests for the above will not be considered under the current policy framework.

Conditions of Use

All approved users must comply with the following conditions throughout the approved access period:

Data must be used strictly for the approved research purpose. Any change in research purpose, methodology, or key personnel must be notified to ICGA within 30 days and may require a new application.
No attempt to re-identify any individual from ICGA de-identified datasets is permitted. This prohibition applies to all users and all methods of analysis.
Data must not be shared with parties not named in the approved application and Data User Agreement.
All ICGA data must be stored, accessed, and processed exclusively on India-based infrastructure. Transfer to servers or compute environments outside India is not permitted without explicit written approval from ICGA DAC.
Significant derived datasets generated from ICGA data — such as integrated multi-omics outputs or novel variant call sets — should be shared back with ICGA to enable cumulative scientific value for the community.
ICGA must be acknowledged in all oral presentations, written disclosures, and publications resulting from analyses of ICGA data.

Suggested citation: “The results [published or shown] here are based, in whole or in part, on data generated by the Indian Cancer Genome Atlas (ICGA) Network: https://icga.in and https://icga.net.in.”

ICGA Data, Resources, and Materials

ICGA is dedicated to advancing cancer research through a rigorous, end-to-end process that involves:

Collecting diverse biospecimens and clinical metadata from partnering hospitals and clinical centres across India
Generating molecular analytes for detailed multi-omics characterisation
Applying standardised sequencing, proteomic, and imaging methods
Curating and annotating data to enable responsible, reproducible research
Providing accessible data to the research community through a governed access framework

Requests for Biological Samples and Materials

Due to legal and ethical considerations, ICGA is unable to accommodate requests for biological samples, analytes, or tissue materials. All cases within the ICGA programme have been consented exclusively for ICGA use, and the redistribution of materials to outside parties is prohibited. Additionally, the majority of tissue samples have been depleted through the multiple assays performed for ICGA research.

Apply for access to ICGA dataset

ICGA Data Portal

Frequently Asked Questions (FAQ)

What is the ICGA Portal?

The ICGA Portal is a customized deployment of the open-source cBioPortal platform that hosts curated, harmonized cancer genomics datasets generated through ICGA studies.

It enables interactive exploration, visualization, and download of clinical and genomic data.

What types of data are available on the portal?

The portal provides access to processed, secondary-level data, including:

Clinical metadata and clinical annotations
Sample-level phenotype data
Somatic mutation data
Copy Number Alteration (CNA) data
Gene-level summarized molecular profiles

Most visualizations allow download of the underlying data.

Can I download data from the portal?

Yes, provided your approved access request includes download permission. Data can be downloaded from:

Study pages
Query results
Visualization panels

Users can also define custom cohorts (“virtual studies”) using clinical or genomic filters and download the corresponding datasets.
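
The same kind of clinical filtering can be reproduced offline on a downloaded clinical table. The Python sketch below is not the portal's virtual-study feature; it only shows exact-match filtering on a tab-delimited file. The column names (Sample ID, ER Status, Tumor Stage), the function name, and the rows are hypothetical and may not match the attributes in an actual ICGA download.

```python
import csv
import io


def select_sample_ids(clinical_handle, filters):
    """Return IDs of samples whose clinical values match every filter exactly."""
    reader = csv.DictReader(clinical_handle, delimiter="\t")
    return [
        row["Sample ID"]
        for row in reader
        if all(row.get(column) == wanted for column, wanted in filters.items())
    ]


# Made-up clinical table for illustration only; these are not ICGA data.
EXAMPLE_CLINICAL = (
    "Sample ID\tER Status\tTumor Stage\n"
    "SAMPLE_01\tPositive\tII\n"
    "SAMPLE_02\tNegative\tII\n"
    "SAMPLE_03\tPositive\tIII\n"
)

cohort = select_sample_ids(
    io.StringIO(EXAMPLE_CLINICAL),
    {"ER Status": "Positive", "Tumor Stage": "II"},
)
print(cohort)
```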

Does the portal provide raw sequencing data?

No. The portal does not host raw sequencing data or raw count-level datasets.

It is designed for access to curated and processed data only.

How can I access raw RNA-seq data or VCF files?

Access to controlled datasets requires application through the ICGA Data Access Committee (DAC).

This includes:

Raw RNA-seq data (including count-level data)
Variant Call Format (VCF) files
Other detailed datasets not available through the portal

ICGA Data Access Form

For queries, contact:

suveera@icga.co.in
data-access@icga.co.in

Can I perform differential expression analysis using portal data?

No. Since the portal provides processed, gene-level summarized data rather than raw count matrices, workflows requiring raw count reprocessing are not supported directly from portal downloads.

How do I download Copy Number Alteration (CNA) data?

The portal supports:

Gene-level CNA downloads through the query interface
Segment-level copy number downloads through study-specific links

Users can query specific genes or define cohorts before exporting results.
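
Once a gene-level CNA table has been exported, it can be summarised with a few lines of Python. The sketch below assumes the cBioPortal discrete copy-number layout: one row per gene, Hugo_Symbol and Entrez_Gene_Id columns followed by one column per sample, and calls coded from -2 to 2. The layout of a particular ICGA export may differ, and the function name and rows are invented for the example.

```python
import csv
import io

# Conventional meaning of discrete copy-number codes in cBioPortal.
CNA_LABELS = {
    -2: "deep deletion",
    -1: "shallow deletion",
    0: "diploid",
    1: "gain",
    2: "amplification",
}


def tally_cna_calls(cna_handle, gene):
    """Count samples per copy-number category for one gene."""
    reader = csv.DictReader(cna_handle, delimiter="\t")
    for row in reader:
        if row["Hugo_Symbol"] != gene:
            continue
        tally = {}
        for column, value in row.items():
            # Skip the gene annotation columns and missing calls.
            if column in ("Hugo_Symbol", "Entrez_Gene_Id"):
                continue
            if value in (None, "", "NA"):
                continue
            # Codes outside the five standard values are grouped as "other".
            label = CNA_LABELS.get(float(value), "other")
            tally[label] = tally.get(label, 0) + 1
        return tally
    return {}  # gene not present in the table


# Made-up matrix for illustration only; these are not ICGA data.
EXAMPLE_CNA = (
    "Hugo_Symbol\tEntrez_Gene_Id\tS1\tS2\tS3\tS4\n"
    "ERBB2\t2064\t2\t0\t2\tNA\n"
    "TP53\t7157\t-1\t0\t0\t-2\n"
)

erbb2_tally = tally_cna_calls(io.StringIO(EXAMPLE_CNA), "ERBB2")
print(erbb2_tally)
```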

How should I cite ICGA data?

“The results published or shown here are based, in whole or in part, on data generated by the Indian Cancer Genome Atlas (ICGA) Network: https://icga.in and https://icga.net.in.”

Where applicable, users should also cite associated ICGA publications relevant to the dataset used.

For citation-related queries, contact:

suveera@icga.co.in
data-access@icga.co.in

Are ICGA datasets versioned or updated?

Yes. ICGA datasets are periodically updated as new data is generated, processed, and curated.

Breast cancer study data, for example, is routinely updated.

For reproducibility and future reference, users should record:

Study identifier
Date of data access
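
One lightweight way to keep such a record is a small provenance file saved alongside the analysis. The Python sketch below is only an illustration: the function name, field names, study identifier, and file contents are hypothetical, and the SHA-256 checksum per file is an optional extra that ICGA does not ask for.

```python
import hashlib
import json
from datetime import date


def build_access_record(study_id, files, accessed_on=None):
    """Build a provenance record for one data download.

    `files` maps a file name to its raw bytes; each file is stored as a
    SHA-256 checksum so a later download can be compared against it.
    """
    accessed_on = accessed_on or date.today()
    return {
        "study_id": study_id,
        "accessed_on": accessed_on.isoformat(),
        "files": {
            name: hashlib.sha256(content).hexdigest()
            for name, content in files.items()
        },
    }


record = build_access_record(
    "example_breast_cohort",  # hypothetical identifier, not a real ICGA study ID
    {"mutations.maf": b"Hugo_Symbol\tTumor_Sample_Barcode\n"},
    accessed_on=date(2026, 1, 15),
)
print(json.dumps(record, indent=2))
```

If a later download produces a different checksum for the same file name, the dataset has changed since the recorded access date.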

For dataset version queries, contact:

suveera@icga.co.in
data-access@icga.co.in