Forays into open science by Hui Xin Ng
Published on Jan 24, 2024. DOI 10.21428/4f83582b.cc32bea4
As a researcher primarily using computational tools for my work, I do my best to ensure that my code is publicly available and reproducible so that others who want to run the same analysis can retrace the steps I have taken and tweak them if needed.
In neuroimaging (one of the key methods in my research area), initiatives like the International Neuroimaging Data-sharing Initiative (INDI) and the Human Connectome Project (HCP) have facilitated open access to data. Additionally, the Brain Imaging Data Structure (BIDS) is an emerging standard for organizing and describing neuroimaging data in a consistent and structured way. BIDS encourages researchers to format and label their data in a standardized manner, making it easier to share, combine, and compare datasets. Large-scale collaborations are increasingly common; for instance, the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) Consortium, which my lab is part of, has multiple working groups that focus on various psychiatric disorders. Key goals of the consortium are to overcome the problem of underpowered studies caused by small sample sizes and to standardize data processing protocols.
However, my familiarity with open source pipelines and software has primarily revolved around being an end-user. I've downloaded Docker images and forked code repositories for my individual projects and shared my work. But the opportunity to work in a collaborative environment where version control is practiced within a team is a novel experience for me.
When the chance came up to apply bioinformatic tools to study a rare genetic disorder through the Schwannomatosis Open Research Collaborative (SORC), thanks to the Bioinformatics Research Network and Sage Bionetworks, I jumped on it.
Uncovering Schwannomatosis genetic insights with an open science approach
SORC is part of a larger umbrella project called Synodos for Schwannomatosis. Funded by the Children’s Tumor Foundation, its main goal is to facilitate collaborative research that advances our understanding of the disease and ultimately leads to the development of more effective treatments.
Quick rundown of SORC and schwannomatosis:
Schwannomatosis is a rare genetic disorder leading to nerve sheath tumors, often caused by SMARCB1 and LZTR1 mutations. In many cases, however, the causes remain unknown, making treatment and prognosis difficult.
The primary goal of SORC is to conduct comprehensive genomic analyses, targeting noncoding variants, under-studied genes, and other genomic factors contributing to disease heterogeneity and etiology.
Using genetic data from whole exome sequencing (n=33; a technique that reads the protein-coding regions of DNA) and whole genome sequencing (n=6; a technique that reads the entire genome), sequence variants were identified by applying several variant calling pipelines (GATK HaplotypeCaller, DeepVariant, and Strelka).
After applying an allele frequency filter, we ran the VCF files containing the remaining genetic variants through a pipeline of splicing and missense variant annotation tools that we selected.
The variant annotation tools in the pipeline are open source, and they assess how pathogenic or impactful a variant might be. We then scale and average their scores to create a composite score. I primarily focused on identifying missense annotation tools and preparing the VCF files for proper integration into the pipeline.
Ultimately, the goal is to produce a list of variants that could change gene expression and that may support the discovery of new genes or pathways that could be targeted for treatment.
In the future, proteomic and epigenetic data (e.g., DNA methylation) could be integrated to create a more complete picture of the disease.
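The filtering and scoring steps in the rundown above can be sketched roughly as follows. This is only an illustration, not the project's actual pipeline: the tool names (`tool_a`, `tool_b`), the 1% allele frequency cutoff, the choice of min-max scaling, and the example variants are all assumptions made for the sake of the example.

```python
from statistics import mean


def passes_af_filter(allele_freq, max_af=0.01):
    """Keep variants that are rare in a reference population.

    A missing frequency (None) is treated as "not observed", so the
    variant is kept. The 0.01 cutoff is an assumed value.
    """
    return allele_freq is None or allele_freq <= max_af


def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All scores are identical, so there is nothing to rank.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(variants, tools):
    """Scale each tool's scores across variants, then average per variant."""
    scaled = {
        tool: min_max_scale([v["scores"][tool] for v in variants])
        for tool in tools
    }
    return [
        mean(scaled[tool][i] for tool in tools)
        for i in range(len(variants))
    ]


# Hypothetical variants; the two tools report scores on different scales.
variants = [
    {"id": "var1", "af": 0.0005, "scores": {"tool_a": 0.9, "tool_b": 30.0}},
    {"id": "var2", "af": 0.20, "scores": {"tool_a": 0.5, "tool_b": 20.0}},
    {"id": "var3", "af": None, "scores": {"tool_a": 0.1, "tool_b": 10.0}},
    {"id": "var4", "af": 0.002, "scores": {"tool_a": 0.5, "tool_b": 20.0}},
]

rare = [v for v in variants if passes_af_filter(v["af"])]
scores = composite_scores(rare, tools=["tool_a", "tool_b"])

# Rank the remaining variants from highest to lowest composite score.
ranked = sorted(zip(rare, scores), key=lambda pair: pair[1], reverse=True)
for variant, score in ranked:
    print(variant["id"], round(score, 3))
```

Scaling before averaging matters because annotation tools report scores on different ranges; without it, the tool with the largest numeric range would dominate the composite.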
A culture of collaboration between patient and research communities
SORC uses Synapse, a platform created by Sage Bionetworks, for data hosting and management, which made accessing data and metadata far smoother. It’s not uncommon for researchers to wait weeks for data or cloud access when first starting a project, so I was very grateful to work in an environment with great data infrastructure right from the get-go. It makes the process of doing science much more satisfying!
One of the most rewarding aspects of the project was learning how to use GitLab collaboratively and practicing writing effective documentation, so that when members of the team review the code there is sufficient context and explanation. GitLab also includes issue tracking and project management features that I personally found very helpful for organizing tasks, prioritizing issues, and tracking progress. One new concept I encountered was the merge request, which, together with issue tracking, facilitates effective collaboration among team members when making code changes. Each issue was linked to a specific task, and the code modifications for each task were systematically tracked. My code documentation can be found here:
https://gitlab.com/nghuixin/swnts-nghuixin.
Attending the 2023 neurofibromatosis (NF) conference was a highlight because it was the first academic conference I had attended in which patients, clinicians, drug developers, and researchers were all well represented. More on my reflections on the conference here.
Ways to learn more about the Schwannomatosis Open Research Collaborative (SORC)
Read the README on the GitHub repo to learn how to contribute
Learn more about the Synodos for Schwannomatosis project that generated the data we used here
Read more about schwannomatosis and the data source of the project in Mansouri et al. 2020
Learn more about The Children’s Tumor Foundation and current NF research tools
The poster, co-authored with team members Hector Kroes and Adon Chawe, project mentor Alexandra (“Sasha”) Scott, and many others, can be found here
Funding
The funding for this project and conference travel was provided by the Children’s Tumor Foundation.