BioCPR–A Tool for Correlation Plots

Author: Fey Vidal
Heron Samuel
Jambulingam Dhanaprakash
Sara Henri
Schleutker Johanna
Sipeky Csilla
Publication venue: MDPI AG
Publication date: 28/10/2022

A gene is a sequence of DNA bases through which genetic information is passed on to the next generation. Most genes encode proteins that ultimately control cellular function. Understanding the interrelation between genes without applying statistical methods can be a daunting task. Correlation analysis is a powerful approach for determining the strength of association between two variables (e.g., gene-wise expression). Visualizing these data is also essential for establishing patterns and deriving insight. The most common method for gene expression visualization is the correlation heatmap, in which the colors of the plot represent the strength of co-expression. To address this need, we developed a visualization tool called BioCPR: Biological Correlation Plots in R. This tool performs both the correlation analysis and the subsequent visualization as an interactive heatmap, improving both the usability and the interpretation of the data. BioCPR is an R Shiny-based application and can be run locally in RStudio or in a web browser.
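The correlation step that feeds such a heatmap can be sketched as a pairwise Pearson correlation matrix over genes. This is a minimal Python illustration, not BioCPR's R/Shiny code: it assumes Pearson correlation, and the gene names and expression values are hypothetical. Each matrix cell is the correlation between two genes' expression profiles across the same samples, and this is the value a heatmap maps to a color.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant input has undefined correlation")
    return cov / (sx * sy)


def correlation_matrix(expr):
    """expr maps gene name -> expression values measured on the same samples.

    Returns a nested dict: matrix[g][h] is the correlation of genes g and h.
    """
    genes = list(expr)
    return {g: {h: pearson(expr[g], expr[h]) for h in genes} for g in genes}


# Hypothetical expression values for three genes across four samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],  # rises with GENE_A
    "GENE_C": [4.0, 3.0, 2.0, 1.0],  # falls as GENE_A rises
}
m = correlation_matrix(expr)
print(round(m["GENE_A"]["GENE_B"], 6))  # 1.0
print(round(m["GENE_A"]["GENE_C"], 6))  # -1.0
```

A plotting library would then render `m` as a colored grid. Other coefficients, such as Spearman's rank correlation, could be swapped in for `pearson` without changing the matrix structure.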

UTUPub

BioCPR : a tool for correlation plots

Author: Fey Vidal
Heron Samuel
Jambulingam Dhanaprakash
Sara Henri
Schleutker Johanna
Sipeky Csilla
Publication venue: MDPI AG
Publication date: 01/09/2021

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Trepo - Institutional Repository of Tampere University

Kuura-An automated workflow for analyzing WES and WGS data.

Author: Dhanaprakash Jambulingam
Johanna Schleutker
Samuel Heron
Venkat Subramaniam Rathinakannan
Vidal Fey
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2024

The advent of high-throughput sequencing technologies has revolutionized genomic science by cutting the cost and time associated with standard sequencing methods. This advancement has not only provided the research community with an abundance of data but has also presented the challenge of analyzing it. The paramount challenge in analyzing such large amounts of data lies in making optimal use of the available tools. To address this research gap, we propose "Kuura-An automated workflow for analyzing WES and WGS data", which is optimized for both whole exome and whole genome sequencing data. The workflow is written in the Nextflow pipeline scripting language and uses Docker to manage and deploy its components. It consists of four analysis stages: quality control; mapping to the reference genome and quality score recalibration; variant calling and variant recalibration; and variant consensus and annotation. An important feature of the DNA-seq workflow is that it combines multiple variant callers (GATK HaplotypeCaller, DeepVariant, VarScan2, FreeBayes and Strelka2) to generate a list of high-confidence variants in a consensus call file. The workflow is flexible: it integrates otherwise fragmented tools and can easily be extended by adding or updating tools or by amending the parameter list. The use of a single parameter file enhances the reproducibility of the results. The ease of deploying and using the workflow further increases computational reproducibility, providing researchers with a standardized tool for the variant calling step across different projects. The source code and instructions for installing and using the tool are publicly available at our GitHub repository https://github.com/dhanaprakashj/kuura_pipeline
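The idea of a consensus call set can be sketched as a vote across callers: a variant is kept if enough callers report it. This is a minimal Python illustration under stated assumptions, not Kuura's actual consensus logic. The threshold `min_callers`, the helper name `consensus`, and the variant coordinates are all hypothetical, and exact matching on `(chrom, pos, ref, alt)` assumes the calls were already normalized (for example, left-aligned, with multiallelic sites split) so that different callers represent the same variant identically.

```python
from collections import Counter

# A variant keyed by (chromosome, 1-based position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def consensus(calls: dict[str, set[Variant]], min_callers: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_callers` distinct callers.

    `calls` maps a caller name to the set of variants it reported; because each
    value is a set, a caller contributes at most one vote per variant.
    """
    votes = Counter(v for variants in calls.values() for v in variants)
    return {v for v, n in votes.items() if n >= min_callers}


# Hypothetical calls from three of the callers named in the abstract.
calls = {
    "HaplotypeCaller": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
    "DeepVariant": {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")},
    "Strelka2": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")},
}
result = consensus(calls, min_callers=2)
print(sorted(result))  # [('chr1', 100, 'A', 'G'), ('chr1', 200, 'C', 'T')]
```

Raising `min_callers` trades sensitivity for confidence. In this example, the DeepVariant-only call on chr2 is dropped because no other caller supports it.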

Directory of Open Access Journals

Trepo - Institutional Repository of Tampere University

BioCPR–A Tool for Correlation Plots

Author: Csilla Sipeky
Dhanaprakash Jambulingam
Henri Sara
Johanna Schleutker
Samuel Heron
Vidal Fey
Publication venue: MDPI AG
Publication date: 08/09/2021

Crossref

Screenshot showing a successfully executed pipeline and the information presented while the pipeline is running.

Author: Dhanaprakash Jambulingam (17811226)
Johanna Schleutker (200707)
Samuel Heron (2174558)
Venkat Subramaniam Rathinakannan (9094829)
Vidal Fey (155117)
Publication date: 18/01/2024

The Francis Crick Institute

Validation results using each variant caller.

Author: Dhanaprakash Jambulingam (17811226)
Johanna Schleutker (200707)
Samuel Heron (2174558)
Venkat Subramaniam Rathinakannan (9094829)
Vidal Fey (155117)
Publication date: 18/01/2024

The table shows the number of variants identified by each variant caller, along with their precision and recall values. *The table contains only SNP information.

The Francis Crick Institute

Summary of the tools and their respective Docker containers used in each stage.

Author: Dhanaprakash Jambulingam (17811226)
Johanna Schleutker (200707)
Samuel Heron (2174558)
Venkat Subramaniam Rathinakannan (9094829)
Vidal Fey (155117)
Publication date: 18/01/2024

The Francis Crick Institute

Summary of the steps executed by the Kuura pipeline.

Author: Dhanaprakash Jambulingam (17811226)
Johanna Schleutker (200707)
Samuel Heron (2174558)
Venkat Subramaniam Rathinakannan (9094829)
Vidal Fey (155117)
Publication date: 18/01/2024

The Francis Crick Institute

Detailed installation and usage instructions.

Author: Dhanaprakash Jambulingam (17811226)
Johanna Schleutker (200707)
Samuel Heron (2174558)
Venkat Subramaniam Rathinakannan (9094829)
Vidal Fey (155117)
Publication date: 18/01/2024

The advent of high-throughput sequencing technologies has revolutionized genomic science by cutting the cost and time associated with standard sequencing methods. This advancement has not only provided the research community with an abundance of data but has also presented the challenge of analyzing it. The paramount challenge in analyzing such large amounts of data lies in making optimal use of the available tools. To address this research gap, we propose "Kuura-An automated workflow for analyzing WES and WGS data", which is optimized for both whole exome and whole genome sequencing data. The workflow is written in the Nextflow pipeline scripting language and uses Docker to manage and deploy its components. It consists of four analysis stages: quality control; mapping to the reference genome and quality score recalibration; variant calling and variant recalibration; and variant consensus and annotation. An important feature of the DNA-seq workflow is that it combines multiple variant callers (GATK HaplotypeCaller, DeepVariant, VarScan2, FreeBayes and Strelka2) to generate a list of high-confidence variants in a consensus call file. The workflow is flexible: it integrates otherwise fragmented tools and can easily be extended by adding or updating tools or by amending the parameter list. The use of a single parameter file enhances the reproducibility of the results. The ease of deploying and using the workflow further increases computational reproducibility, providing researchers with a standardized tool for the variant calling step across different projects. The source code and instructions for installing and using the tool are publicly available at our GitHub repository https://github.com/dhanaprakashj/kuura_pipeline.

The Francis Crick Institute

Complete validation results.

Author: Dhanaprakash Jambulingam (17811226)
Johanna Schleutker (200707)
Samuel Heron (2174558)
Venkat Subramaniam Rathinakannan (9094829)
Vidal Fey (155117)
Publication date: 18/01/2024

In the revision process, the pipeline was validated on the gold-standard data sets HG003, HG004, HG006 and HG007, which were generated with the same sequencing protocol in the same study as data sets HG001, HG002 and HG005. The table shows the number of variants identified by each variant caller, along with their precision and recall values. *The table contains only SNP information. (XLSX)
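The precision and recall values reported against a gold-standard truth set follow from three counts: true positives (called and in the truth set), false positives (called but not in the truth set), and false negatives (in the truth set but missed). The Python sketch below uses a hypothetical helper, `precision_recall`, and made-up variants. It assumes exact matching on `(chrom, pos, ref, alt)`. Dedicated benchmarking tools such as hap.py additionally reconcile differing variant representations and restrict the comparison to high-confidence regions, and this sketch does neither.

```python
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def precision_recall(called: set[Variant], truth: set[Variant]) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = len(called & truth)   # correctly called variants
    fp = len(called - truth)   # calls absent from the truth set
    fn = len(truth - called)   # truth variants that were missed
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return precision, recall


# Hypothetical truth set and call set.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 75, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr4", 10, "A", "T"),  # false positive
}
precision, recall = precision_recall(called, truth)
print(f"{precision:.3f} {recall:.3f}")  # 0.667 0.500
```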

The Francis Crick Institute