Integration of EGA secure data access into Galaxy

@article{Hoogstrate2016IntegrationOE,
  title={Integration of EGA secure data access into Galaxy},
  author={Youri Hoogstrate and Chao Zhang and Alexander Senf and Jochem Bijlard and Saskia D. Hiltemann and David van Enckevort and Susanna Repo and Jaap Heringa and Guido W. Jenster and Remond J A Fijneman and Jan-Willem Boiten and Gerrit A. Meijer and Andrew P. Stubbs and Jordi Rambla and Dylan Spalding and Sanne Abeln},
  journal={F1000Research},
  year={2016},
  volume={5}
}
High-throughput molecular profiling techniques routinely generate vast amounts of data for translational medicine studies. Secure, access-controlled systems are needed to manage, store, transfer and distribute these data because of their personally identifiable nature. The European Genome-phenome Archive (EGA) was created to facilitate access to, and management of, the long-term archival of bio-molecular data. Each data provider is responsible for ensuring a Data Access Committee is in place to grant…

Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data
TLDR
The conclusion is that the majority of metadata does not necessarily need to be stored (redundantly) in both databases, but that instead FAIR persistent identifiers should be available for well-defined data ontology levels: study, data access committee, physical sample, data sample and raw data file.
An overview of biomedical platforms for managing research data
TLDR
Managing biomedical Big Data will require the development of strategies that can efficiently leverage public cloud computing resources, and the use of standards developed by the research community for data collection can foster the development of machine learning methods for data processing and analysis.
Personal Genome Project UK (PGP-UK): a research and citizen science hybrid project in support of personalized medicine
TLDR
The findings demonstrate that citizen science-based approaches like PGP-UK have an important role to play in the public awareness, acceptance and implementation of genomics and personalized medicine.
The potential use of big data in oncology.
Detection of fusion transcripts and their genomic breakpoints from RNA sequencing data
TLDR
A novel, graph-based algorithm, Dr. Disco, is presented that makes use of both intronic and exonic RNA-seq reads to identify not only fusion transcripts but also genomic breakpoints, in genes as well as in intergenic regions.
Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data
TLDR
By using the full potential of non–poly(A)-enriched RNA-seq data, sophisticated analysis can reliably identify expressed genomic breakpoints and their transcriptional effects.
Optimizing computational resource management for the scientific gateways ecosystems based on the service‐oriented paradigm
TLDR
The Modular Distributed Architecture to support the Protein Structure Prediction (MDAPSP), a Service‐Oriented Architecture for management and construction of Science Gateways, with resource provisioning on a heterogeneous environment is presented.
Optimising Scientific Workflow Execution Using Desktops, Clusters and Clouds
TLDR
The studies show that cloud machines may not be the best solution to every situation and that the advantages of heterogeneous cluster machines should be considered in scheduling experiments, saving both financial and computational resources, avoiding network delays and managing the infrastructure as needed.

References

SHOWING 1-10 OF 18 REFERENCES
Using Galaxy to Perform Large‐Scale Interactive Data Analyses
TLDR
The authors believe that Galaxy provides a powerful solution that simplifies data acquisition and analysis in an intuitive Web application, granting all researchers access to key informatics tools previously only available to computational specialists working in Unix‐based environments.
Using Galaxy to Perform Large‐Scale Interactive Data Analyses
TLDR
Galaxy amplifies the strengths of existing resources (such as UCSC Genome Browser) by allowing the user to access and, most importantly, analyze data within a single interface in an unprecedented number of ways.
tranSMART: An Open Source Knowledge Management and High Content Data Analytics Platform
  • E. Scheufele, D. Aronzon, M. Palchuk
  • Computer Science
    AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science
  • 2014
TLDR
TranSMART’s extensible data model and corresponding data integration processes, rapid data analysis features, and open source nature make it an indispensable tool in translational or clinical research.
myExperiment: a repository and social network for the sharing of bioinformatics workflows
TLDR
MyExperiment is an online research environment that supports the social sharing of bioinformatics workflows consisting of a series of computational tasks using web services, which may be performed on data from its retrieval, integration and analysis, to the visualization of the results.
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants
TLDR
The FASTQ format is defined, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS.
STAR: ultrafast universal RNA-seq aligner
TLDR
The Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure outperforms other aligners by a factor of >50 in mapping speed.
Detection of TMPRSS2-ERG translocations in human prostate cancer by expression profiling using GeneChip Human Exon 1.0 ST arrays.
TLDR
It is demonstrated that expression analyses using exon arrays represent a valuable approach for detecting ETS gene translocation in prostate cancer, in parallel with analyses of genes whose expression levels significantly correlated with the presence of the translocation.
Role of the TMPRSS2-ERG gene fusion in prostate cancer.
TLDR
The results support previous work suggesting that TMPRSS2-ERG fusions mediate invasion, consistent with the defining histologic distinction between PIN and prostate cancer, and suggest that it may not be sufficient for transformation in the absence of secondary molecular lesions.
Dissemination of scientific software with Galaxy ToolShed
TLDR
The proliferation of web-based integrative analysis frameworks has enabled users to perform complex analyses directly through the web but has taken away the freedom to easily select the most appropriate tools, so the Galaxy ToolShed was developed.
Gene fusions by chromothripsis of chromosome 5q in the VCaP prostate cancer cell line
TLDR
The data indicate that although a marker of genomic instability, chromothripsis might lead to only a limited number of functionally relevant fusion genes.