Open
Labels: enhancement (improvement of existing code or method), new feature (new method or data structure)
Description
Provide variation APIs that bring together Ensembl Variation, the VCF file format, the GFF3 and GVF file formats, samtools, Picard, GATK, etc.
Several similar file specifications exist for dealing with sequence variation, including:
- VCF (Variant Call Format) is a text file format used by the 1000 Genomes project and others for representing variation against a reference sequence.
- The [Genome Variation Format](http://www.sequenceontology.org/resources/gvf_1.02.html) (GVF) is a text file format for describing sequence variants at nucleotide resolution relative to a reference genome. GVF is a type of [GFF3](http://www.sequenceontology.org/resources/gff3.html) file with additional pragmas and attributes specified.
- samtools
- Picard
- GATK
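
As a rough illustration of the VCF layout mentioned in the list above, the sketch below reads the first five fixed columns (CHROM, POS, ID, REF, ALT) of one tab-separated VCF data line. The class name `VcfRecordSketch` and its fields are hypothetical and are not part of BioJava or any existing library; header lines, QUAL/FILTER/INFO, and per-sample columns are deliberately ignored.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical value class for the leading fixed columns of one VCF data line. */
final class VcfRecordSketch {
    final String chrom;
    final int pos;            // 1-based position, exactly as written in the VCF text
    final String id;          // "." when no identifier is given
    final String ref;
    final List<String> alt;   // empty when the ALT column is "."

    private VcfRecordSketch(String chrom, int pos, String id, String ref, List<String> alt) {
        this.chrom = chrom;
        this.pos = pos;
        this.id = id;
        this.ref = ref;
        this.alt = alt;
    }

    /** Parses a single data line (not a '#' header line). */
    static VcfRecordSketch parse(String line) {
        String[] f = line.split("\t");
        if (f.length < 5) {
            throw new IllegalArgumentException("expected at least 5 tab-separated columns");
        }
        // ALT holds comma-separated alternate alleles, or "." when there are none.
        List<String> alt = f[4].equals(".")
                ? Collections.<String>emptyList()
                : Arrays.asList(f[4].split(","));
        return new VcfRecordSketch(f[0], Integer.parseInt(f[1]), f[2], f[3], alt);
    }
}
```

A call such as `VcfRecordSketch.parse(line)` on a data line yields the site and its alternate alleles; a real implementation would also need to handle the header, INFO, and genotype columns.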
Some support for these file specifications is already present in various bioinformatics libraries (and in fact biojava3 already provides GFF3 support); it would be desirable to pull these together behind a set of common APIs in biojava3.
Approach
- Consider existing open source VCF and GVF implementations (Genome Analysis Toolkit (GATK), VCFtools, Picard, GVF-Parser, etc.)
- Design APIs for common entities (Allele, Genotype, Haplotype, etc.)
- Create adaptors to third-party implementations or implement support directly in Biojava3
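
One possible shape for the approach above, offered only as a starting point for discussion: small value classes for the common entities plus a narrow interface that each backend (a native parser or a wrapper around a third-party library) would implement. All names here (`Allele`, `Genotype`, `GenotypeSource`, `InMemoryGenotypeSource`) are hypothetical and do not correspond to existing BioJava classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical entity: one observed sequence at a variant site. */
final class Allele {
    final String bases;
    final boolean reference;  // true if this allele matches the reference sequence

    Allele(String bases, boolean reference) {
        this.bases = bases;
        this.reference = reference;
    }
}

/** Hypothetical entity: the alleles one sample carries at one site. */
final class Genotype {
    final String sample;
    final List<Allele> alleles;

    Genotype(String sample, List<Allele> alleles) {
        this.sample = sample;
        this.alleles = Collections.unmodifiableList(new ArrayList<Allele>(alleles));
    }

    /** True when at least one allele is present and all alleles have the same bases. */
    boolean isHomozygous() {
        for (Allele a : alleles) {
            if (!a.bases.equals(alleles.get(0).bases)) {
                return false;
            }
        }
        return !alleles.isEmpty();
    }
}

/** Hypothetical adaptor boundary: each file format or third-party backend implements this. */
interface GenotypeSource {
    List<Genotype> genotypes();
}

/** Trivial in-memory implementation standing in for a real VCF or GVF adaptor. */
final class InMemoryGenotypeSource implements GenotypeSource {
    private final List<Genotype> data;

    InMemoryGenotypeSource(List<Genotype> data) {
        this.data = Collections.unmodifiableList(new ArrayList<Genotype>(data));
    }

    @Override
    public List<Genotype> genotypes() {
        return data;
    }
}
```

Keeping the entities free of any format-specific fields is the design choice being illustrated: code that consumes `GenotypeSource` would not need to know whether the data came from VCF, GVF, or a wrapped external library. A `Haplotype` entity could follow the same pattern.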
Suggested for GSoC 2013