Debian Med Project
Help us to see Debian used by medical practitioners and biomedical researchers! Join us on the Alioth page.
Summary
Next generation sequencing
Debian Med bioinformatics applications usable in Next Generation Sequencing

This task aims to collect packages that specialize in the alignment of sequences produced by next generation sequencing.

The list to the right includes various software projects which are of some interest to the Debian Med Project. Currently, only a few of them are available as Debian packages. It is our goal, however, to include all software in Debian Med which can sensibly add to a high quality Debian Pure Blend.

For a better overview of the project's availability as a Debian package, each head row has a color code according to this scheme:

If you discover a project which looks like a good candidate for Debian Med, or if you have prepared an unofficial Debian package, please do not hesitate to send a description of that project to the Debian Med mailing list.


Debian Med Next generation sequencing packages

Official Debian packages with high relevance

Bedtools
suite of utilities for comparing genomic features
Versions of package bedtools
Release  Version  Architectures
wheezy  2.16.1-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie  2.17.0-1  ia64,s390
sid  2.17.0-1  ia64,s390
jessie  2.19.1-1  sparc
sid  2.19.1-1  sparc
jessie  2.20.1-1  amd64,arm64,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x
sid  2.20.1-1  amd64,arm64,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x
Debtags of package bedtools:
field: biology, biology:bioinformatics
interface: commandline
role: program
scope: suite
use: analysing, comparing, converting, filtering
works-with: biological-sequence
Popcon: 35 users (32 upd.)*
License: DFSG free
VCS: Git

The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by streaming several BEDTools together.
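
The two tasks named above, finding feature overlaps and computing coverage, can be illustrated with a minimal Python sketch. It is not BEDTools code and makes no claim about how BEDTools implements them; the names `overlaps`, `intersect` and `covered_bases` are hypothetical, and intervals are assumed to be BED-style (0-based start, exclusive end).

```python
from typing import List, Tuple

# (chromosome, start, end): BED-style, 0-based start, exclusive end (assumed).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(features_a: List[Interval],
              features_b: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Naive O(n*m) pairing of every overlapping feature from A and B."""
    return [(a, b) for a in features_a for b in features_b if overlaps(a, b)]


def covered_bases(features: List[Interval], region: Interval) -> int:
    """Count the bases of `region` covered by at least one feature."""
    chrom, start, end = region
    # Keep only features touching the region and clip them to its bounds.
    clipped = sorted(
        (max(s, start), min(e, end))
        for c, s, e in features
        if c == chrom and s < end and start < e
    )
    total, cursor = 0, start
    for s, e in clipped:
        if e > cursor:
            total += e - max(s, cursor)  # count only bases not seen yet
            cursor = e
    return total
```

A real pipeline would use sorted input and a sweep or an interval tree instead of the quadratic loop; the sketch only shows the interval logic.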

The groupBy utility is distributed in the filo package.

Please cite: Aaron R. Quinlan and Ira M. Hall: BEDTools: a flexible suite of utilities for comparing genomic features. (PubMed,eprint) Bioinformatics 26(6):841-842 (2010)
Bowtie
Ultrafast, memory-efficient alignment program for short DNA sequences
Versions of package bowtie
Release  Version  Architectures
wheezy  0.12.7-3  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,powerpc,s390,s390x,sparc
jessie  1.0.0-5  s390
sid  1.0.0-5  ia64,s390
sid  1.0.1-3  amd64,kfreebsd-amd64
upstream  1.1.0
Debtags of package bowtie:
biology: nuceleic-acids
field: biology:bioinformatics
interface: commandline
role: program
science: calculation
scope: utility
use: analysing, comparing
works-with: biological-sequence
Popcon: 18 users (7 upd.)*
Newer upstream!
License: DFSG free
VCS: Svn

This package addresses the problem of interpreting the results of the latest (2010) DNA sequencing technologies. These technologies yield fairly short sequences that cannot be interpreted directly. It is the task of tools like Bowtie to determine the chromosomal location (locus) of these short DNA sequences.

Bowtie aligns short DNA sequences (reads) to the human genome at a rate of over 25 million reads (35 base pairs long) per hour. Bowtie indexes the genome with a Burrows-Wheeler transform to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
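
To give a feel for the Burrows-Wheeler transform mentioned above, here is a small Python sketch. It is a textbook illustration, not Bowtie's implementation: the function names `bwt` and `count_occurrences` are hypothetical, the transform is built by naively sorting all rotations (far too slow and memory-hungry for a genome), and the `$` sentinel is an assumption of this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(transformed: str, pattern: str) -> int:
    """Count exact matches of `pattern` using backward search on the BWT."""
    # first[c] = index of the first row starting with character c.
    first = {}
    for i, c in enumerate(sorted(transformed)):
        first.setdefault(c, i)
    lo, hi = 0, len(transformed)  # current row interval [lo, hi)
    for c in reversed(pattern):
        if c not in first:
            return 0
        # Rank queries are done by plain counting here; real indexes
        # precompute them to answer in constant time.
        lo = first[c] + transformed[:lo].count(c)
        hi = first[c] + transformed[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo
```

The search touches only the transformed string, which is why an index of this kind can stay compact compared with storing every read-sized substring of the genome.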

The package is enhanced by the following packages: bowtie-examples
Please cite: Ben Langmead, Cole Trapnell, Mihai Pop and Steven L Salzberg: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. (eprint) Genome Biology 10:R25 (2009)
Bwa
Burrows-Wheeler Aligner
Versions of package bwa
Release  Version  Architectures
squeeze  0.5.8c-1  amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc
wheezy  0.6.2-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie  0.6.2-1  s390
sid  0.6.2-1  hurd-i386,s390
jessie  0.7.10-1  amd64,kfreebsd-amd64
sid  0.7.10-1  amd64,kfreebsd-amd64
Debtags of package bwa:
biology: nuceleic-acids, peptidic
field: biology, biology:bioinformatics
interface: commandline, text-mode
role: program
use: analysing, comparing
Popcon: 22 users (58 upd.)*
License: DFSG free
VCS: Git

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are designed for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, the most recent of the three, is generally recommended for high-quality queries because it is faster and more accurate. BWA-MEM also performs better than BWA-backtrack for 70-100bp Illumina reads.

Please cite: Heng Li and Richard Durbin: Fast and accurate short read alignment with Burrows-Wheeler transform. (PubMed,eprint) Bioinformatics 25(14):1754-1760 (2009)
Fastx-toolkit
Preprocessing of short FASTQ/A nucleotide sequences
Versions of package fastx-toolkit
Release  Version  Architectures
wheezy  0.0.13.2-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie  0.0.13.2-1  s390
sid  0.0.13.2-1  s390
jessie  0.0.14-1  amd64,arm64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,ppc64el,s390x
sid  0.0.14-1  amd64,arm64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,ppc64el,s390x
Debtags of package fastx-toolkit:
role: program
Popcon: 16 users (3 upd.)*
License: DFSG free
VCS: Git

FASTX-Toolkit is a collection of command-line tools for preprocessing short nucleotide sequences in FASTA or FASTQ format, as typically produced by next-generation sequencing machines. The main processing of such FASTA/FASTQ files is aligning the sequences to reference genomes or other databases using specialized programs such as BWA, Bowtie and many others. However, it is sometimes more productive to preprocess the FASTA/FASTQ files before mapping the sequences to the genome, that is, to manipulate the sequences in order to obtain better results. The FASTX-Toolkit tools perform some of these preprocessing tasks.

Filo
File and stream operations
Versions of package filo
Release  Version  Architectures
wheezy  1.1+2011020401.2  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie  1.1+2011020401.2  s390
sid  1.1+2011020401.2  s390
jessie  1.1+2011123001.1  amd64,arm64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x,sparc
sid  1.1+2011123001.1  amd64,arm64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x,sparc
Popcon: 14 users (9 upd.)*
License: DFSG free
VCS: Git

The following tools are available as part of the filo (FILe and stream Operations) package:

groupBy - mimics the "group by" clause of database systems

shuffle - randomizes the order of the lines in a file

stats - computes descriptive statistics on a given column of a tab-delimited file or stream

Because their names are too generic, "shuffle" and "stats" are installed under /usr/lib/filo.
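
What these three tools do can be sketched in a few lines of Python. This is an illustration of the operations only, not the filo programs or their command-line options; the function names and the 0-based column indices are assumptions of this example.

```python
import random
import statistics
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def group_by(lines: Iterable[str], key_col: int, value_col: int,
             op: Callable[[List[float]], float] = sum) -> Dict[str, float]:
    """Group tab-delimited lines by one column and summarize another."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        groups[fields[key_col]].append(float(fields[value_col]))
    return {key: op(values) for key, values in groups.items()}


def shuffle_lines(lines: Iterable[str], seed: Optional[int] = None) -> List[str]:
    """Return the lines in random order (seed is only for reproducibility)."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def column_stats(lines: Iterable[str], col: int) -> Dict[str, float]:
    """Descriptive statistics of one numeric column of tab-delimited lines."""
    values = [float(line.rstrip("\n").split("\t")[col]) for line in lines]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }
```

All three functions accept any iterable of lines, so they work on an open file as well as on `sys.stdin`, mirroring the "file or stream" wording above.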

Kissplice
Detection of various kinds of polymorphisms in RNA-seq data
Versions of package kissplice
Release  Version  Architectures
jessie  1.8.3-1  ia64,s390
sid  1.8.3-1  s390
jessie  2.1.0-1  amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390x,sparc
sid  2.1.0-1  armel,armhf,hurd-i386,i386,ia64,kfreebsd-i386,mips,mipsel,powerpc,s390x,sparc
sid  2.2.1-1  amd64,arm64,kfreebsd-amd64,ppc64el
Debtags of package kissplice:
biology: nuceleic-acids
field: biology, biology:bioinformatics
interface: commandline
role: program
use: analysing
works-with: biological-sequence
Popcon: 1 users (1 upd.)*
License: DFSG free
VCS: Svn

KisSplice is a piece of software that enables the analysis of RNA-seq data with or without a reference genome. It is an exact local transcriptome assembler that allows one to identify SNPs, indels and alternative splicing events. It can deal with an arbitrary number of biological conditions, and will quantify each variant in each condition. It has been tested on Illumina datasets of up to 1G reads. Its memory consumption is around 5Gb for 100M reads.

Please cite: Gustavo AT Sacomoto, Janice Kielbassa, Rayan Chikhi, Raluca Uricaru, Pavlos Antoniou, Marie-France Sagot, Pierre Peterlongo and Vincent Lacroix: KISSPLICE: de-novo calling alternative splicing events from RNA-seq data. (PubMed,eprint) BMC Bioinformatics 13(Suppl 6):S5 (2012)
Last-align
Genome-scale comparison of biological sequences
Versions of package last-align
Release  Version  Architectures
squeeze  128-1  amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc
wheezy  199-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie  199-1  s390
jessie  359-1  ia64
sid  359-1  s390
sid  393-1  ia64
jessie  418-1  sparc
jessie  475-1  amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x
sid  475-1  amd64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x,sparc
Debtags of package last-align:
biology: nuceleic-acids
field: biology, biology:bioinformatics
role: program
Popcon: 15 users (11 upd.)*
License: DFSG free
VCS: Svn

LAST is software for comparing and aligning sequences, typically DNA or protein sequences. LAST is similar to BLAST, but copes better with very large amounts of sequence data. Here are two things LAST is good at:

  • Comparing large genomes (e.g. mammalian genomes).
  • Mapping many sequence tags onto a genome.

The main technical innovation is that LAST finds initial matches based on their multiplicity instead of using a fixed size (BLAST, for example, uses 10-mers). This makes it possible to map tags to genomes without repeat-masking and without being overwhelmed by repetitive hits. To find these variable-sized matches, it uses a suffix array inspired by Vmatch. To achieve high sensitivity, LAST uses a discontiguous suffix array, analogous to spaced seeds.
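
The idea of choosing initial matches by multiplicity rather than by a fixed length can be sketched as follows. This is a simplified illustration, not LAST's code: it uses an ordinary (contiguous) suffix array built by naive sorting, and the names `build_suffix_array`, `occurrences`, `adaptive_seed` and the `max_hits` parameter are hypothetical.

```python
from typing import List, Optional, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array: start positions sorted by suffix (toy inputs only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """All start positions of `pattern`, found by binary search on `sa`."""
    n = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + n] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # upper bound: first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + n] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def adaptive_seed(text: str, sa: List[int], query: str, start: int,
                  max_hits: int) -> Optional[Tuple[int, List[int]]]:
    """Shortest match from query[start:] occurring at most `max_hits` times."""
    for length in range(1, len(query) - start + 1):
        hits = occurrences(text, sa, query[start:start + length])
        if not hits:
            return None  # extending further cannot create a match
        if len(hits) <= max_hits:
            return length, hits
    return None  # even the full remaining query is too repetitive
```

In a repetitive region the seed simply grows until it is rare enough, which is why no separate repeat-masking step is needed in this scheme.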

Please cite: Martin C. Frith, Raymond Wan and Paul Horton: Incorporating sequence quality data into alignment improves DNA read mapping. (PubMed,eprint) Nucl. Acids Res. 38(7):e100 (2010)
Maq
Maps short fixed-length polymorphic DNA sequences to reference sequences
Versions of package maq
Release  Version  Architectures
squeeze  0.7.1-3  amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc
wheezy  0.7.1-5  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie  0.7.1-5  amd64,arm64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390,s390x,sparc
sid  0.7.1-5  amd64,arm64,armel,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390,s390x,sparc
Debtags of package maq:
biology: nuceleic-acids
field: biology, biology:bioinformatics
interface: commandline
role: program
scope: utility
use: analysing, comparing, searching
works-with-format: plaintext
Popcon: 16 users (12 upd.)*
License: DFSG free
VCS: Svn

Maq (short for Mapping and Assembly with Quality) builds mapping assemblies from short DNA sequences such as those generated by next-generation sequencing machines. The program was designed specifically for the Illumina-Solexa 1G Genetic Analyzer and has a preliminary implementation for handling ABI SOLiD data. Maq was previously known as mapass2.

Development of Maq stopped in 2008. Its successors are BWA and SAMtools.

Please cite: Heng Li, Jue Ruan and Richard Durbin: Mapping short DNA sequencing reads and calling variants using mapping quality scores. (PubMed,eprint) Genome Research 18(11):1851-1858 (2008)
Mira-assembler
Whole Genome Shotgun and EST Sequence Assembler
Versions of package mira-assembler
Release  Version  Architectures
wheezy  3.4.0.1-3  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie  3.9.18-1  ia64,s390
sid  3.9.18-1  s390
jessie  4.0-1  sparc
sid  4.0-1  ia64
jessie  4.0.2-1  amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x
sid  4.0.2-1  amd64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x,sparc
Debtags of package mira-assembler:
role: program
Popcon: 10 users (8 upd.)*
License: DFSG free
VCS: Svn

The mira genome fragment assembler is a specialised assembler for sequencing projects classified as 'hard' due to a high number of similar repeats. For expressed sequence tag (EST) transcripts, miraEST specialises in reconstructing pristine mRNA transcripts while detecting and classifying single nucleotide polymorphisms (SNPs) occurring in different variations thereof.

The assembler is routinely used for such various tasks as mutation detection in different cell types, similarity analysis of transcripts between organisms, and pristine assembly of sequences from various sources for oligo design in clinical microarray experiments.

The package provides the following binaries:

  • mira: for assembly of genome sequences
  • miramem: estimating memory needed to assemble projects. Realised through link to mira.
  • convert_project: for converting project file types into other types
  • caf2fasta, caf2gbf, caf2text, caf2html, gbf2caf and gbf2fasta are some frequently used file converters (realised through links to convert_project)
  • scftool: set of tools useful when working with SCF trace files
  • fastatool: set of tools useful when working with FASTA trace files

Scripts provided:

  • fasta2frag: fragmenting sequences into smaller, overlapping subsequences. Useful for simulating shotgun sequences. Can create subsequences in both directions (/default) and also paired-end sequences.
  • fastaselect: given a FASTA file (and possibly a FASTA quality file) and a file with names of reads, select the sequences from the input FASTA (and quality file) and writes them to an output FASTA
  • fastqselect: like fastaselect, only for FASTQ
  • fixACE4consed: Consed has a bug which incapacitates it from reading consensus tags in ACE files written by the MIRA assembler (and possibly other programs). This script massages an ACE file so that consed can read the consensus tags.
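
As a rough illustration of what fragmenting a sequence into overlapping subsequences means (the fasta2frag entry above), here is a short Python sketch. It is not the fasta2frag script: the function names, the `length` and `step` parameters, and the reading of "both directions" as forward strand plus reverse complement are all assumptions made for this example.

```python
from typing import List

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fragment(seq: str, length: int, step: int,
             both_strands: bool = True) -> List[str]:
    """Cut `seq` into windows of `length` bases, advancing by `step` bases.

    With step < length, consecutive fragments overlap by length - step bases.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    fragments = []
    last_start = max(len(seq) - length, 0)
    for start in range(0, last_start + 1, step):
        piece = seq[start:start + length]
        fragments.append(piece)
        if both_strands:
            fragments.append(reverse_complement(piece))
    return fragments
```

Fragments produced this way can serve as simulated shotgun reads for testing an assembler on a sequence whose true layout is known.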
Please cite: Bastien Chevreux, Thomas Pfisterer, Bernd Drescher, Albert J. Driesel, Werner E. G. Müller, Thomas Wetter and Sándor Suhai: Using the miraEST Assembler for Reliable and Automated mRNA Transcript Assembly and SNP Detection in Sequenced ESTs. (PubMed,eprint) Genome Research 14(6):1147-1159 (2004)
Mothur
Sequence analysis suite for research on microbiota
Versions of package mothur
Release  Version  Architectures
wheezy  1.24.1-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
sid  1.24.1-1  s390
jessie  1.31.2+dfsg-2  ia64
sid  1.31.2+dfsg-2  ia64
jessie  1.33.3+dfsg-1  sparc
jessie  1.33.3+dfsg-2  amd64,arm64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el
sid  1.33.3+dfsg-2  amd64,arm64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,sparc
Debtags of package mothur:
role: program
Popcon: 8 users (6 upd.)*
License: DFSG free
VCS: Svn

Mothur seeks to develop open-source, extensible software that meets the bioinformatics needs of researchers in microbial ecology. It combines the functionality of DOTUR, SONS, TreeClimber, s-libshuff, UniFrac and many others. Beyond improving the flexibility of these algorithms, mothur adds a number of new features, such as calculation and visualization tools.

Please cite: Patrick D Schloss, Sarah L Westcott, Thomas Ryabin, Justine R Hall, Martin Hartmann, Emily B Hollister, Ryan A Lesniewski, Brian B Oakley, Donovan H Parks, Courtney J Robinson, Jason W Sahl, Blaz Stres, Gerhard G Thallinger, David J Van Horn and Carolyn F Weber: Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. (PubMed) Appl Environ Microbiol 75(23):7537-7541 (2009)
Picard-tools
Command line tools to manipulate SAM and BAM files
Versions of package picard-tools
Release  Version  Architectures
squeeze  1.27-1  all
wheezy  1.46-1  all
jessie  1.95-1  all
sid  1.100-1  all
jessie  1.105-1  all
sid  1.105-1  all
jessie  1.110-1  all
jessie  1.113-1  all
sid  1.113-1  all
Popcon: 17 users (35 upd.)*
License: DFSG free
VCS: Git

SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. Picard Tools includes these utilities to manipulate SAM and BAM files: BamToBfq, IlluminaBasecallsToSam, BuildBamIndex, MarkDuplicates, CalculateHsMetrics, MeanQualityByCycle, CleanSam, MergeBamAlignment, CollectAlignmentSummaryMetrics, MergeSamFiles, CollectGcBiasMetrics, NormalizeFasta, CollectInsertSizeMetrics, QualityScoreDistribution, CollectRnaSeqMetrics, ReplaceSamHeader, CompareSAMs, RevertSam, CreateSequenceDictionary, SamFormatConverter, ExtractIlluminaBarcodes, SamToFastq, EstimateLibraryComplexity, SortSam, FastqToSam, ValidateSamFile, FixMateInformation, ViewSam.

Qiime
Quantitative Insights Into Microbial Ecology
Versions of package qiime
Release  Version  Architectures
wheezy  1.4.0-2  amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie  1.4.0-2  s390
sid  1.7.0+dfsg-1  s390
jessie  1.8.0+dfsg-4  amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x
sid  1.8.0+dfsg-4  amd64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x,sparc
Debtags of package qiime:
role: program
Popcon: 8 users (2 upd.)*
License: DFSG free
VCS: Svn

QIIME (canonically pronounced ‘Chime’) is a pipeline for performing microbial community analysis that integrates many third party tools which have become standard in the field. A standard QIIME analysis begins with sequence data from one or more sequencing platforms, including

  • Sanger,
  • Roche/454, and
  • Illumina GAIIx.

With all the underlying tools installed, of which not all are yet available in Debian (or any other Linux distribution), QIIME can perform

  • library de-multiplexing and quality filtering;

  • denoising with PyroNoise;
  • OTU and representative set picking with uclust, cdhit, mothur, BLAST, or other tools;
  • taxonomy assignment with BLAST or the RDP classifier;
  • sequence alignment with PyNAST, muscle, infernal, or other tools;
  • phylogeny reconstruction with FastTree, raxml, clearcut, or other tools;
  • alpha diversity and rarefaction, including visualization of results, using over 20 metrics including Phylogenetic Diversity, chao1, and observed species;
  • beta diversity and rarefaction, including visualization of results, using over 25 metrics including weighted and unweighted UniFrac, Euclidean distance, and Bray-Curtis;
  • summarization and visualization of taxonomic composition of samples using pie charts and histograms and many other features.

QIIME includes parallelization capabilities for many of the computationally intensive steps. By default, these are configured to utilize a multi-core environment, and are easily configured to run in a cluster environment. QIIME is built in Python using the open-source PyCogent toolkit. It makes extensive use of unit tests, and is highly modular to facilitate custom analyses.

Please cite: J Gregory Caporaso, Justin Kuczynski, Stombaugh Jesse, Bittinger Kyle, Bushman Frederic D, Costello Elizabeth K, Fierer Noah, Pena Antonio Gonzalez, Goodrich Julia K, Gordon Jeffrey I, Huttley Gavin A, Kelley Scott T, Knights Dan, Koenig Jeremy E, Ley Ruth E, Lozupone Catherine A, McDonald Daniel, Muegge Brian D, Pirrung Meg, Reeder Jens, Sevinsky Joel R, Turnbaugh Peter J, Walters William A, Widmann Jeremy, Yatsunenko Tanya, Zaneveld Jesse and Knight Rob: QIIME allows analysis of high-throughput community sequencing data. (PubMed) Nature Methods 7:335 - 336 (2010)
R-bioc-edger
Empirical analysis of digital gene expression data in R
Versions of package r-bioc-edger
Release  Version  Architectures
wheezy  2.6.1~dfsg-1  all
jessie  3.2.4~dfsg-1  s390
sid  3.2.4~dfsg-1  s390
jessie  3.4.2+dfsg-2  ia64,sparc
sid  3.4.2+dfsg-2  ia64
jessie  3.6.7+dfsg-1  amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x
sid  3.6.7+dfsg-1  mips
sid  3.6.8+dfsg-1  amd64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mipsel,powerpc,ppc64el,s390x,sparc
Popcon: 19 users (9 upd.)*
License: DFSG free
VCS: Git

Bioconductor package for differential expression analysis of whole-transcriptome sequencing (RNA-seq) data and digital gene expression profiles with biological replication. It uses empirical Bayes methods and exact tests based on the negative binomial distribution. It can also be used for differential signal analysis with other types of genome-scale count data.

Please cite: Mark D. Robinson and Gordon K. Smyth: Moderated statistical tests for assessing differences in tag abundance. (PubMed,eprint) Bioinformatics 23(21):2881-2887 (2007)
R-bioc-hilbertvis
GNU R package for visualizing long data vectors
Versions of package r-bioc-hilbertvis
Release  Version  Architectures
squeeze  1.5.0-2  amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc
wheezy  1.14.0-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie  1.18.0-1  ia64,s390,sparc
sid  1.18.0-1  ia64,s390
jessie  1.22.0-1  amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x
sid  1.22.0-1  amd64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x,sparc
Debtags of package r-bioc-hilbertvis:
biology: nuceleic-acids
field: biology, biology:bioinformatics
use: analysing
Popcon: 19 users (22 upd.)*
License: DFSG free
VCS: Svn

This tool displays very long data vectors in a space-efficient manner by arranging the data along a 2D Hilbert curve. The user can then visually judge, at the same time, both the large-scale structure and distribution of features and the approximate shape and intensity of individual features.

In bioinformatics, a typical use case is ChIP-Chip and ChIP-Seq data, or more generally any kind of genomic data conventionally displayed as a quantitative track ("wiggle data") in genome browsers such as those provided by Ensembl or UCSC.
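
The arrangement of a one-dimensional vector along a 2D Hilbert curve can be sketched in Python as follows. This is the standard index-to-coordinate conversion for a Hilbert curve, shown only to illustrate the layout idea; it is not taken from the HilbertVis package (which is written for R), and the names `d2xy` and `hilbert_layout` are hypothetical.

```python
from typing import List, Sequence, Tuple


def d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map position d on a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:  # flip the quadrant
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # rotate the quadrant
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_layout(values: Sequence[float], order: int) -> List[List[float]]:
    """Place `values` along a Hilbert curve on a 2**order square grid.

    Cells beyond the end of the vector are left at 0.0 (an arbitrary choice).
    """
    n = 2 ** order
    if len(values) > n * n:
        raise ValueError("vector too long for this grid; bin it first")
    grid = [[0.0] * n for _ in range(n)]
    for d, value in enumerate(values):
        x, y = d2xy(n, d)
        grid[y][x] = value
    return grid
```

Because neighbouring positions on the curve always land in neighbouring cells, a peak in the vector shows up as a compact blob in the image rather than being smeared across a row break.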

Please cite: Simon Anders: Visualization of genomic data with the Hilbert curve. (PubMed,eprint) Bioinformatics 25(10):1231-1235 (2009)
Samtools
Processing of sequence alignments in SAM and BAM formats
Versions of package samtools
Release  Version  Architectures
squeeze  0.1.8-1  amd64,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390
wheezy  0.1.18-1  amd64,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390
jessie  0.1.19-1  amd64,arm64,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390,s390x
sid  0.1.19-1  amd64,arm64,armhf,hurd-i386,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390,s390x
experimental  1.0-1~experimental  amd64
upstream  1.0
Debtags of package samtools:
field: biology
interface: commandline
network: client
role: program
scope: utility
uitoolkit: ncurses
use: analysing, calculating, filtering
works-with: biological-sequence
Popcon: 70 users (26 upd.)*
Newer upstream!
License: DFSG free
VCS: Git

The Samtools toolkit processes nucleotide sequence alignments in the binary BAM format. It imports from and exports to the ASCII-based SAM (Sequence Alignment/Map) format, and can sort, merge and index alignments. In addition, Samtools can quickly retrieve reads in any region. It is designed to work on a stream and can open a BAM file (but not a SAM file) on a remote FTP or HTTP server.

The package is enhanced by the following packages: libbio-samtools-perl
Please cite: Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin and 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map (SAM) Format and SAMtools. (PubMed,eprint) Bioinformatics 25(16):2078-2079 (2009)
Sra-toolkit
utilities for the NCBI Sequence Read Archive
Versions of package sra-toolkit
Release  Version  Architectures
wheezy  2.1.7a-1  amd64,i386,kfreebsd-amd64,kfreebsd-i386
jessie  2.3.5-2+dfsg-1  amd64,i386,kfreebsd-amd64,kfreebsd-i386
sid  2.3.5-2+dfsg-1  amd64,i386,kfreebsd-amd64,kfreebsd-i386
Popcon: 10 users (3 upd.)*
License: DFSG free
VCS: Git

Tools for reading the SRA archive, generally by converting individual runs into some commonly used format such as fastq.

The textual dumpers "sra-dump" and "vdb-dump" are provided in this release as an aid in visual inspection. It is likely that their actual output formatting will be changed in the near future to a stricter, more formalized representation[s]. PLEASE DO NOT RELY UPON THE OUTPUT FORMAT SEEN IN THIS RELEASE.

The "help" information will be improved in near future releases, and the tool options will become standardized across the set. More documentation will also be provided on the NCBI web site.

Tool options may change in the next release. Version 1 tool options will remain supported wherever possible in order to preserve operation of any existing scripts.

Please cite: Rasko Leinonen, Ruth Akhtar, Ewan Birney, James Bonfield, Lawrence Bower, Matt Corbett, Ying Cheng, Fehmi Demiralp, Nadeem Faruque, Neil Goodgame, Richard Gibson, Gemma Hoad, Christopher Hunter, Mikyung Jang, Steven Leonard, Quan Lin, Rodrigo Lopez, Michael Maguire, Hamish McWilliam, Sheila Plaister, Rajesh Radhakrishnan, Siamak Sobhany, Guy Slater, Petra Ten Hoopen, Franck Valentin, Robert Vaughan, Vadim Zalunin, Daniel Zerbino and Guy Cochrane: Improvements to services at the European Nucleotide Archive. (PubMed,eprint) Nucleic Acids Research 38(Database issue):D39-45 (2010)
Ssake
Genomics application for assembling millions of very short DNA sequences
Versions of package ssake
Release  Version  Architectures
squeeze  3.5-1  all
wheezy  3.8-2  all
jessie  3.8-2  all
sid  3.8-2  all
jessie  3.8.1-1  all
sid  3.8.1-1  all
jessie  3.8.2-1  all
sid  3.8.2-1  all
Debtags of package ssake:
biology: nuceleic-acids
field: biology
interface: shell
role: program
scope: utility
use: analysing
Popcon: 11 users (3 upd.)*
License: DFSG free
VCS: Svn

Short Sequence Assembly by K-mer search and 3′ read Extension (SSAKE) is a genomics application that aggressively assembles millions of short nucleotide sequences by progressively searching for perfect 3'-most k-mers using a DNA prefix tree. SSAKE is designed to make effective use of the information in short sequence reads by consistently assembling them into contigs that can be used to characterize novel sequencing targets.

Please cite: Rene L. Warren, Granger G. Sutton, Steven J. M. Jones and Robert A. Holt: Assembling millions of short DNA sequences using SSAKE. (PubMed,eprint) Bioinformatics 23(4):500-501 (2007)
Tabix
generic indexer for TAB-delimited genome position files
Versions of package tabix
Release  Version  Architectures
wheezy  0.2.6-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie  0.2.6-1  s390
sid  0.2.6-1  s390
jessie  0.2.6-2  amd64,arm64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x,sparc
sid  0.2.6-2  armel,armhf,hurd-i386,ia64,mips,powerpc,s390x,sparc
sid  1.0-2  amd64,arm64,i386,kfreebsd-amd64,kfreebsd-i386,mipsel,ppc64el
Debtags of package tabix:
role: program
works-with-format: html
Popcon: 19 users (13 upd.)*
License: DFSG free
VCS: Git

Tabix indexes files where some columns indicate sequence coordinates: name (usually a chromosome), start and stop. The input data file must be position sorted and compressed by bgzip (provided in this package), which has a gzip-like interface. After indexing, tabix is able to quickly retrieve data lines by chromosomal coordinates. Fast data retrieval also works over a network if a URI is given as the file name.

This version of tabix is built from the HTSlib source.
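
Why position-sorted input makes retrieval by coordinates fast can be shown with a small in-memory Python sketch. It is deliberately much simpler than tabix, which indexes bgzip-compressed files on disk; the class name `PositionIndex`, its methods and the half-open interval convention are assumptions of this example.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int, int, str]  # (chromosome, start, end, original line)


class PositionIndex:
    """Toy index over records sorted by start position within each chromosome."""

    def __init__(self, records: Iterable[Record]) -> None:
        self._starts: Dict[str, List[int]] = defaultdict(list)
        self._records: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
        for chrom, start, end, line in records:
            starts = self._starts[chrom]
            if starts and start < starts[-1]:
                raise ValueError("input must be position sorted")
            starts.append(start)
            self._records[chrom].append((start, end, line))

    def query(self, chrom: str, start: int, end: int) -> List[str]:
        """Lines whose interval overlaps the half-open region [start, end)."""
        starts = self._starts.get(chrom, [])
        # Binary search: records starting at or after `end` cannot overlap,
        # so the scan stops there instead of reading the whole chromosome.
        stop = bisect.bisect_left(starts, end)
        candidates = self._records.get(chrom, [])[:stop]
        return [line for s, e, line in candidates if e > start]
```

The sketch only bounds the scan on the right-hand side; a production index would also bound it on the left, for instance by binning intervals, so that queries deep inside a chromosome stay cheap.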

Tophat
fast splice junction mapper for RNA-Seq reads
Versions of package tophat
Release  Version  Architectures
jessie  2.0.9-1  s390
sid  2.0.9-1  s390
sid  2.0.10-1  hurd-i386,ia64
sid  2.0.12+dfsg-2  amd64,arm64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x
Popcon: 2 users (1 upd.)*
License: DFSG free
VCS: Git

TopHat aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. TopHat is a collaborative effort between the University of Maryland Center for Bioinformatics and Computational Biology and the University of California, Berkeley Departments of Mathematics and Molecular and Cell Biology.

The package is enhanced by the following packages: cufflinks
Please cite: Cole Trapnell, Lior Pachter and Steven L. Salzberg: TopHat: discovering splice junctions with RNA-Seq. (PubMed,eprint) Bioinformatics 25(9):1105-1111 (2009)
Uc-echo
error correction algorithm designed for short-reads from NGS
Versions of package uc-echo
Release  Version  Architectures
jessie  1.12-1  s390
sid  1.12-1  s390
jessie  1.12-3  ia64
sid  1.12-4  ia64
jessie  1.12-6  sparc
jessie  1.12-7  amd64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x
sid  1.12-7  amd64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x,sparc
Popcon: 1 users (3 upd.)*
License: DFSG free
VCS: Svn

ECHO is an error correction algorithm designed for short-reads from next-generation sequencing platforms such as Illumina's Genome Analyzer II. The algorithm uses a Bayesian framework to improve the quality of the reads in a given data set by employing maximum a posteriori estimation.

Vcftools
Collection of tools to work with VCF files
Versions of package vcftools
Release  Version  Architectures
wheezy  0.1.9-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie  0.1.11+dfsg-1  ia64,s390
sid  0.1.11+dfsg-1  ia64,s390
jessie  0.1.12+dfsg-1  amd64,arm64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x,sparc
sid  0.1.12+dfsg-1  amd64,arm64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x,sparc
Debtags of package vcftools:
role: program
Popcon: 16 users (16 upd.)*
License: DFSG free
VCS: Svn

VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide methods for working with VCF files: validating, merging, comparing, and calculating some basic population genetic statistics.

Please cite: Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean McVean and Richard Durbin: The variant call format and VCFtools. (PubMed,eprint) Bioinformatics 27(15):2156-8 (2011)
Velvet
Nucleic acid sequence assembler for very short reads
Versions of package velvet
Release  Version              Architectures
squeeze  1.0.02~nozlibcopy-1  amd64,armel,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,sparc
wheezy   1.2.03~nozlibcopy-1  amd64,armel,armhf,i386,ia64,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,s390,s390x,sparc
jessie   1.2.10+dfsg-1        ia64,s390,sparc
sid      1.2.10+dfsg-1        ia64,s390
jessie   1.2.10+dfsg1-1       amd64,arm64,armel,armhf,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x
sid      1.2.10+dfsg1-1       amd64,arm64,armel,armhf,hurd-i386,i386,kfreebsd-amd64,kfreebsd-i386,mips,mipsel,powerpc,ppc64el,s390x,sparc
Debtags of package velvet:
biology: nuceleic-acids
field: biology, biology:bioinformatics
interface: commandline
role: program
use: analysing
Popcon: 16 users (3 upd.)*
Versions and Archs
License: DFSG free
Svn

Velvet is a de novo genomic assembler specially designed for short-read sequencing technologies such as Solexa or 454. It was developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

Velvet currently takes in short-read sequences, removes errors and produces high-quality unique contigs. It then uses paired-read information, if available, to retrieve the repetitive regions between contigs.

Please cite: Daniel R. Zerbino and Ewan Birney: Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. (PubMed,eprint) Genome Research 18(5):821-829 (2008)
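
The cited paper describes assembly using de Bruijn graphs. The sketch below shows only the basic graph construction from reads, not Velvet's error removal or contig building; the function name `de_bruijn_graph` and the toy reads are assumptions for illustration.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in a read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

# Two overlapping toy reads; repeated edges reflect shared k-mers.
graph = de_bruijn_graph(["ACGTC", "CGTCA"], k=3)
```

Walking the edges from `AC` spells out the sequence covered by both reads.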

Debian packages in contrib or non-free

Cufflinks
Transcript assembly, differential expression and regulation for RNA-Seq
Versions of package cufflinks
Release  Version             Architectures
wheezy   1.3.0-2 (non-free)  amd64
jessie   2.2.1-1 (non-free)  amd64
sid      2.2.1-1 (non-free)  amd64
Popcon: 6 users (2 upd.)*
Versions and Archs
License: non-free
Git

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

Please cite: Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold and Lior Pachter: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. (PubMed) Nature Biotechnology 28(5):511-515 (2010)
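
To make the idea of estimating relative abundance from read support concrete, here is a deliberately naive sketch that normalises read counts by transcript length. It is not the statistical model Cufflinks uses; the function name `relative_abundances`, the transcript names, and the numbers are hypothetical.

```python
def relative_abundances(read_counts, lengths):
    """Naive estimate: reads per base for each transcript, scaled to sum to 1."""
    density = {t: read_counts[t] / lengths[t] for t in read_counts}
    total = sum(density.values())
    if total == 0:
        return {t: 0.0 for t in density}
    return {t: d / total for t, d in density.items()}

# Hypothetical counts: tx1 has more reads, but it is also much longer.
counts = {"tx1": 300, "tx2": 100}
lengths = {"tx1": 3000, "tx2": 500}
abundances = relative_abundances(counts, lengths)
```

The length normalisation matters because a longer transcript yields more reads at the same expression level.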

Packaging has started and developers might try the packaging code in VCS

Mosaik-aligner
reference-guided aligner for next-generation sequencing
License: MIT
Debian package not available
Svn
Version: 1.1.0021-1

MosaikBuild converts various sequence formats into Mosaik’s native read format. MosaikAligner pairwise aligns each read to a specified series of reference sequences. MosaikSort resolves paired-end reads and sorts the alignments by the reference sequence coordinates. Finally, MosaikText converts alignments to different text-based formats.

At this time, the workflow consists of supplying sequences in FASTA, FASTQ, Illumina Bustard & Gerald, or SRF file formats and producing results in the BLAT axt, the BAM/SAM, the UCSC Genome Browser bed, or the Illumina ELAND formats.

No known packages available

Annovar
annotate genetic variants detected from diverse genomes
License: Open Source for non-profit
Debian package not available

ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, as well as mouse, worm, fly, yeast and many others). Given a list of variants with chromosome, start position, end position, reference nucleotide and observed nucleotides, ANNOVAR can perform:

 1. Gene-based annotation: identify whether SNPs or CNVs cause protein coding
    changes and the amino acids that are affected. Users can flexibly use RefSeq
    genes, UCSC genes, ENSEMBL genes, GENCODE genes, or many other gene definition
    systems.
 2. Region-based annotations: identify variants in specific genomic regions,
    for example, conserved regions among 44 species, predicted transcription
    factor binding sites, segmental duplication regions, GWAS hits, database
    of genomic variants, DNAse I hypersensitivity sites, ENCODE
    H3K4Me1/H3K4Me3/H3K27Ac/CTCF sites, ChIP-Seq peaks, RNA-Seq peaks, or many
    other annotations on genomic intervals.
 3. Filter-based annotation: identify variants that are reported in dbSNP,
    or identify the subset of common SNPs (MAF>1%) in the 1000 Genomes Project,
    or identify the subset of non-synonymous SNPs with SIFT score>0.05, or many
    other annotations on specific mutations.
 4. Other functionalities: retrieve the nucleotide sequence at any
    user-specified genomic positions in batch, identify a candidate gene list
    for Mendelian diseases from exome data, identify a list of SNPs from
    1000 Genomes that are in strong LD with a GWAS hit, and many other
    creative utilities.
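
The region-based annotation in item 2 amounts to an interval overlap test. A minimal sketch in Python follows; it does not use ANNOVAR, and the function name `annotate_regions`, the tuple layout, and the 1-based inclusive coordinates are assumptions for illustration.

```python
def annotate_regions(variants, regions):
    """Attach the labels of all overlapping regions to each variant.

    variants -- list of (chrom, start, end), 1-based inclusive
    regions  -- list of (chrom, start, end, label), 1-based inclusive
    """
    annotated = []
    for chrom, start, end in variants:
        labels = [
            label
            for r_chrom, r_start, r_end, label in regions
            # Two closed intervals overlap if each starts before the other ends.
            if r_chrom == chrom and start <= r_end and r_start <= end
        ]
        annotated.append(((chrom, start, end), labels))
    return annotated

# Hypothetical regions and variants.
regions = [("chr1", 100, 200, "conserved"), ("chr1", 150, 300, "TFBS")]
variants = [("chr1", 180, 180), ("chr1", 250, 250), ("chr2", 180, 180)]
result = annotate_regions(variants, regions)
```

A linear scan is enough for a handful of regions; real annotation sets would call for an indexed structure such as an interval tree.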

On a modern desktop computer (3 GHz Intel Xeon CPU, 8 GB memory), for 4.7 million variants, ANNOVAR requires ~4 minutes to perform gene-based functional annotation, or ~15 minutes to perform a stepwise "variants reduction" procedure, making it practical to handle hundreds of human genomes in a day.
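
The throughput claim can be checked with simple arithmetic. The sketch below assumes, which the text does not state, that the 4.7 million variants correspond to one genome and that runs are executed back to back on a single machine.

```python
MINUTES_PER_DAY = 24 * 60

def genomes_per_day(minutes_per_genome):
    """Whole genomes processed in 24 hours at a fixed time per genome."""
    return MINUTES_PER_DAY // minutes_per_genome

gene_based = genomes_per_day(4)   # gene-based annotation, ~4 minutes each
stepwise = genomes_per_day(15)    # stepwise variants reduction, ~15 minutes each
print(gene_based, stepwise)       # prints "360 96"
```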

Forge
genome assembler for mixed read types
License: Apache 2.0
Debian package not available

Forge Genome Assembler is a parallel, MPI based genome assembler for mixed read types.

Forge is a classic "overlap-layout-consensus" genome assembler written by Darren Platt and Dirk Evers. Implemented in C++ and using the parallel MPI library, it runs on one or more machines in a network and can scale to very large numbers of reads, provided there is enough collective memory on the machines used. It generates a full consensus alignment of all reads and can handle mixtures of Sanger, 454 and Illumina reads. There is some support for SOLiD color space, and it includes built-in tools for vector trimming and contamination screening.
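
The overlap step of an overlap-layout-consensus assembler can be sketched as finding the longest suffix of one read that matches a prefix of another. This is a toy illustration with exact matching only, not Forge's code; the function names `overlap_length` and `merge` and the `min_overlap` default are assumptions.

```python
def overlap_length(a, b, min_overlap=3):
    """Length of the longest suffix of a equal to a prefix of b, or 0."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def merge(a, b, min_overlap=3):
    """Join two reads on their overlap; return None if they do not overlap."""
    n = overlap_length(a, b, min_overlap)
    return a + b[n:] if n else None

# Toy reads that share the three bases "TAC".
contig = merge("ACGTAC", "TACGG")
```

A real assembler tolerates mismatches in the overlap and resolves conflicts in the consensus step.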

Forge was originally developed at Exelixis, which has kindly agreed to place the software, which underwent much subsequent development outside Exelixis, into the public domain. Forge works with most of the common MPI implementations.

Remark of Debian Med team: Competitor to MIRA2 and wgs-assembler

This package was requested by William Spooner whs@eaglegenomics.com as a competitor to MIRA2 and wgs-assembler.

*Popularity contest results: number of people who use this package regularly (number of people who upgraded this package recently) out of 167387