Debian Med Project
Help us to see Debian used by medical practitioners and biomedical researchers! Join us on the Salsa page.
Summary
Next Generation Sequencing
Debian Med bioinformatics applications usable in Next Generation Sequencing

It aims to provide packages that specialize in processing or interpreting data generated with next- (and later-) generation high-throughput sequencing technologies.

Description

For a better overview of the project's availability as a Debian package, each head row has a color code according to this scheme:

If you discover a project that looks like a good candidate for Debian Med, or if you have prepared an unofficial Debian package, please do not hesitate to send a description of that project to the Debian Med mailing list.

Links to other tasks

Debian Med Next Generation Sequencing packages

Official Debian packages with high relevance

anfo
short read aligner/mapper from MPG
Versions of package anfo
Release | Version | Architectures
bullseye | 0.98-8 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 0.98-9 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
jessie | 0.98-4 | amd64,armel,armhf,i386
stretch | 0.98-5 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster | 0.98-7 | amd64,arm64,armhf,i386
Popcon: 2 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Anfo is a mapper in the spirit of Soap/Maq/Bowtie, but its implementation takes more after BLAST/BLAT. It is most useful for aligning sequencing reads where the DNA sequence is somehow modified (think ancient DNA or bisulfite treatment) and/or where there is more divergence between the sample and the reference than fast mappers will handle gracefully (for example, when the reference genome is missing and a close relative is used instead).

Registry entries: SciCrunch 
Topics: Sequencing
arden
specificity control for read alignments using an artificial reference
Versions of package arden
Release | Version | Architectures
sid | 1.0-5 | all
stretch | 1.0-3 | all
buster | 1.0-4 | all
bullseye | 1.0-5 | all
jessie | 1.0-1 | amd64,armel,armhf,i386
bookworm | 1.0-5 | all
trixie | 1.0-5 | all
Popcon: 2 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

ARDEN (Artificial Reference Driven Estimation of false positives in NGS data) is a novel benchmark that estimates error rates based on real experimental reads and an additionally generated artificial reference genome. It allows error rates to be computed specifically for a dataset and a ROC curve to be constructed. It can therefore be used to optimize parameters for read mappers, to select read mappers for a specific problem, or to filter alignments based on a quality estimate.

Please cite: Sven H. Giese, Franziska Zickmann and Bernhard Y. Renard: Specificity control for read alignments using an artificial reference genome-guided false discovery rate. (PubMed,eprint) Bioinformatics 30(1):9-16 (2013)
Registry entries: SciCrunch 
Topics: Sequencing
art-nextgen-simulation-tools
simulation tools to generate synthetic next-generation sequencing reads
Versions of package art-nextgen-simulation-tools
Release | Version | Architectures
bullseye | 20160605+dfsg-4 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster | 20160605+dfsg-3 | amd64,arm64,armhf,i386
stretch | 20160605+dfsg-2 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid | 20160605+dfsg-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 20160605+dfsg-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm | 20160605+dfsg-4 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles derived from large recalibrated sequencing datasets. ART can also simulate reads using user-supplied read error models or quality profiles. ART supports simulation of single-end, paired-end and mate-pair reads for the three main commercial next-generation sequencing platforms: Illumina's Solexa, Roche's 454 and Applied Biosystems' SOLiD. ART can be used to test or benchmark a variety of methods or tools for next-generation sequencing data analysis, including read alignment, de novo assembly, and SNP and structural variation discovery. ART was used as a primary tool for the simulation study of the 1000 Genomes Project. ART is implemented in C++ with optimized algorithms and is highly efficient at read simulation. ART outputs reads in FASTQ format and alignments in ALN format. ART can also generate alignments in SAM or UCSC BED format. ART can be used together with genome variant simulators such as VarSim to evaluate variant-calling tools or methods.

Please cite: Weichun Huang, Leping Li, Jason R. Myers and Gabor T. Marth: ART: a next-generation sequencing read simulator. (PubMed,eprint) Bioinformatics 28(4):593-594 (2012)
Registry entries: SciCrunch  Bioconda 
artfastqgenerator
generates artificial FASTQ files derived from a reference genome
Versions of package artfastqgenerator
Release | Version | Architectures
stretch | 0.0.20150519-2 | all
buster | 0.0.20150519-3 | all
bullseye | 0.0.20150519-4 | all
bookworm | 0.0.20150519-4 | all
trixie | 0.0.20150519-5 | all
sid | 0.0.20150519-5 | all
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

ArtificialFastqGenerator takes a reference genome in FASTA format as input and outputs artificial FASTQ files in Sanger format. It can accept Phred base quality scores from existing FASTQ files and use them to simulate sequencing errors. Because the artificial FASTQ files are derived from the reference genome, that genome provides a gold standard for calling variants (Single Nucleotide Polymorphisms (SNPs) and insertions and deletions (indels)). This makes it possible to evaluate a Next Generation Sequencing (NGS) analysis pipeline that aligns reads to the reference genome and then calls variants.
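To illustrate how Phred quality scores can drive error simulation, here is a minimal Python sketch. It is not ArtificialFastqGenerator's code: the function names are hypothetical, and the only facts relied on are the standard definitions (a Phred score Q corresponds to an error probability of 10^(-Q/10), and Sanger FASTQ stores Q as the character with code Q + 33).

```python
import random

SANGER_OFFSET = 33  # Sanger FASTQ encodes Phred score Q as chr(Q + 33)


def phred_to_error_prob(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def decode_sanger(quality_string: str) -> list[int]:
    """Convert a Sanger-encoded quality string into Phred scores."""
    return [ord(c) - SANGER_OFFSET for c in quality_string]


def simulate_errors(read: str, quality_string: str, rng: random.Random) -> str:
    """Substitute each base with a different one, with the probability
    implied by its quality score (hypothetical helper, substitutions only)."""
    out = []
    for base, q in zip(read, decode_sanger(quality_string)):
        if rng.random() < phred_to_error_prob(q):
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

A quality character of "I" decodes to Q40 (one expected error in 10,000 calls), while "!" decodes to Q0, which this sketch treats as a certain error.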

Please cite: Matthew Frampton and Richard Houlston: Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines. (PubMed,eprint) PLOSone 7(11):e49110 (2012)
bamtools
toolkit for manipulating BAM (genome alignment) files
Versions of package bamtools
Release | Version | Architectures
jessie | 2.3.0+dfsg-2 | amd64,armel,armhf,i386
sid | 2.5.2+dfsg-6 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 2.5.2+dfsg-6 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm | 2.5.2+dfsg-4 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 2.5.1+dfsg-9 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster | 2.5.1+dfsg-3 | amd64,arm64,armhf,i386
stretch | 2.4.1+dfsg-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 7 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

BamTools facilitates research analysis and data management using BAM files. It copes with the enormous amount of data produced by current sequencing technologies, which is typically stored in compressed binary formats that are not easily handled by the text-based parsers commonly used in bioinformatics research.

BamTools provides both a C++ API for BAM file support and a command-line toolkit.

This is the bamtools command-line toolkit.

Available bamtools commands:

 − convert: converts between BAM and a number of other formats;
 − count: prints the number of alignments in one or more BAM files;
 − coverage: prints coverage statistics from an input BAM file;
 − filter: filters BAM files by user-specified criteria;
 − header: prints BAM header information;
 − index: generates an index for a BAM file;
 − merge: merges multiple BAM files into a single file;
 − random: selects random alignments from existing BAM files, intended
   for testing;
 − resolve: resolves paired-end reads (setting the IsProperPair flag as
   needed);
 − revert: removes duplicate marks and restores the original base
   qualities;
 − sort: sorts a BAM file according to some criteria;
 − split: splits a BAM file on a user-specified property, creating a new
   BAM output file for each value found;
 − stats: prints basic statistics from one or more input BAM files.
The package is enhanced by the following packages: multiqc
Please cite: Derek W. Barnett, Erik K. Garrison, Aaron R. Quinlan, Michael P. Stromberg and Gabor T. Marth: BamTools: a C++ API and toolkit for analyzing and managing BAM files. (PubMed,eprint) Bioinformatics 27(12):1691-2 (2011)
Registry entries: Bio.tools  SciCrunch  Bioconda 
bcftools
genomic variant calling and manipulation of VCF/BCF files
Versions of package bcftools
Release | Version | Architectures
bullseye | 1.11-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
stretch-backports | 1.8-1~bpo9+1 | amd64,arm64,armel,armhf,mips64el,mipsel,ppc64el
stretch | 1.3.1-1 | amd64,arm64,armel,mips64el,mipsel,ppc64el
sid | 1.20-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 1.20-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm | 1.16-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
buster | 1.9-1 | amd64,arm64,armhf
upstream | 1.21
Popcon: 30 users (9 upd.)*
Newer upstream!
License: DFSG free
Git

BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCF and BCF files, whether uncompressed or BGZF-compressed.

The package is enhanced by the following packages: multiqc
Please cite: Petr Danecek and Shane A. McCarthy: BCFtools/csq: Haplotype-aware variant consequences. (2016)
Registry entries: Bio.tools  SciCrunch  Bioconda 
bedtools
suite of utilities for comparing genomic features
Versions of package bedtools
Release | Version | Architectures
jessie | 2.21.0-1 | amd64,armhf,i386
trixie | 2.31.1+dfsg-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm | 2.30.0+dfsg-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 2.30.0+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster | 2.27.1+dfsg-4 | amd64,arm64,armhf
stretch | 2.26.0+dfsg-3 | amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
sid | 2.31.1+dfsg-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Debtags of package bedtools:
field: biology, biology:bioinformatics
interface: commandline
role: program
scope: suite
use: analysing, comparing, converting, filtering
works-with: biological-sequence
Popcon: 40 users (6 upd.)*
Versions and Archs
License: DFSG free
Git

The BEDTools utilities address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely used file formats: BED, GFF/GTF, VCF and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer research questions by chaining several tools together.

The groupBy tool is provided by the filo package.
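As an illustration of what "finding feature overlaps" means, the following Python sketch computes overlaps between two small feature lists. It is not BEDTools code: the names are hypothetical and the scan is a naive quadratic one. It assumes BED-style coordinates, where the start is 0-based and the end is exclusive.

```python
from typing import Iterator, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of bases shared by a and b (0 if they do not overlap)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def intersect(features_a: list[Interval],
              features_b: list[Interval]) -> Iterator[tuple[Interval, Interval, int]]:
    """Yield (a, b, shared_bases) for every overlapping pair.
    Naive all-against-all comparison, for illustration only."""
    for a in features_a:
        for b in features_b:
            shared = overlap_length(a, b)
            if shared > 0:
                yield a, b, shared
```

Because the end coordinate is exclusive, two features such as chr1:100-200 and chr1:200-300 are adjacent but do not overlap.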

Please cite: Aaron R. Quinlan and Ira M. Hall: BEDTools: a flexible suite of utilities for comparing genomic features. (PubMed,eprint) Bioinformatics 26(6):841-842 (2010)
Registry entries: Bio.tools  SciCrunch  Bioconda 
berkeley-express
streaming quantification for high-throughput sequencing
Versions of package berkeley-express
Release | Version | Architectures
bullseye | 1.5.3+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch | 1.5.1-3 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid | 1.5.3+dfsg-3 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 1.5.3+dfsg-3 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster | 1.5.2+dfsg-1 | amd64,arm64,armhf,i386
bookworm | 1.5.3+dfsg-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

eXpress is a streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences. Example applications include transcript-level RNA-Seq quantification, allele-specific or haplotype expression analysis, quantification of transcription factor binding in ChIP-Seq, and analysis of metagenomic data. It is based on an online EM algorithm whose memory requirements are proportional to the total size of the target sequences and whose time requirements are proportional to the number of sampled fragments. Thus, in applications such as RNA-Seq, eXpress can accurately quantify much larger samples than other currently available tools, which greatly reduces computing infrastructure requirements. eXpress can be used to build lightweight high-throughput sequencing processing pipelines when coupled with a streaming aligner such as Bowtie, because the aligner's output can be piped directly into eXpress, removing the need to store read alignments in memory or on disk.

An analysis of eXpress's performance on RNA-Seq data showed that its efficiency does not come at the cost of accuracy. eXpress is more accurate than other available tools, even on small datasets that do not require such efficiency. Moreover, like the Cufflinks program, eXpress can be used to estimate transcript abundances in multi-isoform genes. eXpress can also resolve multi-mappings of reads across gene families and does not require a reference genome, so it can be used together with de novo assemblers such as Trinity, Oases or Trans-ABySS. The underlying model is based on previously described probabilistic models developed for RNA-Seq, but it also applies to other settings where target sequences are sampled. It includes parameters for fragment length distributions, read errors, and sequence-specific fragment bias.

eXpress can be used to resolve ambiguous mappings in other applications based on high-throughput sequencing. The only required inputs are a set of target sequences and a set of sequenced fragments multiply aligned to them. While these target sequences will often be gene isoforms, they need not be. Haplotypes can be used as the reference for allele-specific expression analysis, binding regions for ChIP-Seq, or target genomes in metagenomics experiments. eXpress is useful in any analysis where reads multi-map to sequences that differ in abundance.
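The idea of resolving multi-mapped fragments with EM can be sketched in a few lines of Python. This is a deliberately simplified batch EM, not the online EM that eXpress uses, and it ignores target lengths, fragment length distributions, read errors and bias; the function name and input layout are assumptions made for the example.

```python
def em_abundances(fragments: list[list[str]],
                  targets: list[str],
                  iterations: int = 100) -> dict[str, float]:
    """Estimate relative target abundances from multi-mapped fragments.

    fragments[i] lists the targets that fragment i aligns to.
    Simplified batch EM for illustration only.
    """
    theta = {t: 1.0 / len(targets) for t in targets}
    for _ in range(iterations):
        counts = {t: 0.0 for t in targets}
        # E-step: split each fragment among its targets in proportion
        # to the current abundance estimates.
        for hits in fragments:
            total = sum(theta[t] for t in hits)
            if total == 0:
                continue  # fragment with no usable alignment
            for t in hits:
                counts[t] += theta[t] / total
        # M-step: new abundances are the normalised expected counts.
        n = sum(counts.values())
        theta = {t: c / n for t, c in counts.items()}
    return theta
```

With three fragments unique to target A, one unique to B and four ambiguous between them, the estimates converge to 0.75 for A and 0.25 for B, so the ambiguous fragments are attributed mostly to the more abundant target.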

Please cite: Adam Roberts and Lior Pachter: Streaming fragment assignment for real-time analysis of sequencing experiments. (PubMed) Nature Methods 10(1):71–73 (2013)
Registry entries: SciCrunch  Bioconda 
bio-rainbow
clustering and assembling short reads for bioinformatics
Versions of package bio-rainbow
Release | Version | Architectures
sid | 2.0.4+dfsg-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
stretch | 2.0.4-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster | 2.0.4+dfsg-1 | amd64,arm64,armhf,i386
bullseye | 2.0.4+dfsg-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm | 2.0.4+dfsg-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie | 2.0.4+dfsg-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Popcon: 3 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

This package provides an efficient tool for clustering and assembling short reads, especially for RAD sequencing.

Rainbow is developed to provide an ultra-fast and memory-efficient solution for clustering and assembling the short reads produced by RAD-seq. First, Rainbow clusters reads using a spaced seed method. Then, Rainbow applies a heterozygote-calling strategy to divide potential groups into haplotypes in a top-down manner. Along a guide tree, it iteratively merges sibling leaves in a bottom-up manner if they are similar enough. Here, similarity is defined by comparing the second reads of a RAD segment. This approach tries to collapse heterozygotes while discriminating repetitive sequences. Finally, Rainbow uses a greedy algorithm to locally assemble the merged reads into contigs. Based on simulation and real guppy RAD-seq data, Rainbow has been shown to handle RAD-seq data more effectively than the other tools.

Please cite: Zechen Chong, Jue Ruan and Chung-I. Wu: Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads. (PubMed) Bioinformatics 28(21):2732-2737 (2012)
Registry entries: Bio.tools  SciCrunch  Bioconda 
blasr
mapping single-molecule sequencing reads
Versions of package blasr
Release | Version | Architectures
buster | 5.3.2+dfsg-1.1 | amd64,arm64
bullseye | 5.3.3+dfsg-5 | amd64,arm64,mips64el,ppc64el
bookworm | 5.3.5+dfsg-6 | amd64,arm64,mips64el,ppc64el
sid | 5.3.5+dfsg-6 | amd64,arm64,mips64el,ppc64el,riscv64
stretch | 5.3+0-1 | amd64,arm64,mips64el,ppc64el
Popcon: 1 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

BLASR (Basic Local Alignment with Successive Refinement) is a method for mapping single-molecule sequencing reads against a reference genome. Such reads are thousands of bases long, and the divergence between them and the genome is dominated by insertion and deletion errors.

Registry entries: Bio.tools  SciCrunch  Bioconda 
bowtie
ultrafast, memory-efficient short read aligner
Versions of package bowtie
Release | Version | Architectures
stretch | 1.1.2-6 | amd64,arm64,mips64el,ppc64el,s390x
jessie | 1.1.1-2 | amd64
trixie | 1.3.1-3 | amd64,arm64,mips64el,ppc64el,riscv64,s390x
sid | 1.3.1-3 | amd64,arm64,mips64el,ppc64el,riscv64,s390x
bullseye | 1.3.0+dfsg1-1 | amd64,arm64,mips64el,ppc64el,s390x
bookworm | 1.3.1-1 | amd64,arm64,mips64el,ppc64el,s390x
buster | 1.2.2+dfsg-4 | amd64,arm64
Debtags of package bowtie:
biology: nuceleic-acids
field: biology:bioinformatics
interface: commandline
role: program
science: calculation
scope: utility
use: analysing, comparing
works-with: biological-sequence
Popcon: 27 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

This package addresses the problem of interpreting the results of the latest (2010) DNA sequencing technologies. These yield fairly short stretches of sequence that cannot be interpreted directly. The challenge for tools like Bowtie is to assign a chromosomal location to the short stretches of DNA sequenced in each run.

Bowtie aligns short DNA sequences (reads) to the human genome at a rate of 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end sequencing).
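The Burrows-Wheeler transform that underlies such an index can be shown on a toy string. The Python sketch below is a naive illustration and not Bowtie's implementation: it sorts all rotations explicitly, which is only practical for very short inputs, and the "$" end marker is an assumption of the example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via explicit sorting of all rotations."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Recover the original text by repeatedly prepending and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

The transform is reversible and tends to group identical characters together, which is what makes the index compact and searchable.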

The package is enhanced by the following packages: bowtie-examples multiqc
Please cite: Ben Langmead, Cole Trapnell, Mihai Pop and Steven L Salzberg: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. (eprint) Genome Biology 10:R25 (2009)
Registry entries: Bio.tools  SciCrunch  Bioconda 
Topics: Genomics
bowtie2
ultrafast, memory-efficient short read aligner
Versions of package bowtie2
Release | Version | Architectures
bookworm | 2.5.0-3 | amd64,arm64,mips64el,ppc64el
jessie | 2.2.4-1 | amd64
stretch | 2.3.0-2 | amd64
buster | 2.3.4.3-1 | amd64
bullseye | 2.4.2-2 | amd64,arm64,mips64el,ppc64el
trixie | 2.5.4-1 | amd64,arm64,mips64el,ppc64el,riscv64
sid | 2.5.4-1 | amd64,arm64,mips64el,ppc64el,riscv64
Popcon: 30 users (6 upd.)*
Versions and Archs
License: DFSG free
Git

This package is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to hundreds or thousands of characters, and particularly good at aligning to relatively long genomes (for example, mammalian genomes).

Bowtie 2 indexes the genome with an FM-index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
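How an FM-index answers "how often does this pattern occur?" can be sketched with backward search. The Python class below is a toy illustration and not Bowtie 2's data structure: it stores full occurrence tables and builds the suffix order by plain sorting, both of which a real index avoids; the class and attribute names are hypothetical.

```python
class FMIndex:
    """Toy FM-index supporting exact-match counting via backward search."""

    def __init__(self, text: str, sentinel: str = "$") -> None:
        s = text + sentinel
        suffix_order = sorted(range(len(s)), key=lambda i: s[i:])
        # BWT character = character preceding each sorted suffix.
        self.bwt = "".join(s[i - 1] for i in suffix_order)
        alphabet = sorted(set(s))
        # first[c]: number of characters in s strictly smaller than c.
        self.first = {}
        total = 0
        for c in alphabet:
            self.first[c] = total
            total += s.count(c)
        # occ[c][k]: occurrences of c in bwt[:k].
        self.occ = {c: [0] for c in alphabet}
        for ch in self.bwt:
            for c in alphabet:
                self.occ[c].append(self.occ[c][-1] + (ch == c))

    def count(self, pattern: str) -> int:
        """Number of occurrences of pattern in the indexed text."""
        lo, hi = 0, len(self.bwt)  # half-open range of matching suffixes
        for c in reversed(pattern):
            if c not in self.first:
                return 0
            lo = self.first[c] + self.occ[c][lo]
            hi = self.first[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo
```

Each pattern character narrows the range of matching suffixes with two table lookups, so the query time depends on the pattern length rather than on the genome size.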

The package is enhanced by the following packages: bowtie2-examples multiqc
Please cite: Ben Langmead and Steven L Salzberg: Fast gapped-read alignment with Bowtie 2. (PubMed) Nature Methods 9:357–359 (2012)
Registry entries: Bio.tools  SciCrunch  Bioconda 
Topics: Genomics
bwa
Burrows-Wheeler Aligner
Versions of package bwa
Release | Version | Architectures
stretch-backports | 0.7.17-1~bpo9+1 | amd64
stretch | 0.7.15-2+deb9u1 | amd64
jessie | 0.7.10-1 | amd64
bookworm | 0.7.17-7 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 0.7.17-6 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 0.7.18-1 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 0.7.18-1 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster | 0.7.17-3 | amd64
Debtags of package bwa:
biology: nuceleic-acids, peptidic
field: biology, biology:bioinformatics
interface: commandline, text-mode
role: program
use: analysing, comparing
Popcon: 19 users (23 upd.)*
Versions and Archs
License: DFSG free
Git

BWA is a software package for mapping low-divergence sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100 bp, while the other two are designed for longer sequences ranging from 70 bp to 1 Mbp. BWA-MEM and BWA-SW share features such as long-read support and split alignment, but BWA-MEM, which is more recent, is generally recommended for high-quality queries because it is faster and more accurate. BWA-MEM also performs better than BWA-backtrack for 70-100 bp Illumina reads.

Please cite: Heng Li and Richard Durbin: Fast and accurate short read alignment with Burrows-Wheeler transform. (PubMed,eprint) Bioinformatics 25(14):1754-1760 (2009)
Registry entries: Bio.tools  SciCrunch  Bioconda 
canu
single-molecule sequence assembler for genomes
Versions of package canu
Release | Version | Architectures
buster | 1.8+dfsg-2 | amd64
bullseye | 2.0+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm | 2.0+dfsg-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 2.2+dfsg-5 | amd64,arm64,mips64el,ppc64el,riscv64,s390x
stretch-backports | 1.7.1+dfsg-1~bpo9+1 | amd64
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II or Oxford Nanopore MinION).

Canu is a hierarchical assembly pipeline which runs in four steps:

 – detect overlaps in high-noise sequences using MHAP;
 – generate the corrected sequence consensus;
 – trim the corrected sequences;
 – assemble the trimmed corrected sequences.
Please cite: Sergey Koren, Brian P. Walenz, Konstantin Berlin, Jason R. Miller and Adam M. Phillippy: Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. (2017)
Registry entries: Bio.tools  SciCrunch  Bioconda 
Remark of Debian Med team: Genome assembly and large-scale genome alignment (http://www.cbcb.umd.edu/software/)
changeo
repertoire clonal assignment toolkit (Python 3)
Versions of package changeo
Release | Version | Architectures
bullseye | 1.0.2-1 | all
bookworm | 1.3.0-1 | all
buster | 0.4.5-1 | all
trixie | 1.3.0-1 | all
sid | 1.3.0-1 | all
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Change-O is a collection of tools for processing the output of V(D)J alignment tools, assigning clonal clusters to immunoglobulin (Ig) sequences, and reconstructing germline sequences.

Dramatic improvements in high-throughput sequencing technologies now enable large-scale characterization of Ig repertoires, defined as the collection of trans-membrane antigen-receptor proteins located on the surface of B cells and T cells. Change-O is a suite of utilities to facilitate advanced analysis of Ig and TCR sequences following germline segment assignment. Change-O handles output from IMGT/HighV-QUEST and IgBLAST, and provides a wide variety of clustering methods for assigning clonal groups to Ig sequences. Record sorting, grouping, and various database manipulation operations are also included.

This package installs the library for Python 3.

Please cite: Namita T. Gupta, Jason A. Vander Heiden, Mohamed Uduman, Daniel Gadala-Maria, Gur Yaari and Steven H. Kleinstein: Link to publication (PubMed,eprint) Bioinformatics 31(20):3356-3358 (2015)
Registry entries: Bioconda 
crac
integrated RNA-Seq read analysis
Versions of package crac
Release | Version | Architectures
buster | 2.5.0+dfsg-3 | amd64,arm64
sid | 2.5.2+dfsg-6 | amd64,arm64,mips64el,ppc64el,riscv64
bullseye | 2.5.2+dfsg-4 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el
bookworm | 2.5.2+dfsg-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el
trixie | 2.5.2+dfsg-6 | amd64,arm64,mips64el,ppc64el,riscv64
stretch | 2.5.0+dfsg-1 | amd64
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

CRAC is a tool to analyze High Throughput Sequencing (HTS) data in comparison to a reference genome. It is intended for transcriptomic and genomic sequencing reads. More precisely, with transcriptomic reads as input, it predicts point mutations, indels, splice junctions, and chimeric RNAs (i.e., non-colinear splice junctions). CRAC can also output the positions and nature of the sequence errors that it detects in the reads. CRAC uses a genome index, which must be computed before running the read analysis: run the command "crac-index" on your genome files, then process the reads using the command "crac". See the CRAC man page by typing "man crac". CRAC requires a large amount of main memory. For processing, say, 50 million reads of 100 nucleotides each against the human genome, CRAC requires about 40 gigabytes of main memory. Check that your computing server has sufficient memory before launching an analysis.

Please cite: Eliseos J. Mucaki, Natasha G. Caminsky, Ami M. Perri, Ruipeng Lu, Alain Laederach, Matthew Halvorsen, Joan H. M. Knoll and Peter K. Rogan: A unified analytic framework for prioritization of non-coding variants of uncertain significance in heritable breast and ovarian cancer. (PubMed) BMC Medical Genomics 9:19 (2016)
Registry entries: Bio.tools  SciCrunch 
cutadapt
Clean biological sequences from high-throughput sequencing reads
Versions of package cutadapt
Release | Version | Architectures
bullseye | 3.2-2 | all
bookworm | 4.2-1 | all
stretch | 1.12-2 | all
buster | 1.18-1 | all
trixie | 4.7-2 | all
sid | 4.7-2 | all
upstream | 4.9
Popcon: 13 users (3 upd.)*
Newer upstream!
License: DFSG free
Git

Cutadapt helps with cleaning biological sequences by finding adapter or primer sequences in an error-tolerant way. It can also modify and filter reads in various ways. Adapter sequences can contain IUPAC wildcard characters. Paired-end reads and even colorspace data are also supported. If you want, you can also just demultiplex your input data without removing adapter sequences at all.
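The combination of error tolerance and IUPAC wildcards can be illustrated with a small Python sketch. This is not Cutadapt's algorithm: Cutadapt also tolerates insertions and deletions, whereas this example only counts substitutions, and the function names and default thresholds are assumptions made for illustration. Sequences are expected in upper case.

```python
# Standard IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(adapter: str, segment: str) -> int:
    """Count positions where the read base is not allowed by the adapter code."""
    return sum(base not in IUPAC[code] for code, base in zip(adapter, segment))


def trim_3prime_adapter(read: str, adapter: str,
                        max_error_rate: float = 0.1,
                        min_overlap: int = 3) -> str:
    """Remove the first acceptable adapter match and everything after it.
    The adapter may be truncated by the end of the read (partial match)."""
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break
        errors = mismatches(adapter[:overlap], read[start:start + overlap])
        if errors <= int(max_error_rate * overlap):
            return read[:start]
    return read
```

For instance, trimming the hypothetical adapter "AGNTC" from the read "CCCCAGGTCAA" leaves "CCCC", because the wildcard N accepts the G at that position.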

This package contains the user interface.

The package is enhanced by the following packages: multiqc
Please cite: Marcel Martin: Cutadapt removes adapter sequences from high-throughput sequencing reads. (eprint) EMBnet.journal 17(1):10-12 (2015)
Registry entries: Bio.tools  SciCrunch  Bioconda 
daligner
local alignment discovery between long nucleotide sequencing reads
Versions of package daligner
Release | Version | Architectures
stretch | 1.0+20161119-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid | 1.0+git20240119.335105d-3 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 1.0+git20240119.335105d-3 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm | 1.0+git20221215.bd26967-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 1.0+git20200727.ed40ce5-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster | 1.0+git20180524.fd21879-1 | amd64,arm64,armhf,i386
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

These tools find all significant local alignments between reads encoded in a Dazzler database. The assumption is that the reads come from a Pacific Biosciences RS II long-read sequencer; that is, the reads are long and noisy, with up to 15% error on average.

Please cite: Gene Myers: Efficient Local Alignment Discovery amongst Noisy Long Reads. 8701:52-67 (2014)
Registry entries: SciCrunch  Bioconda 
deepnano
alternative basecaller for MinION reads of genomic sequences
Versions of package deepnano
Release | Version | Architectures
buster | 0.0+git20170813.e8a621e-3 | amd64,arm64,i386
bullseye | 0.0+git20170813.e8a621e-3.1 | amd64,arm64,armhf,i386,ppc64el,s390x
Popcon: 1 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

DeepNano is an alternative basecaller for Oxford Nanopore MinION reads based on deep recurrent neural networks.

Currently it works with SQK-MAP-006 and SQK-MAP-005 chemistry and as a postprocessor for Metrichor.

Please cite: Vladimír Boža, Broňa Brejová and Tomáš Vinař: DeepNano: Deep recurrent neural networks for base calling in MinION nanopore reads. PLOS one (2017)
discosnp
discovering Single Nucleotide Polymorphisms from raw set(s) of reads
Versions of package discosnp
Release | Version | Architectures
buster | 2.3.0-2 | amd64,arm64,i386
jessie | 1.2.5-1 | amd64,armel,armhf,i386
stretch | 1.2.6-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid | 2.6.2-3 | amd64,arm64,mips64el,ppc64el,riscv64
bookworm | 2.6.2-2 | amd64,arm64,mips64el,ppc64el
trixie | 2.6.2-3 | amd64,arm64,mips64el,ppc64el,riscv64
bullseye | 4.4.4-1 | amd64,arm64,i386,mips64el,ppc64el,s390x
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Software discoSnp is designed for discovering Single Nucleotide Polymorphism (SNP) from raw set(s) of reads obtained with Next Generation Sequencers (NGS).

Note that the number of input read sets is not constrained: it can be one, two, or more. Note also that no other data, such as a reference genome or annotations, are needed.

The software is composed of two modules. The first module, kissnp2, detects SNPs from read sets. The second module, kissreads, enhances the kissnp2 results by computing, per read set and for each SNP found:

 1) its mean read coverage
 2) the (phred) quality of reads generating the polymorphism.

This program is superseded by DiscoSnp++.

Registry entries: Bio.tools  SciCrunch  Bioconda 
dnaclust
tool for clustering millions of short DNA sequences
Versions of package dnaclust
Release | Version | Architectures
jessie | 3-2 | amd64,armel,armhf,i386
sid | 3-7 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 3-7 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm | 3-7 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 3-7 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster | 3-6 | amd64,arm64,armhf,i386
stretch | 3-4 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 3 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Dnaclust is a tool for clustering large numbers of short DNA sequences. The clusters are created in such a way that the "radius" of each cluster is below the specified threshold.

The input sequences to be clustered must be in FASTA format. The identifier of each sequence is taken from the first word of its FASTA header, that is, the prefix of the header up to the first occurrence of a space.
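
The identifier rule can be illustrated with a minimal Python sketch. The function name `fasta_ids` and the sample records are hypothetical and are not part of dnaclust; the code only shows how the first word of a header can be isolated.

```python
def fasta_ids(lines):
    """Yield the identifier of each FASTA record.

    The identifier is the header text after '>' up to, but not including,
    the first space.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            yield header.split(" ", 1)[0]


# Hypothetical input, given as the lines of a FASTA file.
example = [
    ">seq1 Escherichia coli 16S rRNA\n",
    "ACGTACGT\n",
    ">seq2\n",
    "TTGACA\n",
]
ids = list(fasta_ids(example))
print(ids)
```

Two records whose headers share the same first word would therefore get the same identifier, so it is worth checking headers for uniqueness before clustering.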

Please cite: Mohammadreza Ghodsi, Bo Liu and Mihai Pop: DNACLUST: accurate and efficient clustering of phylogenetic marker genes. (PubMed,eprint) BMC Bioinformatics 12:271 (2011)
Registry entries: Bio.tools  SciCrunch 
dwgsim
short sequencing read simulator
Versions of package dwgsim
Release | Version | Architectures
sid | 0.1.14-3 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm | 0.1.14-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie | 0.1.14-3 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
stretch | 0.1.11-3 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster | 0.1.12-2 | amd64,arm64,armhf
bullseye | 0.1.12-4 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

DWGSIM simulates short sequencing reads from modern sequencing platforms. DWGSIM generates base error rates using a parametric model, which yields a more realistic error profile. It was originally developed for evaluating short-read alignment.
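
To give an idea of what a parametric per-base error model can look like, here is a minimal Python sketch. It is not DWGSIM code: the function `simulate_read`, its parameters and the linear ramp between a start and an end error rate are all assumptions chosen for illustration, since the description does not state which model DWGSIM uses.

```python
import random


def simulate_read(reference, length, err_start, err_end, rng):
    """Sample one read from `reference` and inject substitution errors.

    The per-base error rate is interpolated linearly from `err_start` at the
    first base to `err_end` at the last base (an assumed toy model).
    Returns the start position and the simulated read.
    """
    start = rng.randrange(len(reference) - length + 1)
    bases = []
    for i, base in enumerate(reference[start:start + length]):
        frac = i / (length - 1) if length > 1 else 0.0
        rate = err_start + (err_end - err_start) * frac
        if rng.random() < rate:
            # Substitute with one of the three other bases.
            base = rng.choice([b for b in "ACGT" if b != base])
        bases.append(base)
    return start, "".join(bases)


rng = random.Random(42)          # fixed seed so runs are reproducible
reference = "ACGT" * 25          # hypothetical 100 bp reference
position, read = simulate_read(reference, 20, 0.01, 0.10, rng)
print(position, read)
```

Because the origin of every simulated read is known, the output of an aligner can be compared against the true positions.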

Registry entries: SciCrunch  Bioconda 
ea-utils
command-line tools for processing biological sequencing data
Versions of package ea-utils
Release | Version | Architectures
stretch | 1.1.2+dfsg-4 | amd64,arm64,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid | 1.1.2+dfsg-9 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 1.1.2+dfsg-9 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
bookworm | 1.1.2+dfsg-9 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster | 1.1.2+dfsg-5 | amd64,arm64,armhf,i386
bullseye | 1.1.2+dfsg-6 | amd64,arm64,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 11 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Ea-utils provides a set of command-line tools for processing biological sequencing data, barcode demultiplexing, adapter trimming, etc.

The tools were primarily written to support an Illumina-based pipeline, but should work with any FASTQ files.

Main Tools are:

  • fastq-mcf: Scans a sequence file for adapters, and, based on a log-scaled threshold, determines a set of clipping parameters and performs clipping. Also does skewing detection and quality filtering.

  • fastq-multx: Demultiplexes a fastq. Capable of auto-determining barcode id's based on a master set fields. Keeps multiple reads in-sync during demultiplexing. Can verify that the reads are in-sync as well, and fail if they're not.

  • fastq-join: Similar to audy's stitch program, but in C, more efficient and supports some automatic benchmarking and tuning. It uses the same "squared distance for anchored alignment" as other tools.

  • varcall: Takes a pileup and calculates variants in a more easily parameterized manner than some other tools.
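
The idea behind barcode demultiplexing can be sketched in a few lines of Python. This is a simplified illustration, not the fastq-multx implementation: the function `demultiplex`, the sample names and the exact-prefix matching rule are assumptions, and features such as barcode auto-detection or paired-file synchronisation are not modelled.

```python
from collections import defaultdict


def demultiplex(records, barcodes):
    """Group FASTQ records by the barcode found at the start of each read.

    records:  iterable of (name, sequence, quality) tuples
    barcodes: dict mapping a sample id to its barcode sequence
    Reads that match no barcode are collected under "unmatched".
    """
    bins = defaultdict(list)
    for name, seq, qual in records:
        for sample, barcode in barcodes.items():
            if seq.startswith(barcode):
                # Remove the barcode from the sequence and its quality string.
                n = len(barcode)
                bins[sample].append((name, seq[n:], qual[n:]))
                break
        else:
            bins["unmatched"].append((name, seq, qual))
    return dict(bins)


# Hypothetical reads and barcodes.
records = [
    ("r1", "ACGTTTGA", "IIIIIIII"),
    ("r2", "TGCAGGCC", "IIIIIIII"),
    ("r3", "NNNNACGT", "IIIIIIII"),
]
barcodes = {"sampleA": "ACGT", "sampleB": "TGCA"}
bins = demultiplex(records, barcodes)
for sample, reads in bins.items():
    print(sample, reads)
```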

Please cite: Erik Aronesty: Comparison of Sequencing Utility Programs. (eprint) The Open Bioinformatics Journal 7:1-8 (2013)
Registry entries: Bio.tools  SciCrunch 
fastaq
FASTA and FASTQ file manipulation tools
Versions of package fastaq
Release | Version | Architectures
bookworm | 3.17.0-5 | all
trixie | 3.17.0-6 | all
stretch | 3.14.0-1 | all
bullseye | 3.17.0-3 | all
buster | 3.17.0-2 | all
sid | 3.17.0-6 | all
jessie | 1.5.0-1 | all
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Fastaq is a diverse collection of scripts that perform useful and common manipulation tasks on FASTA and FASTQ files, such as filtering, merging, splitting, trimming, and search/replace. Input and output files can be compressed (the format is detected automatically), and the individual Fastaq commands can be piped together.

Topics: Bioinformatics
fastp
Ultra-fast all-in-one FASTQ preprocessor
Versions of package fastp
Release | Version | Architectures
bullseye | 0.20.1+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie | 0.23.4+dfsg-1 | amd64,arm64,armel,armhf,mips64el,ppc64el,riscv64,s390x
sid | 0.23.4+dfsg-1 | amd64,arm64,armel,armhf,mips64el,ppc64el,riscv64,s390x
buster | 0.19.6+dfsg-1 | amd64,arm64,armhf,i386
bookworm | 0.23.2+dfsg-2 | amd64,arm64,armel,armhf,mips64el,mipsel,ppc64el,s390x
Popcon: 5 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

All-in-one FASTQ preprocessor, fastp provides functions including quality profiling, adapter trimming, read filtering and base correction. It supports both single-end and paired-end short read data and also provides basic support for long-read data.
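
As a rough illustration of read filtering by quality, the Python sketch below decodes Phred+33 quality strings and keeps reads whose mean score reaches a threshold. It is a generic example, not fastp's own filtering logic: the function names, the mean-quality criterion and the threshold of 20 are assumptions.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def passes_quality(quality, min_mean=20.0):
    """Keep a read only if it is non-empty and its mean Phred score reaches min_mean."""
    scores = phred_scores(quality)
    return bool(scores) and sum(scores) / len(scores) >= min_mean


# Hypothetical reads as (name, sequence, quality) tuples.
reads = [
    ("read1", "ACGT", "IIII"),   # 'I' encodes Phred 40
    ("read2", "ACGT", "####"),   # '#' encodes Phred 2
]
kept = [name for name, _seq, qual in reads if passes_quality(qual)]
print(kept)
```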

The package is enhanced by the following packages: multiqc
Please cite: Shifu Chen, Yanqing Zhou, Yaru Chen and Jia Gu: fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34(17):i884-i890 (2018)
Registry entries: Bioconda 
fastqc
quality control for high-throughput sequence data
Versions of package fastqc
Release | Version | Architectures
bullseye | 0.11.9+dfsg-4 | all
trixie | 0.12.1+dfsg-4 | all
sid | 0.12.1+dfsg-4 | all
jessie | 0.11.2+dfsg-3 | all
stretch | 0.11.5+dfsg-6 | all
buster | 0.11.8+dfsg-2 | all
bookworm | 0.11.9+dfsg-6 | all
Popcon: 34 users (13 upd.)*
Versions and Archs
License: DFSG free
Git

FastQC aims to provide a simple way to run quality control checks on raw sequence data coming from high-throughput sequencing pipelines. It provides a modular set of analyses that give a quick impression of any problems the user should be aware of before doing further analysis.

The main functions of FastQC are:

  • import of data from BAM (the compressed binary version of SAM), SAM (Sequence Alignment/Map) or FastQ files (any variant);
  • a quick overview that points to the areas where there may be problems;
  • summary graphs and tables to assess the data quickly;
  • export of results to a permanent HTML report;
  • offline operation, to allow automated generation of reports without running the interactive application.
The package is enhanced by the following packages: multiqc
Registry entries: Bio.tools  SciCrunch  Bioconda 
Topics: Sequencing
flexbar
flexible barcode and adapter removal for sequencing platforms
Versions of package flexbar
Release | Version | Architectures
buster | 3.4.0-2 | amd64,arm64,armhf,i386
stretch | 2.50-2 | amd64,arm64,armhf,i386,mips,mips64el,mipsel,ppc64el
jessie | 2.50-1 | amd64,armhf,i386
sid | 3.5.0-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 3.5.0-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm | 3.5.0-5 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 3.5.0-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 5 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Flexbar preprocesses high-throughput sequencing data efficiently. It demultiplexes barcoded reads and removes adapter sequences. Trimming and filtering features are also provided. Flexbar increases mapping rates and improves genome and transcriptome assemblies. It supports next-generation sequencing data in FASTA/Q and CSFASTA/Q formats from the Illumina, Roche 454 and Applied Biosystems SOLiD platforms.

Parameter names have changed in Flexbar, so please review your scripts. In recent months, default settings were optimised, several bugs were fixed, and various improvements were made, for example a revamped command-line interface, new trimming modes, and lower time and memory requirements.

The package is enhanced by the following packages: multiqc
Please cite: Matthias Dodt, Johannes T. Roehr, Rina Ahmed and Christoph Dieterich: FLEXBAR — Flexible Barcode and Adapter Processing for Next-Generation Sequencing Platforms. (eprint) Biology 1(3):895-905 (2012)
Registry entries: Bio.tools  SciCrunch  Bioconda 
fml-asm
tool for assembling Illumina short reads in small regions
Versions of package fml-asm
Release | Version | Architectures
trixie | 0.1+git20190320.b499514-1 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
sid | 0.1+git20190320.b499514-1 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
experimental | 0.1+git20190320.b499514-2~0exp | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
stretch | 0.1-2 | amd64
stretch-backports | 0.1-4~bpo9+1 | amd64
buster | 0.1-5 | amd64
bullseye | 0.1+git20190320.b499514-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm | 0.1+git20190320.b499514-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
upstream | 0.1+git20221215.85f159e
Popcon: 2 users (0 upd.)*
Newer upstream!
License: DFSG free
Git

Fml-asm is a command-line tool for assembling Illumina short reads in regions from 100 bp to 10 million bp in size, based on the fermi-lite library. It is a much lighter, in-memory version of fermikit that does not generate any intermediate files. It inherits the performance, the relatively small memory footprint and the features of fermikit. In particular, fermi-lite is able to retain heterozygous events and can therefore be used to assemble diploid regions for the purpose of variant calling.

Registry entries: Bio.tools  SciCrunch  Bioconda 
fsm-lite
frequency-based string mining (lite)
Versions of package fsm-lite
Release | Version | Architectures
trixie | 1.0-8 | amd64,arm64,mips64el,ppc64el,riscv64,s390x
bookworm | 1.0-8 | amd64,arm64,mips64el,ppc64el,s390x
stretch | 1.0-2 | amd64,arm64,mips64el,ppc64el,s390x
buster | 1.0-3 | amd64,arm64
sid | 1.0-8 | amd64,arm64,mips64el,ppc64el,riscv64,s390x
bullseye | 1.0-5 | amd64,arm64,mips64el,ppc64el,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

A single-core implementation of frequency-based substring mining, used in bioinformatics to extract substrings that discriminate between two (or more) datasets in high-throughput sequencing data.
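
The notion of a discriminating substring can be shown with a brute-force Python sketch. It is only meant to convey the idea and does not reflect how fsm-lite is implemented: the function names, the length bounds and the support thresholds are assumptions, and enumerating every substring like this would not scale to real sequencing data.

```python
def substring_support(sequences, min_len, max_len):
    """Map each substring to the number of sequences that contain it."""
    support = {}
    for seq in sequences:
        seen = set()
        for length in range(min_len, max_len + 1):
            for start in range(len(seq) - length + 1):
                seen.add(seq[start:start + length])
        # Count each substring once per sequence.
        for sub in seen:
            support[sub] = support.get(sub, 0) + 1
    return support


def discriminating(set_a, set_b, min_len=3, max_len=4, min_a=2, max_b=0):
    """Substrings found in at least min_a sequences of A and at most max_b of B."""
    sup_a = substring_support(set_a, min_len, max_len)
    sup_b = substring_support(set_b, min_len, max_len)
    return sorted(s for s, n in sup_a.items()
                  if n >= min_a and sup_b.get(s, 0) <= max_b)


# Hypothetical toy datasets.
dataset_a = ["ACGTA", "TACGT"]
dataset_b = ["TTTTT", "GGACG"]
hits = discriminating(dataset_a, dataset_b)
print(hits)
```

In this toy input, `ACG` occurs in both sequences of the first dataset but also in the second dataset, so it is not reported.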

Registry entries: SciCrunch  Bioconda 
giira
RNA-Seq driven gene finding incorporating ambiguous reads
Versions of package giira
Release | Version | Architectures
jessie | 0.0.20140210-2 | amd64
buster | 0.0.20140625-2 | amd64
stretch | 0.0.20140625-1 | amd64
Popcon: 1 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

GIIRA is a gene prediction method that identifies potential coding regions exclusively based on the mapping of reads from an RNA-Seq experiment. It was foremost designed for prokaryotic gene prediction and is able to resolve genes within the expressed region of an operon. However, it is also applicable to eukaryotes and predicts exon intron structures as well as alternative isoforms.

Please cite: Franziska Zickmann, Martin S. Lindner and Bernhard Y. Renard: GIIRA—RNA-Seq driven gene finding incorporating ambiguous reads. (PubMed,eprint) Bioinformatics (2013)
Registry entries: Bio.tools  SciCrunch 
grinder
versatile omics shotgun and amplicon sequencing read simulator
Versions of package grinder
Release | Version | Architectures
bullseye | 0.5.4-6 | all
trixie | 0.5.4-6 | all
bookworm | 0.5.4-6 | all
sid | 0.5.4-6 | all
jessie | 0.5.3-3 | all
buster | 0.5.4-5 | all
stretch | 0.5.4-1 | all
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Grinder is a versatile program for creating random shotgun and amplicon sequence libraries based on DNA, RNA or protein reference sequences provided in a FASTA file.

Grinder can produce genomic, metagenomic, transcriptomic, metatranscriptomic, proteomic and metaproteomic shotgun and amplicon datasets as generated by current sequencing technologies such as Sanger, 454 and Illumina. These simulated datasets can be used to test the accuracy of bioinformatic tools under specific hypotheses, for example with or without sequencing errors, or with high or low community diversity. Grinder can also help in choosing between alternative sequencing methods for a sequence-based project, for example whether the library should be paired-end or not, or how many reads should be sequenced.

Please cite: Florent E. Angly, Dana Willner, Forest Rohwer, Philip Hugenholtz and Gene W. Tyson: Grinder: a versatile amplicon and shotgun sequence simulator. (PubMed,eprint) Nucleic Acids Research Epub ahead of print (2012)
Registry entries: SciCrunch 
hilive
real-time alignment of Illumina reads
Versions of package hilive
Release | Version | Architectures
stretch | 0.3-2 | amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
bullseye | 2.0a-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm | 2.0a-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 2.0a-4 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster | 1.1-2 | amd64,arm64,armhf
Popcon: 1 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

HiLive is a read mapping tool that maps Illumina HiSeq (or comparable) reads to a reference genome while they are being produced. This means that read mapping is finished as soon as the sequencer has finished generating the data.

Please cite: Martin S. Lindner, Benjamin Strauch, Jakob M. Schulze, Simon H. Tausch, Piotr W. Dabrowski, Andreas Nitsche and Bernhard Y. Renard: HiLive: real-time mapping of illumina reads while sequencing. (PubMed) Bioinformatics 33(6):917-919 (2017)
hinge
long-read genome assembler based on hinging
Versions of package hinge
Release | Version | Architectures
bullseye | 0.5.0-6 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
buster | 0.5.0-4 | amd64,arm64
sid | 0.5.0-7 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
trixie | 0.5.0-7 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
bookworm | 0.5.0-7 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

HINGE is a genome assembler that seeks to achieve optimal repeat resolution by distinguishing repeats that can be resolved given the data from those that cannot. This is accomplished by adding "hinges" to reads in order to construct an overlap graph in which unresolved repeats are merged. As a result, HINGE combines the error resilience of overlap-based assemblers with the repeat-resolution capabilities of de Bruijn graph assemblers.

Please cite: Govinda M Kamath, Ilan Shomorony, Fei Xia, Thomas Courtade and David N Tse: HINGE: Long-read assembly achieves optimal repeat resolution. (PubMed,eprint) Genome Research (2017)
hisat2
graph-based alignment of short nucleotide reads to many genomes
Versions of package hisat2
Release | Version | Architectures
stretch | 2.0.5-1 | amd64
sid | 2.2.1-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bullseye | 2.2.1-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm | 2.2.1-4 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie | 2.2.1-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster | 2.1.0-2 | amd64
Popcon: 11 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as against a single reference genome). Based on an extension of BWT for graphs, a graph FM index (GFM) was designed and implemented. In addition to using one global GFM index that represents a population of human genomes, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp, with 55,000 indexes needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).
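
The two figures quoted for the local indexes can be cross-checked with a short calculation. The snippet assumes that 1 Kbp means 1,000 bp and that the regions do not overlap; neither assumption is stated in the description.

```python
REGION_BP = 56_000           # one local index covers 56 Kbp (assuming 1 Kbp = 1,000 bp)
NUM_LOCAL_INDEXES = 55_000   # number of local indexes quoted in the description

# Total sequence covered if the regions are laid end to end without overlap.
covered_bp = REGION_BP * NUM_LOCAL_INDEXES
print(f"{covered_bp / 1e9:.2f} Gbp")
```

The product is about 3.08 Gbp, which is in line with the commonly quoted size of the human genome of roughly 3.1 Gbp.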

The package is enhanced by the following packages: multiqc
Please cite: Daehwan Kim, Joseph M. Paggi, Chanhee Park, Christopher Bennett and Steven L. Salzberg: Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37(8):907-915 (2019)
Registry entries: Bio.tools  SciCrunch  Bioconda 
idba
iterative De Bruijn Graph short read assemblers
Versions of package idba
Release | Version | Architectures
trixie | 1.1.3-8 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
sid | 1.1.3-8 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm | 1.1.3-8 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 1.1.3-7 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster | 1.1.3-3 | amd64,arm64,armhf,i386
stretch | 1.1.3-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie | 1.1.2-1 | amd64,armel,armhf,i386
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

IDBA stands for iterative de Bruijn graph assembler. In computational sequence biology, an assembler solves the puzzle coming from large sequencing machines that feature many gigabytes of short reads from a large genome.

This package provides several flavours of the IDBA assembler, as they all share the same source tree but serve different purposes and evolved over time.

IDBA is the basic iterative de Bruijn graph assembler for second-generation sequencing reads. IDBA-UD, an extension of IDBA, is designed to utilize paired-end reads to assemble low-depth regions and to use progressive depth on contigs to reduce errors in high-depth regions. It is a general-purpose assembler and especially good for single-cell and metagenomic sequencing data. IDBA-Hybrid is an updated version of IDBA-UD, which can make use of a similar reference genome to improve the assembly result. IDBA-Tran is an iterative de Bruijn graph assembler for RNA-Seq data.
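
For readers unfamiliar with de Bruijn graphs, the Python sketch below builds one from a set of reads for several values of k. It is a teaching example only and not IDBA code: the function `de_bruijn` and the toy reads are assumptions, and the steps that make an assembler useful (error removal, contig construction, carrying information from one k to the next) are left out.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers; every k-mer in a read adds an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical overlapping reads.
reads = ["ACGTAC", "CGTACG"]
for k in (3, 4, 5):
    graph = de_bruijn(reads, k)
    edges = sum(len(targets) for targets in graph.values())
    print(f"k={k}: {len(graph)} nodes with outgoing edges, {edges} edges")
```

Small k values tolerate low coverage but collapse repeats, while large k values separate repeats but need more coverage, which is the trade-off an iterative approach tries to balance.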

Please cite: Yu Peng, Henry C. M. Leung, S. M. Yiu and Francis Y. L. Chin: IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. (PubMed,eprint) Bioinformatics 28(11):1420-1428 (2012)
Registry entries: Bio.tools  SciCrunch  Bioconda 
igdiscover
analyzes antibody repertoires to find new V genes
Versions of package igdiscover
Release | Version | Architectures
sid | 0.11-4 | all
bullseye | 0.11-3 | all
upstream | 0.15.1
Popcon: 0 users (0 upd.)*
Newer upstream!
License: DFSG free
Git

IgDiscover analyzes antibody repertoires and discovers new V genes from high-throughput sequencing reads. Heavy chains, kappa and lambda light chains are supported (to discover VH, VK and VL genes).

Please cite: Martin M. Corcoran, Ganesh E. Phad, Néstor Vázquez Bernat, Christiane Stahl-Hennig, Noriyuki Sumida, Mats A.A. Persson, Marcel Martin and Gunilla B. Karlsson Hedestam: Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity. (eprint) Nature Communications 7:13642 (2016)
Registry entries: Bio.tools  Bioconda 
igor
infers V(D)J recombination processes from sequencing data
Versions of package igor
Release | Version | Architectures
trixie | 1.4.0+dfsg-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm | 1.4.0+dfsg-4 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster | 1.3.0+dfsg-1 | amd64,arm64,armhf,i386
bullseye | 1.4.0+dfsg-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 1.4.0+dfsg-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

IGoR (Inference and Generation of Repertoires) is versatile software for analyzing and modeling immune receptor generation, selection, mutation and all other processes.

Please cite: Quentin Marcou, Thierry Mora and Aleksandra M. Walczak: High-throughput immune repertoire analysis with IGoR. (PubMed,eprint) Nature Communications 9(1):561 (2018)
Registry entries: Bioconda 
igv
Integrative Genomics Viewer
Versions of package igv
Release | Version | Architectures
bookworm | 2.16.0+dfsg-1 | all
stretch | 2.3.90+dfsg-1 (non-free) | all
jessie | 2.3.38+dfsg-1 (non-free) | all
trixie | 2.17.3+dfsg-1 | all
sid | 2.17.3+dfsg-1 | all
bullseye | 2.6.3+dfsg-3 (non-free) | all
upstream | 2.18.4
Debtags of package igv:
field: biology
interface: x11
network: client
role: program
scope: utility
use: viewing
works-with: biological-sequence
Popcon: 19 users (2 upd.)*
Newer upstream!
License: DFSG free
Git

IGV (Integrative Genomics Viewer) is a high-performance viewer that handles large, heterogeneous data sets while providing a smooth and intuitive user experience at all levels of genome resolution. A key characteristic is its focus on the integrative nature of genomic studies, with support for both array-based and next-generation sequencing data, and the integration of clinical and phenotypic data. Although IGV is often used to view genomic data from public sources, its primary purpose is to support researchers who wish to visualize and explore their own data sets or those of their colleagues. To that end, IGV handles local and remote data sets flexibly, and is optimized to provide high-performance data visualization and exploration on standard desktop systems.

Please cite: James T Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S Lander, Gad Getz and Jill P Mesirov: Integrative genomics viewer. (PubMed,eprint) Nature Biotechnology 29(1):24–26 (2011)
Registry entries: Bio.tools  SciCrunch  Bioconda 
iva
iterative virus sequence assembler
Versions of package iva
Release | Version | Architectures
bookworm | 1.0.11+ds-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
sid | 1.0.11+ds-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
bullseye | 1.0.9+ds-11 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
stretch | 1.0.8+ds-1 | amd64,arm64,mips64el,ppc64el
buster | 1.0.9+ds-6 | amd64,arm64
trixie | 1.0.11+ds-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

IVA is a de novo assembler designed to assemble virus genomes that have no repeat sequences, using Illumina read pairs sequenced from mixed populations at extremely high depth.

IVA's main algorithm works by iteratively extending contigs using aligned read pairs. Its input can be read pairs only, or additionally a set of existing contigs can be provided to be extended. Alternatively, it can take reads together with a reference sequence.

Please cite: M. Hunt, A. Gall, S. H. Ong, J. Brener, B. Ferns, P. Goulder, E. Nastouli, J. A. Keane, P. Kellam and T. D. Otto: IVA: accurate de novo assembly of RNA virus genomes. (PubMed) Bioinformatics 31(14):2374-2376 (2015)
Registry entries: Bio.tools  Bioconda 
khmer
in-memory k-mer counting, filtering and graph traversal for DNA sequences
Versions of package khmer
Release | Version | Architectures
bookworm | 3.0.0~a3+dfsg-4 | amd64
stretch | 2.0+dfsg-10 | amd64,arm64,mips64el,ppc64el
sid | 3.0.0~a3+dfsg-8 | amd64
trixie | 3.0.0~a3+dfsg-8 | amd64
buster | 2.1.2+dfsg-6 | amd64,arm64
bullseye | 2.1.2+dfsg-8 | amd64,arm64
experimental | 3.0.0~a3+dfsg-9~0exp | amd64
Popcon: 1 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

Khmer is a library and suite of command-line tools for working with DNA sequence data. It is primarily aimed at short-read sequencing data such as that produced by the Illumina platform. Khmer takes a k-mer-based approach to sequence analysis, hence its name.
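
What a k-mer-based view of a sequence means can be shown with a short Python sketch. It uses only the standard library and is not the khmer API: the function `count_kmers` is hypothetical, and an exact `Counter` like this one keeps every k-mer in memory, which would not be practical for large data sets.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ACGTACGT", 4)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

K-mers that occur only once across a deep data set are often sequencing errors, which is one reason k-mer counts are useful for filtering reads.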

Please cite: Michael R. Crusoe, Hussien F. Alameldin, Sherine Awad, Elmar Bucher, Adam Caldwell, Reed Cartwright, Amanda Charbonneau, Bede Constantinides, Greg Edvenson, Scott Fay, Jacob Fenton, Thomas Fenzl, Jordan Fish, Leonor Garcia-Gutierrez, Phillip Garland, Jonathan Gluck, Iván González, Sarah Guermond, Jiarong Guo, Aditi Gupta, Joshua R. Herr, Adina Howe, Alex Hyer, Andreas Härpfer, Luiz Irber, Rhys Kidd, David Lin, Justin Lippi, Tamer Mansour, Pamela McA'Nulty, Eric McDonald, Jessica Mizzi, Kevin D. Murray, Joshua R. Nahum, Kaben Nanlohy, Alexander Johan Nederbragt, Humberto Ortiz-Zuazaga, Jeramia Ory, Jason Pell, Charles Pepe-Ranney, Zachary N Russ, Erich Schwarz, Camille Scott, Josiah Seaman, Scott Sievert, Jared Simpson, Connor T. Skennerton, James Spencer, Ramakrishnan Srinivasan, Daniel Standage, James A. Stapleton, Joe Stein, Susan R Steinman, Benjamin Taylor, Will Trimble, Heather L. Wiencko, Michael Wright, Brian Wyss, Qingpeng Zhang, en zyme and C. Titus Brown: The khmer software package: enabling efficient sequence analysis. (2015)
Registry entries: Bio.tools  SciCrunch  Bioconda 
kissplice
detection of various kinds of polymorphisms in RNA-seq data
Versions of package kissplice
Release | Version | Architectures
jessie | 2.2.1-3 | amd64
stretch | 2.4.0-p1-1 | amd64,arm64,mips64el,ppc64el
buster | 2.4.0-p1-4 | amd64,arm64
sid | 2.6.7-1 | amd64,arm64,mips64el,ppc64el,riscv64
trixie | 2.6.7-1 | amd64,arm64,mips64el,ppc64el,riscv64
bookworm | 2.6.2-2 | amd64,arm64,mips64el,ppc64el
bullseye | 2.5.3-3 | amd64,arm64,mips64el,ppc64el
Debtags of package kissplice:
biology: nuceleic-acids
field: biology, biology:bioinformatics
interface: commandline
role: program
use: analysing
works-with: biological-sequence
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

KisSplice is software for analysing RNA-seq data with or without a reference genome. It is an exact local transcriptome assembler that identifies SNPs (single-nucleotide polymorphisms), insertions and deletions, and alternative splicing events. It can deal with an arbitrary number of biological conditions, and will quantify each variant in each condition. It has been tested on Illumina datasets of up to 1 billion reads. Its memory consumption is around 5 GB for 100 million reads.

Please cite: Gustavo AT Sacomoto, Janice Kielbassa, Rayan Chikhi, Raluca Uricaru, Pavlos Antoniou, Marie-France Sagot, Pierre Peterlongo and Vincent Lacroix: KISSPLICE: de-novo calling alternative splicing events from RNA-seq data. (PubMed,eprint) BMC Bioinformatics 13((Suppl 6)):S5 (2012)
Registry entries: SciCrunch  Bioconda 
Topics: RNA-seq; RNA splicing; Gene structure
kraken
assigning taxonomic labels to short DNA sequences
Versions of package kraken
Release | Version | Architectures
stretch | 0.10.5~beta-2 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
stretch-backports | 1.1-2~bpo9+1 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
buster | 1.1-3 | amd64,arm64,armhf,i386
bullseye | 1.1.1-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,s390x
bookworm | 1.1.1-4 | amd64,arm64,mips64el,ppc64el
trixie | 1.1.1-4 | amd64,arm64,mips64el,ppc64el,riscv64
sid | 1.1.1-4 | amd64,arm64,mips64el,ppc64el,riscv64
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.

In its fastest mode of operation, for a simulated metagenome of 100 bp reads, Kraken processed over 4 million reads per minute on a single core, over 900 times faster than Megablast and over 11 times faster than the abundance estimation program MetaPhlAn. Kraken's accuracy is comparable with Megablast, with slightly lower sensitivity and very high precision.

The package is enhanced by the following packages: jellyfish1 multiqc
Please cite: Derrick E Wood and Steven L Salzberg: Kraken: ultrafast metagenomic sequence classification using exact alignments. (PubMed,eprint) Genome Biol. 15(3):R46 (2014)
Registry entries: Bio.tools  Bioconda 
kraken2
taxonomic classification system using exact k-mer matches
Versions of package kraken2
Release | Version | Architectures
bookworm | 2.1.2-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 2.1.1-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 2.1.3-1 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 2.1.3-1 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Popcon: 6 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Kraken 2 is the newest version of Kraken, a taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds. This classifier matches each k-mer within a query sequence to the lowest common ancestor (LCA) of all genomes containing the given k-mer. The k-mer assignments inform the classification algorithm. [see: Kraken 1's Webpage for more details].
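
The mapping from a k-mer to the lowest common ancestor of the genomes that contain it can be sketched in Python. This is an illustration under stated assumptions, not Kraken code: the tiny taxonomy, the k-mers and the functions `lineage` and `lca` are made up for the example.

```python
def lineage(taxon, parent):
    """Return the path from a taxon up to the root, inclusive."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lca(taxa, parent):
    """Lowest common ancestor of a non-empty list of taxa."""
    taxa = list(taxa)
    common = set(lineage(taxa[0], parent))
    for taxon in taxa[1:]:
        common &= set(lineage(taxon, parent))
    # Walking up from the first taxon, the first shared node is the lowest one.
    for node in lineage(taxa[0], parent):
        if node in common:
            return node
    return None


# Toy taxonomy given as child -> parent links (hypothetical subset).
parent = {
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "root",
}

# Hypothetical k-mers and the genomes in which each one occurs.
kmer_genomes = {
    "ACGTA": ["E. coli", "Salmonella enterica"],
    "TTGCA": ["E. coli", "E. albertii"],
}
kmer_lca = {kmer: lca(genomes, parent) for kmer, genomes in kmer_genomes.items()}
print(kmer_lca)
```

A k-mer shared by distant genomes is thus assigned to a high taxonomic rank, while a k-mer unique to one genome points directly at that genome.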

Kraken 2 provides significant improvements to Kraken 1, with faster database build times, smaller database sizes, and faster classification speeds. These improvements were achieved by the following updates to the Kraken classification program:

 1. Storage of Minimizers: Instead of storing/querying entire k-mers,
    Kraken 2 stores minimizers (l-mers) of each k-mer. The length of
    each l-mer must be ≤ the k-mer length. Each k-mer is treated by
    Kraken 2 as if its LCA is the same as its minimizer's LCA.
 2. Introduction of Spaced Seeds: Kraken 2 also uses spaced seeds to
    store and query minimizers to improve classification accuracy.
 3. Database Structure: While Kraken 1 saved an indexed and sorted list
    of k-mer/LCA pairs, Kraken 2 uses a compact hash table. This hash
    table is a probabilistic data structure that allows for faster
    queries and lower memory requirements. However, this data structure
    does have a <1% chance of returning the incorrect LCA or returning
    an LCA for a non-inserted minimizer. Users can compensate for this
    possibility by using Kraken's confidence scoring thresholds.
 4. Protein Databases: Kraken 2 allows for databases built from amino
    acid sequences. When queried, Kraken 2 performs a six-frame
    translated search of the query sequences against the database.
 5. 16S Databases: Kraken 2 also provides support for databases not
    based on NCBI's taxonomy. Currently, these include the 16S
    databases: Greengenes, SILVA, and RDP.
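
The minimizer idea from item 1 can be sketched in a few lines. This is only an illustration: the function names and the toy lookup table are hypothetical, and plain lexicographic ordering is assumed for picking the minimizer, which is not necessarily the ordering Kraken 2 uses.

```python
def minimizer(kmer, l):
    """Return the smallest l-mer inside kmer (assumes lexicographic order)."""
    if l > len(kmer):
        raise ValueError("l must not exceed the k-mer length")
    return min(kmer[i:i + l] for i in range(len(kmer) - l + 1))


def classify_kmers(seq, k, l, table):
    """Look up the minimizer of every k-mer of seq in a {minimizer: taxon} table.

    Each k-mer inherits the LCA stored for its minimizer; k-mers whose
    minimizer is absent from the table yield None.
    """
    return [table.get(minimizer(seq[i:i + k], l))
            for i in range(len(seq) - k + 1)]


# Hypothetical toy table mapping one minimizer to one taxon label.
toy_table = {"ACC": "taxon_42"}
hits = classify_kmers("TTGACCT", k=6, l=3, table=toy_table)
```

Because only minimizers are stored, neighbouring k-mers often share a single table entry, which is one reason the database can be smaller than a full k-mer index.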
Please cite: Derrick E Wood and Steven L Salzberg: Kraken: ultrafast metagenomic sequence classification using exact alignments. (PubMed,eprint) Genome Biol. 15(3):R46 (2014)
Registry entries: Bio.tools  Bioconda 
last-align
genome-scale comparison of biological sequences
Versions of package last-align
ReleaseVersionArchitectures
sid1542-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm1447-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie490-1amd64,armel,armhf,i386
stretch830-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1179-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie1542-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster963-2amd64,arm64,armhf,i386
Debtags of package last-align:
biologynuceleic-acids
fieldbiology, biology:bioinformatics
roleprogram
Popcon: 5 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

LAST is software for comparing and aligning sequences, typically DNA or protein sequences. LAST is similar to BLAST, but it copes better with very large amounts of sequence data. Here are two things LAST is good at:

  • comparing large genomes (mammalian ones, for example);
  • mapping large numbers of sequence tags onto a genome.

The main technical innovation is that LAST finds initial matches based on their multiplicity instead of using a fixed length (BLAST, for example, uses 10-mers). This makes it possible to map tags to genomes without repeat-masking and without being overwhelmed by repetitive hits. To find these variable-length matches, it uses a suffix array inspired by Vmatch. To achieve high sensitivity, LAST uses a discontiguous suffix array, analogous to spaced seeds.
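
A minimal sketch of multiplicity-based seeding, assuming a seed is the shortest query substring that occurs at most max_hits times in the reference. The function names are hypothetical, and the naive string scan stands in for the suffix array that LAST actually uses.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def adaptive_seed(query, start, reference, max_hits):
    """Shortest prefix of query[start:] occurring 1..max_hits times in reference.

    Returns None when the prefix stops matching before it becomes rare enough.
    """
    for end in range(start + 1, len(query) + 1):
        seed = query[start:end]
        n = count_occurrences(reference, seed)
        if n == 0:
            return None
        if n <= max_hits:
            return seed
    return None


reference = "ACGTACGTTTACGA"
rare_seed = adaptive_seed("ACGA", 0, reference, max_hits=1)
loose_seed = adaptive_seed("ACGA", 0, reference, max_hits=3)
```

In repetitive regions the seed grows until it is specific, while in unique regions it stays short, so no separate repeat-masking step is needed.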

Please cite: Martin C. Frith, Raymond Wan and Paul Horton: Incorporating sequence quality data into alignment improves DNA read mapping. (PubMed,eprint) Nucl. Acids Res. 38(7):e100 (2010)
Registry entries: Bio.tools  SciCrunch  Bioconda 
libvcflib-tools
C++ library for parsing and manipulating VCF files (tools)
Versions of package libvcflib-tools
ReleaseVersionArchitectures
stretch1.0.0~rc1+dfsg1-3amd64
sid1.0.9+dfsg1-3amd64,arm64,mips64el,ppc64el,riscv64
trixie1.0.9+dfsg1-3amd64,arm64,mips64el,ppc64el,riscv64
bookworm1.0.3+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.0.2+dfsg-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster-backports1.0.1+dfsg-3~bpo10+1amd64
buster1.0.0~rc2+dfsg-2amd64
stretch-backports1.0.0~rc1+dfsg1-6~bpo9+1amd64
upstream1.0.10
Popcon: 1 users (3 upd.)*
Newer upstream!
License: DFSG free
Git

The Variant Call Format (VCF) is a flat-file, tab-delimited textual format intended to concisely describe reference-indexed variations between individuals. VCF provides a common interchange format for the description of variation in individuals and populations of samples, and has become the de facto standard reporting format for a wide array of genomic variant detectors.

vcflib provides methods to manipulate and interpret sequence variation as it can be described by VCF. It is both:

  • an API for parsing and operating on records of genomic variation as it can be described by the VCF format,
  • and a collection of command-line utilities for executing complex manipulations on VCF files.

This package contains several tools using the library.
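
To make the tab-delimited layout concrete, here is a small sketch that splits one VCF data line into its eight fixed columns. It is plain Python written for illustration, not the vcflib API, and the function name is hypothetical.

```python
def parse_vcf_record(line):
    """Split one VCF data line into the fixed columns CHROM..INFO."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    for item in info.split(";"):
        if item == ".":          # "." marks an empty INFO column
            continue
        key, _, value = item.partition("=")
        info_dict[key] = value if value else True  # flags carry no value
    return {
        "CHROM": chrom,
        "POS": int(pos),
        "ID": vid,
        "REF": ref,
        "ALT": alt.split(","),   # several alternate alleles are comma-separated
        "QUAL": None if qual == "." else float(qual),
        "FILTER": flt,
        "INFO": info_dict,
    }


record = parse_vcf_record("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB")
```

Per-sample genotype columns, which follow the eight fixed ones, are ignored in this sketch.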

macs
model-based analysis of ChIP-Seq on short-read sequencers
Versions of package macs
ReleaseVersionArchitectures
stretch2.1.1.20160309-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
trixie3.0.2-1amd64,arm64,armel,armhf,i386,ppc64el,riscv64,s390x
buster2.1.2.1-1amd64,arm64,armhf,i386
sid3.0.2-1amd64,arm64,armel,armhf,i386,ppc64el,riscv64,s390x
jessie2.0.9.1-1amd64,armel,armhf,i386
bullseye2.2.7.1-3amd64,arm64,armel,armhf,i386,ppc64el,s390x
bookworm2.2.7.1-6amd64,arm64,armel,armhf,i386,ppc64el,s390x
Popcon: 7 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, is publicly available as open source, and can be used for ChIP-Seq with or without control samples.
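
One way to picture a "dynamic" Poisson test is to let the expected read count depend on the local background. The sketch below is an assumption-laden illustration rather than MACS code: the function names, the window sizes and the choice of taking the largest rate are all hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the CDF."""
    term = math.exp(-lam)   # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def local_lambda(genome_rate, window_counts, peak_width):
    """Expected control reads in a candidate peak of peak_width bp.

    genome_rate: control reads per bp over the whole genome.
    window_counts: {window_size_bp: control reads seen in that window}.
    The largest per-bp rate wins, so a locally noisy region raises the bar.
    """
    rates = [genome_rate] + [n / size for size, n in window_counts.items()]
    return max(rates) * peak_width


lam = local_lambda(0.01, {1000: 50, 10000: 200}, peak_width=200)
p_value = poisson_sf(25, lam)
```

With a purely genome-wide rate the same 25 reads would look far more significant; the local estimate guards against regions where the control is also enriched.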

Please cite: Yong Zhang, Tao Liu, Clifford A Meyer, Jérôme Eeckhoute, David S. Johnson, Bradley E. Bernstein, Chad Nussbaum, Richard M. Myers, Myles Brown, Wei Li and X Shirley Liu: Model-based Analysis of ChIP-Seq (MACS). (PubMed,eprint) Genome Biol. 9(9):R137 (2008)
Registry entries: Bio.tools  SciCrunch  Bioconda 
mapdamage
tracking and quantifying damage patterns in ancient DNA sequences
Versions of package mapdamage
ReleaseVersionArchitectures
sid2.2.2+dfsg-1all
trixie2.2.2+dfsg-1all
buster2.0.9+dfsg-1all
bookworm2.2.1+dfsg-3all
bullseye2.2.1+dfsg-1all
stretch2.0.6+dfsg-2all
Popcon: 7 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

MapDamage is a computational framework written in Python and R, which tracks and quantifies DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.

MapDamage is developed at the Centre for GeoGenetics by the Orlando Group.

Please cite: Hákon Jónsson, Aurélien Ginolhac, Mikkel Schubert, Philip Johnson and Ludovic Orlando: mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. (PubMed,eprint) Bioinformatics 29(13):1682-4 (2013)
Registry entries: SciCrunch  Bioconda 
mapsembler2
bioinformatics targeted assembly software
Versions of package mapsembler2
ReleaseVersionArchitectures
sid2.2.4+dfsg1-4amd64,arm64,ppc64el,s390x
bullseye2.2.4+dfsg1-3amd64,arm64,ppc64el,s390x
bookworm2.2.4+dfsg1-4amd64,arm64,ppc64el,s390x
stretch2.2.3+dfsg-3amd64,arm64,armel,armhf,i386,ppc64el,s390x
trixie2.2.4+dfsg1-4amd64,arm64,ppc64el,s390x
jessie2.1.6+dfsg-1amd64,armel,armhf,i386
buster2.2.4+dfsg-3amd64,arm64,armhf,i386
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Mapsembler2 is a targeted assembly software. It takes as input a set of NGS raw reads (fasta or fastq, gzipped or not) and a set of input sequences (starters).

It first determines whether each starter is read-coherent, i.e. whether reads confirm the presence of each starter in the original sequence. Then, for each read-coherent starter, Mapsembler2 outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user's choice.

Mapsembler2 may be used to (among other things):

  • Validate an assembled sequence (input as starter), e.g. from a de Bruijn graph assembly where read-coherence was not enforced.
  • Check whether a gene (input as starter) has a homolog in a set of reads.
  • Check whether a known enzyme is present in a metagenomic NGS read set.
  • Enrich unmappable reads by extending them, possibly making them mappable.
  • Check what happens at the extremities of a contig.
  • Remove contaminant or symbiont reads from a read set.
Please cite: Pierre Peterlongo and Rayan Chikhi: Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. (PubMed) BMC Bioinformatics 13:48 (2012)
Registry entries: Bio.tools  Bioconda 
maq
maps short polymorphic DNA sequence reads to reference sequences
Versions of package maq
ReleaseVersionArchitectures
stretch0.7.1-7amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie0.7.1-5amd64,armel,armhf,i386
sid0.7.1-10amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie0.7.1-10amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm0.7.1-9amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye0.7.1-9amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster0.7.1-8amd64,arm64,armhf,i386
Debtags of package maq:
biologynuceleic-acids
fieldbiology, biology:bioinformatics
interfacecommandline
roleprogram
scopeutility
useanalysing, comparing, searching
works-with-formatplaintext
Popcon: 9 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Maq (short for "Mapping and Assembly with Quality") builds mapping assemblies from short reads generated by next-generation sequencing machines. It was designed in particular for Illumina's Solexa genome analyzer, which yields one billion bases per run, and has preliminary support for ABI SOLiD data. Maq was previously known as mapass2.

Development of Maq stopped in 2008. Its successors are BWA (Burrows-Wheeler Aligner) and SAMtools (Sequence Alignment/Map tools).

Please cite: Heng Li, Jue Ruan and Richard Durbin: Mapping short DNA sequencing reads and calling variants using mapping quality scores. (PubMed,eprint) Genome Research 18(11):1851-1858 (2008)
Registry entries: Bio.tools  SciCrunch 
maqview
graphical read alignment viewer for short gene sequences
Versions of package maqview
ReleaseVersionArchitectures
buster0.2.5-9amd64,arm64,armhf,i386
sid0.2.5-12amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie0.2.5-12amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm0.2.5-11amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch0.2.5-7amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye0.2.5-10amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie0.2.5-6amd64,armel,armhf,i386
Popcon: 2 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Maqview is a graphical read alignment viewer. It is specifically designed for the Maq alignment file and lets you see the mismatches, base qualities and mapping qualities. Maqview is nothing as fancy as Consed or GAP, just a simple viewer for seeing what happens in a particular region.

In comparison to tgap-maq, the text-based read alignment viewer written by James Bonfield, Maqview is faster and takes up much less memory and disk space when indexing. This is possibly because tgap aims to be a general-purpose viewer, whereas Maqview takes full advantage of the fact that a Maq alignment file is already sorted. Maqview is also efficient in viewing and provides a command-line tool to quickly retrieve any region in a Maq alignment file.

Please cite: Heng Li, Jue Ruan and Richard Durbin: Mapping short DNA sequencing reads and calling variants using mapping quality scores. (PubMed,eprint) Genome Research 18(11):1851-1858 (2008)
Registry entries: Bio.tools  SciCrunch 
mhap
locality-sensitive hashing to detect long-read overlaps
Versions of package mhap
ReleaseVersionArchitectures
trixie2.1.3+dfsg-3all
stretch2.1.1+dfsg-1all
bullseye2.1.3+dfsg-3all
stretch-backports2.1.3+dfsg-1~bpo9+1all
bookworm2.1.3+dfsg-3all
sid2.1.3+dfsg-3all
buster2.1.3+dfsg-2all
Popcon: 1 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

The MinHash Alignment Process (MHAP, pronounced MAP) is a reference implementation of a probabilistic sequence overlapping algorithm. It is designed to efficiently detect all overlaps between noisy long-read sequence data. It efficiently estimates Jaccard similarity by compressing sequences to their representative fingerprints composed of min-mers (minimum k-mers).
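
The fingerprint idea can be sketched as follows. This is a generic MinHash illustration under stated assumptions, not MHAP's implementation: the function names are hypothetical and seeded BLAKE2b digests stand in for whatever hash family the real tool uses.

```python
import hashlib


def kmers(seq, k):
    """Set of all k-mers in seq (seq must be at least k long)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer under a given seed."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(seq, k, num_hashes):
    """For each hash function keep the k-mer with the smallest hash (the min-mer)."""
    ks = kmers(seq, k)
    return [min(ks, key=lambda km: _hash(km, seed)) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of positions where both sketches picked the same min-mer."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


read = "ACGTACGTAGCTAGCTAGGATCC"
same = estimate_jaccard(sketch(read, 4, 64), sketch(read, 4, 64))
```

Two reads are compared through their fixed-size sketches only, so the cost of a comparison no longer depends on read length.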

Please cite: Konstantin Berlin, Sergey Koren, Chen-Shan Chin, James P Drake, Jane M Landolin and Adam M Phillippy: Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. (PubMed) Nature Biotechnology 33(6):623–630 (2015)
Registry entries: Bioconda 
microbiomeutil
microbiome analysis utilities
Versions of package microbiomeutil
ReleaseVersionArchitectures
trixie20101212+dfsg1-6all
bullseye20101212+dfsg1-4all
buster20101212+dfsg1-2all
bookworm20101212+dfsg1-5all
sid20101212+dfsg1-6all
jessie20101212+dfsg-1all
stretch20101212+dfsg1-1all
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The microbiomeutil package comes with the following utilities:

  • ChimeraSlayer: chimera detection;
  • NAST-iEr: alignment tool based on the NAST algorithm;
  • WigeoN: a reimplementation of the Pintail 16S anomaly detection utility;
  • RESOURCES: reference 16S sequences and NAST-based alignments that the tools above rely on.
Please cite: Brian J. Haas, Dirk Gevers, Ashlee M. Earl, Mike Feldgarden, Doyle V. Ward, Georgia Giannoukos, Dawn Ciulla, Diana Tabbaa, Sarah K. Highlander, Erica Sodergren, Barbara Methé, Todd Z. DeSantis, The Human Microbiome Consortium, Joseph F. Petrosino, Rob Knight and Bruce W. Birren: Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. (PubMed,eprint) Genome Research 21(3):494-504 (2011)
Registry entries: SciCrunch 
mira-assembler
whole-genome shotgun and EST sequence assembler
Versions of package mira-assembler
ReleaseVersionArchitectures
trixie4.9.6-11amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
sid4.9.6-11amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
stretch4.9.6-2amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bookworm4.9.6-7amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie4.0.2-1amd64,armel,armhf,i386
bullseye4.9.6-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster4.9.6-4amd64,arm64,armhf,i386
Debtags of package mira-assembler:
roleprogram
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The mira genome fragment assembler is a specialised assembler for sequencing projects classified as "hard" because of a high number of repeats. For expressed sequence tag (EST) transcripts, miraEST specialises in reconstructing pristine mRNA transcripts while detecting and classifying the single nucleotide polymorphisms (SNPs) occurring in them.

The assembler is routinely used for tasks as varied as mutation detection in different cell types, similarity analysis of transcripts between organisms, and pristine assembly of sequences from various sources for oligonucleotide design in clinical microarray experiments.

This package provides the following executables:

  • mira: assembly of genome sequences;
  • miramem: estimation of the memory needed for assembly projects;
  • mirabait: a "grep"-like tool to select reads with k-mers of up to 256 bases;
  • miraconvert: a tool to convert, extract and sometimes recalculate all kinds of data related to sequence assembly files.
Please cite: Bastien Chevreux, Thomas Pfisterer, Bernd Drescher, Albert J. Driesel, Werner E. G. Müller, Thomas Wetter and Sándor Suhai: Using the miraEST Assembler for Reliable and Automated mRNA Transcript Assembly and SNP Detection in Sequenced ESTs. (PubMed,eprint) Genome Research 14(6):1147-1159 (2004)
Registry entries: Bio.tools  SciCrunch  Bioconda 
mothur
sequence analysis suite for microbiome research
Versions of package mothur
ReleaseVersionArchitectures
stretch1.38.1.1-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid1.48.1-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie1.48.1-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm1.48.0-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.44.3-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.41.21-1amd64,arm64,armhf,i386
jessie1.33.3+dfsg-2amd64,armel,armhf,i386
Debtags of package mothur:
roleprogram
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Mothur seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. It incorporates, among others, the functionality of DOTUR, SONS, TreeClimber, S-LibShuff and UniFrac. In addition to improving the flexibility of these algorithms, a number of features have been added, such as calculators and visualization tools.

Please cite: Patrick D Schloss, Sarah L Westcott, Thomas Ryabin, Justine R Hall, Martin Hartmann, Emily B Hollister, Ryan A Lesniewski, Brian B Oakley, Donovan H Parks, Courtney J Robinson, Jason W Sahl, Blaz Stres, Gerhard G Thallinger, David J Van Horn and Carolyn F Weber: Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. (PubMed) Appl Environ Microbiol 75(23):7537-7541 (2009)
Registry entries: Bio.tools  SciCrunch  Bioconda 
Topics: Microbial ecology
nanopolish
consensus caller for nanopore sequencing data
Versions of package nanopolish
ReleaseVersionArchitectures
sid0.14.0-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
stretch-backports0.10.2-1~bpo9+1amd64
bullseye0.13.2-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster0.11.0-2amd64
bookworm0.14.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
stretch0.5.0-1amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
trixie0.14.0-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Nanopolish uses a signal-level hidden Markov model for consensus calling of nanopore genome sequencing data. It can perform signal-level analysis of Oxford Nanopore sequencing data. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome and more.

Registry entries: Bio.tools  SciCrunch  Bioconda 
paleomix
pipelines and tools for the processing of ancient and modern HTS data
Versions of package paleomix
ReleaseVersionArchitectures
buster1.2.13.3-1amd64
bookworm1.3.7-3amd64,arm64
trixie1.3.8-1amd64,arm64
bullseye1.3.2-1amd64,arm64,mips64el,ppc64el
sid1.3.8-1amd64,arm64
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The PALEOMIX pipelines are a set of pipelines and tools designed to aid the rapid processing of High-Throughput Sequencing (HTS) data: The BAM pipeline processes de-multiplexed reads from one or more samples, through sequence processing and alignment, to generate BAM alignment files useful in downstream analyses; the Phylogenetic pipeline carries out genotyping and phylogenetic inference on BAM alignment files, either produced using the BAM pipeline or generated elsewhere; and the Zonkey pipeline carries out a suite of analyses on low coverage equine alignments, in order to detect the presence of F1-hybrids in archaeological assemblages. In addition, PALEOMIX aids in metagenomic analysis of the extracts.

The pipelines have been designed with ancient DNA (aDNA) in mind and include several features especially useful for the analysis of ancient samples, but they can all be used for the processing of modern samples as well, in order to ensure consistent data processing.

Please cite: Mikkel Schubert, Luca Ermini, Clio Der Sarkissian, Hákon Jónsson, Aurélien Ginolhac, Robert Schaefer, Michael D Martin, Ruth Fernández, Martin Kircher, Molly McCue, Eske Willerslev and Ludovic Orlando: Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. (PubMed) Nature Protocols 9(5):1056-82 (2014)
Registry entries: Bio.tools  SciCrunch 
pbhoney
genomic structural variation discovery
Versions of package pbhoney
ReleaseVersionArchitectures
stretch15.8.24+dfsg-2all
bullseye15.8.24+dfsg-7all
bookworm15.8.24+dfsg-7all
sid15.8.24+dfsg-7all
buster15.8.24+dfsg-3all
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants.

PBHoney is part of the PBSuite.

pbjelly
genome assembly upgrading tool
Versions of package pbjelly
ReleaseVersionArchitectures
bookworm15.8.24+dfsg-7all
sid15.8.24+dfsg-7all
stretch15.8.24+dfsg-2all
buster15.8.24+dfsg-3all
bullseye15.8.24+dfsg-7all
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes.

PBJelly is part of the PBSuite.

pbsuite
software for Pacific Biosciences sequencing data
Versions of package pbsuite
ReleaseVersionArchitectures
bookworm15.8.24+dfsg-7all
stretch15.8.24+dfsg-2all
bullseye15.8.24+dfsg-7all
sid15.8.24+dfsg-7all
buster15.8.24+dfsg-3all
Popcon: 0 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

The PBSuite contains two projects created for analysis of Pacific Biosciences long-read sequencing data.

  • PBJelly - genome upgrading tool
  • PBHoney - structural variation discovery
picard-tools
command-line tools to manipulate SAM and BAM files
Versions of package picard-tools
ReleaseVersionArchitectures
bullseye2.24.1+dfsg-1all
buster2.18.25+dfsg-2amd64
bookworm2.27.5+dfsg-2all
trixie3.1.1+dfsg-1all
sid3.1.1+dfsg-1all
stretch2.8.1+dfsg-1all
jessie1.113-1all
upstream3.2.0
Popcon: 13 users (7 upd.)*
Newer upstream!
License: DFSG free
Git

The Sequence Alignment/Map (SAM) format is a generic format for storing large nucleotide sequence alignments. Picard Tools provides the following utilities to manipulate SAM and BAM (Binary Alignment/Map) files:

  AddCommentsToBam                  FifoBuffer
  AddOrReplaceReadGroups            FilterSamReads
  BaitDesigner                      FilterVcf
  BamIndexStats                     FixMateInformation
                                    GatherBamFiles
  BedToIntervalList                 GatherVcfs
  BuildBamIndex                     GenotypeConcordance
  CalculateHsMetrics                IlluminaBasecallsToFastq
  CalculateReadGroupChecksum        IlluminaBasecallsToSam
  CheckIlluminaDirectory            LiftOverIntervalList
  CheckTerminatorBlock              LiftoverVcf
  CleanSam                          MakeSitesOnlyVcf
  CollectAlignmentSummaryMetrics    MarkDuplicates
  CollectBaseDistributionByCycle    MarkDuplicatesWithMateCigar
  CollectGcBiasMetrics              MarkIlluminaAdapters
  CollectHiSeqXPfFailMetrics        MeanQualityByCycle
  CollectIlluminaBasecallingMetrics MergeBamAlignment
  CollectIlluminaLaneMetrics        MergeSamFiles
  CollectInsertSizeMetrics          MergeVcfs
  CollectJumpingLibraryMetrics      NormalizeFasta
  CollectMultipleMetrics            PositionBasedDownsampleSam
  CollectOxoGMetrics                QualityScoreDistribution
  CollectQualityYieldMetrics        RenameSampleInVcf
  CollectRawWgsMetrics              ReorderSam
  CollectRnaSeqMetrics              ReplaceSamHeader
  CollectRrbsMetrics                RevertOriginalBaseQualitiesAndAddMateCigar
  CollectSequencingArtifactMetrics  RevertSam
  CollectTargetedPcrMetrics         SamFormatConverter
  CollectVariantCallingMetrics      SamToFastq
  CollectWgsMetrics                 ScatterIntervalsByNs
  CompareMetrics                    SortSam
  CompareSAMs                       SortVcf
  ConvertSequencingArtifactToOxoG   SplitSamByLibrary
  CreateSequenceDictionary          SplitVcfs
  DownsampleSam                     UpdateVcfSequenceDictionary
  EstimateLibraryComplexity         ValidateSamFile
  ExtractIlluminaBarcodes           VcfFormatConverter
  ExtractSequences                  VcfToIntervalList
  FastqToSam                        ViewSam
The package is enhanced by the following packages: multiqc
Please cite: Broad Institute: Picard toolkit. Broad Institute, GitHub repository (2019)
Registry entries: Bio.tools  SciCrunch  Bioconda 
Topics: Sequencing; Document, record and content management
pirs
Profile based Illumina pair-end Reads Simulator
Versions of package pirs
ReleaseVersionArchitectures
buster2.0.2+dfsg-8amd64,arm64,armhf,i386
sid2.0.2+dfsg-12amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
stretch2.0.2+dfsg-5.1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye2.0.2+dfsg-9amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm2.0.2+dfsg-11amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie2.0.2+dfsg-12amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The program pIRS simulates Illumina paired-end (PE) reads, reproducing a series of characteristics of the Illumina sequencing platform, such as the insert size distribution, sequencing errors (substitution, insertion, deletion), quality scores and GC content-coverage bias.

The insert size follows a normal distribution, so users should set the mean value and the standard deviation. Usually the standard deviation is set to 1/20 of the mean value. The normal distribution is simulated with the Box-Muller method.
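
A minimal sketch of the Box-Muller transform applied to insert sizes, using the 1/20 rule of thumb mentioned above. The function names, the seed handling and the rounding to whole base pairs are assumptions made for illustration; this is not pIRS code.

```python
import math
import random


def box_muller(rng):
    """Turn two uniform samples into one standard normal sample."""
    u1 = 1.0 - rng.random()   # in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def insert_sizes(mean, n, seed=0):
    """Draw n insert sizes with standard deviation mean / 20."""
    rng = random.Random(seed)
    sd = mean / 20.0
    return [max(1, round(mean + sd * box_muller(rng))) for _ in range(n)]


sizes = insert_sizes(500, 10000)
```

For a mean of 500 bp this gives a standard deviation of 25 bp, so nearly all simulated inserts fall between roughly 425 and 575 bp.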

The program simulates sequencing errors, quality scores and GC content-coverage bias according to empirical distribution profiles. Some default profiles derived from large amounts of real sequencing data are provided.

To simulate reads from a diploid genome, users should first simulate the diploid genome sequence by setting the ratio of heterozygous SNPs, heterozygous indels and structural variation.

Please cite: Xuesong Hu, Jianying Yuan, Yujian Shi, Jianliang Lu, Binghang Liu, Zhenyu Li, Yanxiang Chen, Desheng Mu, Hao Zhang, Nan Li, Zhen Yue, Fan Bai, Heng Li and Wei Fan: pIRS: Profile-based Illumina pair-end reads simulator. (PubMed,eprint) Bioinformatics 28(11):1533-5 (2012)
Registry entries: Bioconda 
pizzly
identifies gene fusions in RNA sequencing data
Versions of package pizzly
ReleaseVersionArchitectures
trixie0.37.3+ds-9amd64,arm64,mips64el,ppc64el,riscv64,s390x
sid0.37.3+ds-9amd64,arm64,mips64el,ppc64el,riscv64,s390x
bookworm0.37.3+ds-9amd64,arm64,mips64el,ppc64el,s390x
bullseye0.37.3+ds-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

When interpreting the transcriptome (the abundance and sequence of RNA) of tumour cells, one is particularly interested in transcripts that cannot be mapped to single genes but appear to be fused from parts of two genes. Likely explanations are chromosomal translocations.

Pizzly can identify novel fusions of this kind, building on the interpretation of variable splicing by the tool kallisto. Both tools are elements of the bcbio workflow.

Registry entries: Bioconda 
placnet
Plasmid Constellation Network project
Versions of package placnet
ReleaseVersionArchitectures
bookworm1.04-1all
stretch1.03-2all
buster1.03-3all
trixie1.04-1all
bullseye1.03-3all
sid1.04-1all
Popcon: 2 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Placnet is a tool for plasmid analysis in NGS projects. Placnet is optimized to work with Illumina sequences, but it also works with 454, Ion Torrent or any other current sequencing technology.

The input of Placnet is a set of contigs and one or more SAM files with the mapping of the reads against the contigs. Placnet produces a set of files that can easily be opened in Cytoscape or other network tools.

Please cite: Val F. Lanza, María de Toro, M. Pilar Garcillán-Barcia, Azucena Mora, Jorge Blanco, Teresa M. Coque and Fernando de la Cruz: Plasmid Flux in Escherichia coli ST131 Sublineages, Analyzed by Plasmid Constellation Network (PLACNET), a New Method for Plasmid Reconstruction from Whole Genome Sequences. (PubMed,eprint) PLoS Genetics 10(12):e1004766 (2014)
poretools
toolkit for nanopore nucleotide sequencing data
Versions of package poretools
ReleaseVersionArchitectures
buster0.6.0+dfsg-3all
bullseye0.6.0+dfsg-5all
bookworm0.6.0+dfsg-6all
trixie0.6.0+dfsg-6all
sid0.6.0+dfsg-6all
stretch0.6.0+dfsg-2all
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

poretools is a flexible toolkit for exploring datasets generated by nanopore sequencing devices from MinION for the purposes of quality control and downstream analysis. Poretools operates directly on the native FAST5 (a variant of the HDF5 standard) file format produced by ONT and provides a wealth of format conversion utilities and data exploration and visualization tools.

Please cite: Nicholas Loman and Aaron Quinlan: Poretools: a toolkit for analyzing nanopore sequence data. (PubMed,eprint) Bioinformatics 30(23):3399-3401 (2014)
Registry entries: Bio.tools  Bioconda 
python3-airr
Data Representation Standard library for antibody and TCR sequences
Versions of package python3-airr
ReleaseVersionArchitectures
buster1.2.1-2all
bullseye1.3.1-1all
bookworm1.3.1-1all
trixie1.5.0-1all
sid1.5.0-1all
upstream1.5.1
Popcon: 2 users (1 upd.)*
Newer upstream!
License: DFSG free
Git

This package provides a library by the AIRR community for describing, reporting, storing, and sharing adaptive immune receptor repertoire (AIRR) data, such as sequences of antibodies and T cell receptors (TCRs). Some specific efforts include:

  • The MiAIRR standard for describing minimal information about AIRR datasets, including sample collection and data processing information.
  • Data representations (file format) specifications for storing large amounts of annotated AIRR data.
  • APIs for exposing a common interface to repositories/databases containing AIRR data.
  • A community standard for software tools which will allow conforming tools to gain community recognition.

This package installs the library for Python 3.

python3-gffutils
Work with GFF and GTF files in a flexible database framework
Versions of package python3-gffutils
ReleaseVersionArchitectures
buster0.9-1all
bookworm0.11.1-3all
sid0.13-1all
trixie0.13-1all
bullseye0.10.1-2all
Popcon: 4 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

A Python package for working with and manipulating the GFF and GTF format files typically used for genomic annotations. Files are loaded into a sqlite3 database, allowing much more complex manipulation of hierarchical features (e.g., genes, transcripts, and exons) than is possible with plain-text methods alone.
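
The benefit of putting annotations into a database can be illustrated with the standard sqlite3 module alone. The sketch below is not the gffutils API: the table layout, the function names and the four-line toy GFF are all hypothetical, chosen only to show how a hierarchical query (gene to transcript to exon) becomes a simple join.

```python
import sqlite3

# Toy GFF3 content: one gene, one transcript, two exons (hypothetical data).
GFF = """\
chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene1
chr1\tdemo\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1
chr1\tdemo\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=tx1
chr1\tdemo\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=tx1
"""


def load_gff(text):
    """Load GFF3 lines into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE features (id TEXT PRIMARY KEY, parent TEXT, "
        "type TEXT, seqid TEXT, start_pos INTEGER, end_pos INTEGER)"
    )
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        seqid, _src, ftype, start, end, _score, _strand, _phase, attrs = line.split("\t")
        attr = dict(item.split("=", 1) for item in attrs.split(";"))
        db.execute(
            "INSERT INTO features VALUES (?, ?, ?, ?, ?, ?)",
            (attr["ID"], attr.get("Parent"), ftype, seqid, int(start), int(end)),
        )
    return db


def exons_of_gene(db, gene_id):
    """Return exon IDs two levels below a gene, ordered by start position."""
    rows = db.execute(
        "SELECT exon.id FROM features AS exon "
        "JOIN features AS tx ON exon.parent = tx.id "
        "WHERE tx.parent = ? AND exon.type = 'exon' "
        "ORDER BY exon.start_pos",
        (gene_id,),
    )
    return [row[0] for row in rows]


db = load_gff(GFF)
exon_ids = exons_of_gene(db, "gene1")
```

Answering the same question by scanning the text file would require tracking parent links by hand across lines.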

Registry entries: Bio.tools  Bioconda 
python3-presto
toolkit for processing B and T cell sequences (Python3 module)
Versions of package python3-presto
ReleaseVersionArchitectures
bullseye0.6.2-1all
buster0.5.10-1all
sid0.7.2-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie0.7.2-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm0.7.1-1all
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

pRESTO is a toolkit for processing raw reads from high-throughput sequencing of B cell and T cell repertoires.

Dramatic improvements in high-throughput sequencing technologies now enable large-scale characterization of lymphocyte repertoires, defined as the collection of trans-membrane antigen-receptor proteins located on the surface of B cells and T cells. The REpertoire Sequencing TOolkit (pRESTO) is composed of a suite of utilities to handle all stages of sequence processing prior to germline segment assignment. pRESTO is designed to handle either single reads or paired-end reads. It includes features for quality control, primer masking, annotation of reads with sequence embedded barcodes, generation of unique molecular identifier (UMI) consensus sequences, assembly of paired-end reads and identification of duplicate sequences. Numerous options for sequence sorting, sampling and conversion operations are also included.

This package provides the presto Python3 module.

Please cite: Jason A. Vander Heiden, Gur Yaari, Mohamed Uduman, Joel N.H. Stern, Kevin C. O’Connor, David A. Hafler, Francois Vigneault and Steven H. Kleinstein: pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires. (PubMed,eprint) Bioinformatics 30(13):1930-1932 (2014)
Registry entries: Bio.tools  SciCrunch  Bioconda 
python3-pybedtools
Python 3 wrapper around BEDTools for bioinformatics work
Versions of package python3-pybedtools
Release   Version   Architectures
buster    0.8.0-1   amd64,arm64
bookworm  0.9.0-4   amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
trixie    0.10.0-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
sid       0.10.0-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
bullseye  0.8.0-5   amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
Popcon: 4 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

The BEDTools suite of programs is widely used for genomic interval manipulation or "genome algebra". pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python.

This is the Python 3 version.
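
As a rough illustration of what one interval operation looks like, the sketch below intersects two lists of intervals in plain Python. It is not the pybedtools or BEDTools API: the function name `intersect` and the tuple layout are assumptions, and the quadratic loop is chosen for clarity rather than speed.

```python
def intersect(a, b):
    """Return overlaps between two lists of (chrom, start, end) intervals.

    Coordinates are 0-based and half-open, as in the BED format.
    """
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue  # intervals on different sequences never overlap
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                out.append((chrom_a, start, end))
    return out
```

For example, intersecting `[("chr1", 10, 50), ("chr1", 100, 200)]` with `[("chr1", 40, 120)]` yields the two overlapping pieces `("chr1", 40, 50)` and `("chr1", 100, 120)`.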

Please cite: R. K. Dale, B. S. Pedersen and A. R. Quinlan: Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27(24):3423-3424 (2011)
Registry entries: Bio.tools  Bioconda 
python3-sqt
SeQuencing Tools for biological DNA/RNA high-throughput data
Versions of package python3-sqt
Release   Version  Architectures
buster    0.8.0-3  amd64,arm64
bullseye  0.8.0-4  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
trixie    0.8.0-8  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
bookworm  0.8.0-6  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
sid       0.8.0-8  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

sqt is a collection of command-line tools for working with high-throughput sequencing data. Conceptually not tied to any particular language, many sqt subcommands are currently implemented in Python. For them, a Python package is available with functions for reading and writing FASTA/FASTQ files, computing alignments, quality trimming, etc.

The following tools are offered:

  • sqt-coverage -- Compute per-reference statistics such as coverage and GC content
  • sqt-fastqmod -- FASTQ modifications: shorten, subset, reverse complement, quality trimming.
  • sqt-fastastats -- Compute N50, min/max length, GC content etc. of a FASTA file
  • sqt-qualityguess -- Guess quality encoding of one or more FASTQ files.
  • sqt-globalalign -- Compute a global or semiglobal alignment of two strings.
  • sqt-chars -- Count length of the first word given on the command line.
  • sqt-sam-cscq -- Add the CS and CQ tags to a SAM file with colorspace reads.
  • sqt-fastamutate -- Add substitutions and indels to sequences in a FASTA file.
  • sqt-fastaextract -- Efficiently extract one or more regions from an indexed FASTA file.
  • sqt-translate -- Replace characters in FASTA files (like the 'tr' command).
  • sqt-sam-fixn -- Replace all non-ACGT characters within reads in a SAM file.
  • sqt-sam-insertsize -- Mean and standard deviation of paired-end insert sizes.
  • sqt-sam-set-op -- Set operations (union, intersection, ...) on SAM/BAM files.
  • sqt-bam-eof -- Check for the End-Of-File marker in compressed BAM files.
  • sqt-checkfastqpe -- Check whether two FASTQ files contain correctly paired paired-end data.
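
A few of the statistics and transformations named in the list above can be sketched in a handful of lines. The functions below are illustrative only and are not taken from sqt; the names `n50`, `gc_content` and `reverse_complement` are assumptions, and the definitions are the commonly used ones.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("need at least one sequence")


def gc_content(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]
```

With sequence lengths 2, 3, 4, 5 and 6 (20 bases in total), the two longest sequences already cover 11 bases, so the N50 is 5.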
Registry entries: Bioconda 
q2cli
Click-based command line interface for QIIME 2
Versions of package q2cli
Release   Version      Architectures
bullseye  2020.11.1-1  all
sid       2024.5.0-2   all
bookworm  2022.11.1-2  all
Popcon: 35 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

QIIME 2 is a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency. QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results. Key features:

  • Integrated and automatic tracking of data provenance
  • Semantic type system
  • Plugin system for extending microbiome analysis functionality
  • Support for multiple types of user interfaces (e.g. API, command line, graphical)

QIIME 2 is a complete redesign and rewrite of the QIIME 1 microbiome analysis pipeline. QIIME 2 will address many of the limitations of QIIME 1, while retaining the features that make QIIME 1 a powerful and widely used analysis pipeline.

QIIME 2 currently supports an initial end-to-end microbiome analysis pipeline. New functionality will regularly become available through QIIME 2 plugins. You can view a list of plugins that are currently available on the QIIME 2 plugin availability page. The future plugins page lists plugins that are being developed.

Please cite: Evan Bolyen, Jai Ram Rideout, Matthew R Dillon, Nicholas A Bokulich, Christian Abnet, Gabriel A Al-Ghalith, Harriet Alexander, Eric J Alm, Manimozhiyan Arumugam, Francesco Asnicar, Yang Bai, Jordan E Bisanz, Kyle Bittinger, Asker Brejnrod, Colin J Brislawn, C Titus Brown, Benjamin J Callahan, Andrés Mauricio Caraballo-Rodríguez, John Chase, Emily Cope, Ricardo Da Silva, Pieter C Dorrestein, Gavin M Douglas, Daniel M Durall, Claire Duvallet, Christian F Edwardson, Madeleine Ernst, Mehrbod Estaki, Jennifer Fouquier, Julia M Gauglitz, Deanna L Gibson, Antonio Gonzalez, Kestrel Gorlick, Jiarong Guo, Benjamin Hillmann, Susan Holmes, Hannes Holste, Curtis Huttenhower, Gavin Huttley, Stefan Janssen, Alan K Jarmusch, Lingjing Jiang, Benjamin Kaehler, Kyo Bin Kang, Christopher R Keefe, Paul Keim, Scott T Kelley, Dan Knights, Irina Koester, Tomasz Kosciolek, Jorden Kreps, Morgan GI Langille, Joslynn Lee, Ruth Ley, Yong-Xin Liu, Erikka Loftfield, Catherine Lozupone, Massoud Maher, Clarisse Marotz, Bryan D Martin, Daniel McDonald, Lauren J McIver, Alexey V Melnik, Jessica L Metcalf, Sydney C Morgan, Jamie Morton, Ahmad Turan Naimey, Jose A Navas-Molina, Louis Felix Nothias, Stephanie B Orchanian, Talima Pearson, Samuel L Peoples, Daniel Petras, Mary Lai Preuss, Elmar Pruesse, Lasse Buur Rasmussen, Adam Rivers, Michael S Robeson, Patrick Rosenthal, Nicola Segata, Michael Shaffer, Arron Shiffer, Rashmi Sinha, Se Jin Song, John R Spear, Austin D Swafford, Luke R Thompson, Pedro J Torres, Pauline Trinh, Anupriya Tripathi, Peter J Turnbaugh, Sabah Ul-Hasan, Justin JJ van der Hooft, Fernando Vargas, Yoshiki Vázquez-Baeza, Emily Vogtmann, Max von Hippel, William Walters, Yunhu Wan, Mingxun Wang, Jonathan Warren, Kyle C Weber, Chase HD Williamson, Amy D Willis, Zhenjiang Zech Xu, Jesse R Zaneveld, Yilong Zhang, Qiyun Zhu, Rob Knight and J Gregory Caporaso: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. 
(eprint) Nature Biotechnology 37:852-857 (2019)
qcumber
quality control of genomic sequences
Versions of package qcumber
Release   Version        Architectures
buster    1.0.14+dfsg-1  all
bullseye  2.3.0-2        all
bookworm  2.3.0-2        all
trixie    2.3.0-2        all
sid       2.3.0-2        all
Popcon: 2 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

QCPipeline is a tool for quality control. The workflow is as follows:

 1. Quality control with FastQC
 2. Trim Reads with Trimmomatic
 3. Quality control of trimmed reads with FastQC
 4. Map reads against reference using bowtie2
 5. Classify reads with Kraken
Registry entries: Bioconda 
qiime
QIIME, Quantitative Insights Into Microbial Ecology
Versions of package qiime
Release   Version       Architectures
bullseye  2020.11.1-1   all
bookworm  2022.11.1-2   all
sid       2024.5.0-1    all
jessie    1.8.0+dfsg-4  amd64,armel,armhf,i386
upstream  2024.5.1
Debtags of package qiime:
role: program
Popcon: 28 users (2 upd.)*
Newer upstream!
License: DFSG free
Git

Microbes are ubiquitous around humans, animals, plants and all their parasites, and interact strongly with them and with their environment. Soil quality comes to mind, but so does the effect that bacteria have on each other. Humans influence their relative and absolute abundance through antibiotics, food, fertilizers or anything else that comes to mind, and these changes affect everyone.

QIIME 2 is an extensible and decentralized microbiome analysis package with a focus on data and analysis transparency. QIIME 2 enables researchers to start an analysis with DNA sequence data and finish with publication-quality figures and statistical results. Key features:

  • Integrated and automatic tracking of data provenance
  • Semantic type system
  • Plugin system for extending microbiome analysis functionality
  • Support for multiple types of user interfaces (e.g. API, command line, graphical)

QIIME 2 is a complete redesign and rewrite of the QIIME 1 microbiome analysis pipeline. QIIME 2 will address most of the limitations of QIIME 1, while retaining the features that make QIIME 1 a powerful and widely used analysis pipeline.

QIIME 2 currently supports an initial end-to-end microbiome analysis pipeline. New functionality will regularly become available through QIIME 2 plugins. A list of plugins can be viewed on the QIIME 2 plugin availability page. The future plugins page lists plugins that are being developed.

Please cite: Evan Bolyen, Jai Ram Rideout, Matthew R Dillon, Nicholas A Bokulich, Christian Abnet, Gabriel A Al-Ghalith, Harriet Alexander, Eric J Alm, Manimozhiyan Arumugam, Francesco Asnicar, Yang Bai, Jordan E Bisanz, Kyle Bittinger, Asker Brejnrod, Colin J Brislawn, C Titus Brown, Benjamin J Callahan, Andrés Mauricio Caraballo-Rodríguez, John Chase, Emily Cope, Ricardo Da Silva, Pieter C Dorrestein, Gavin M Douglas, Daniel M Durall, Claire Duvallet, Christian F Edwardson, Madeleine Ernst, Mehrbod Estaki, Jennifer Fouquier, Julia M Gauglitz, Deanna L Gibson, Antonio Gonzalez, Kestrel Gorlick, Jiarong Guo, Benjamin Hillmann, Susan Holmes, Hannes Holste, Curtis Huttenhower, Gavin Huttley, Stefan Janssen, Alan K Jarmusch, Lingjing Jiang, Benjamin Kaehler, Kyo Bin Kang, Christopher R Keefe, Paul Keim, Scott T Kelley, Dan Knights, Irina Koester, Tomasz Kosciolek, Jorden Kreps, Morgan GI Langille, Joslynn Lee, Ruth Ley, Yong-Xin Liu, Erikka Loftfield, Catherine Lozupone, Massoud Maher, Clarisse Marotz, Bryan D Martin, Daniel McDonald, Lauren J McIver, Alexey V Melnik, Jessica L Metcalf, Sydney C Morgan, Jamie Morton, Ahmad Turan Naimey, Jose A Navas-Molina, Louis Felix Nothias, Stephanie B Orchanian, Talima Pearson, Samuel L Peoples, Daniel Petras, Mary Lai Preuss, Elmar Pruesse, Lasse Buur Rasmussen, Adam Rivers, Michael S Robeson, Patrick Rosenthal, Nicola Segata, Michael Shaffer, Arron Shiffer, Rashmi Sinha, Se Jin Song, John R Spear, Austin D Swafford, Luke R Thompson, Pedro J Torres, Pauline Trinh, Anupriya Tripathi, Peter J Turnbaugh, Sabah Ul-Hasan, Justin JJ van der Hooft, Fernando Vargas, Yoshiki Vázquez-Baeza, Emily Vogtmann, Max von Hippel, William Walters, Yunhu Wan, Mingxun Wang, Jonathan Warren, Kyle C Weber, Chase HD Williamson, Amy D Willis, Zhenjiang Zech Xu, Jesse R Zaneveld, Yilong Zhang, Qiyun Zhu, Rob Knight and J Gregory Caporaso: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. 
(PubMed,eprint) Nature Biotechnology 37:852-857 (2019)
Registry entries: Bio.tools  SciCrunch  Bioconda 
Topics: Microbial ecology
quorum
QUality Optimized Reads of genomic sequences
Versions of package quorum
Release   Version  Architectures
trixie    1.1.2-2  amd64,arm64,mips64el,ppc64el,riscv64
buster    1.1.1-2  amd64,arm64
sid       1.1.2-2  amd64,arm64,mips64el,ppc64el,riscv64
bookworm  1.1.1-7  amd64,arm64,mips64el,ppc64el
bullseye  1.1.1-4  amd64,arm64,mips64el,ppc64el
Popcon: 1 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

QuorUM produces trimmed and error-corrected reads that result in assemblies with longer contigs and fewer errors. QuorUM provides best performance compared to other published error correctors in several metrics. QuorUM is efficiently implemented, making use of current multi-core computing architectures, and it is suitable for large data sets (1 billion bases checked and corrected per day per core). The third-party assembler SOAPdenovo benefits significantly from using QuorUM error-corrected reads: for the data sets investigated, they result in a factor of 1.1 to 4 improvement in N50 contig size compared to using the original reads.

Please cite: Guillaume Marçais, James A. Yorke and Aleksey Zimin: QuorUM: An Error Corrector for Illumina Reads. (PubMed,eprint) PLoS One 10(6):e0130821 (2015)
Registry entries: SciCrunch 
r-bioc-deseq2
R package for RNA-Seq Differential Expression Analysis
Versions of package r-bioc-deseq2
Release   Version        Architectures
sid       1.44.0+dfsg-1  amd64,arm64,mips64el,ppc64el,riscv64,s390x
bullseye  1.30.1+dfsg-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster    1.22.2+dfsg-1  amd64,arm64,armhf,i386
bookworm  1.38.3+dfsg-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch   1.14.1-1       amd64,arm64,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
trixie    1.44.0+dfsg-1  amd64,arm64,mips64el,ppc64el,riscv64,s390x
Popcon: 33 users (7 upd.)*
Versions and Archs
License: DFSG free
Git

Differential gene expression analysis based on the negative binomial distribution. Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.
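
For the negative binomial distribution, the variance-mean dependence mentioned above is commonly written as variance = mean + dispersion × mean². The short sketch below only evaluates that relationship. The function name `nb_variance` is an assumption for illustration, and DESeq2 itself is an R package that estimates the dispersion from the data rather than taking it as an input.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean and dispersion.

    With dispersion 0 this reduces to the Poisson case (variance == mean).
    """
    return mean + dispersion * mean ** 2
```

A gene with mean count 100 and dispersion 0.1 therefore has a variance of about 1100, eleven times what a Poisson model would predict, which is why count data from sequencing assays are usually not modelled as Poisson.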

Please cite: Michael I Love, Wolfgang Huber and Simon Anders: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. (eprint) Genome Biol 15(12) (2014)
Registry entries: Bio.tools  SciCrunch  Bioconda 
r-bioc-edger
empirical analysis of digital gene expression data in R
Versions of package r-bioc-edger
Release   Version        Architectures
bookworm  3.40.2+dfsg-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch   3.14.0+dfsg-1  amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie    3.8.2+dfsg-1   amd64,armel,armhf,i386
sid       4.2.1+dfsg-1   amd64,arm64,mips64el,ppc64el,riscv64,s390x
trixie    4.2.1+dfsg-1   amd64,arm64,mips64el,ppc64el,riscv64,s390x
bullseye  3.32.1+dfsg-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 36 users (10 upd.)*
Versions and Archs
License: DFSG free
Git

This is a Bioconductor package for differential expression analysis of whole-transcriptome sequencing (RNA-seq) and digital gene expression profiles with biological replication. It uses empirical Bayes estimation and exact tests based on the negative binomial distribution. It is also useful for differential signal analysis with other types of genome-scale count data.

Please cite: Mark D. Robinson, Davis J. McCarthy and Gordon K. Smyth: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. (PubMed,eprint) Bioinformatics 26(1):139-140 (2010)
Registry entries: Bio.tools  SciCrunch  Bioconda 
r-bioc-hilbertvis
GNU R package to visualise long vector data
Versions of package r-bioc-hilbertvis
Release   Version   Architectures
buster    1.40.0-1  amd64,arm64,armhf,i386
bookworm  1.56.0-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie    1.62.0-1  amd64,arm64,mips64el,ppc64el,riscv64,s390x
sid       1.62.0-1  amd64,arm64,mips64el,ppc64el,riscv64,s390x
bullseye  1.48.0-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch   1.32.0-1  amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie    1.24.0-1  amd64,armel,armhf,i386
Debtags of package r-bioc-hilbertvis:
biology: nuceleic-acids
field: biology, biology:bioinformatics
use: analysing
Popcon: 4 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

This tool displays very long data vectors in a space-efficient manner by arranging them along a 2D Hilbert curve. The user can visually assess the large-scale structure and distribution of features simultaneously with the rough shape and intensity of individual features.

In bioinformatics, a typical use case is ChIP-Chip and ChIP-Seq data or, basically, any kind of genomic data that is conventionally displayed as quantitative tracks ("wiggle data") in genome browsers such as those provided by Ensembl or UCSC.
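
The idea of laying a one-dimensional vector out along a 2D Hilbert curve can be sketched with the standard index-to-coordinate conversion. This is not code from the package: the names `d2xy` and `hilbert_layout` are assumptions, and the sketch simply places one value per cell instead of binning a long vector to fit the image as a real visualisation would.

```python
def d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order square grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # flip the sub-square before transposing it
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_layout(values, order):
    """Place up to 4**order values on a grid, following the curve."""
    side = 1 << order
    grid = [[None] * side for _ in range(side)]
    for d, value in enumerate(values[: side * side]):
        x, y = d2xy(order, d)
        grid[y][x] = value
    return grid
```

Because consecutive positions on the curve always land in neighbouring cells, values that are close together in the vector stay close together in the image, which is what makes large-scale structure visible.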

Please cite: Simon Anders: Visualization of genomic data with the Hilbert curve. (PubMed,eprint) Bioinformatics 25(10):1231-1235 (2009)
Registry entries: Bio.tools  SciCrunch  Bioconda 
r-bioc-metagenomeseq
GNU R statistical analysis for sparse high-throughput sequencing
Versions of package r-bioc-metagenomeseq
Release   Version   Architectures
bookworm  1.40.0-1  all
stretch   1.16.0-2  all
trixie    1.46.0-1  all
sid       1.46.0-1  all
bullseye  1.32.0-1  all
buster    1.24.1-1  all
Popcon: 3 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

metagenomeSeq is designed to determine features (be it Operational Taxonomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups of multiple samples. metagenomeSeq is designed to address the effects of both normalization and under-sampling of microbial communities on disease association detection and the testing of feature correlations.

Registry entries: Bio.tools  Bioconda 
r-bioc-rsubread
Subread Sequence Alignment and Counting for R
Versions of package r-bioc-rsubread
Release   Version   Architectures
trixie    2.18.0-1  amd64,arm64,mips64el,ppc64el,riscv64,s390x
sid       2.18.0-1  amd64,arm64,mips64el,ppc64el,riscv64,s390x
bookworm  2.12.2-1  amd64,arm64,mips64el,ppc64el,s390x
bullseye  2.4.2-1   amd64,arm64,mips64el,ppc64el,s390x
Popcon: 20 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

Alignment, quantification and analysis of second and third generation sequencing data. Includes functionality for read mapping, read counting, SNP calling, structural variant detection and gene fusion discovery.

Can be applied to all major sequencing technologies and to both short and long sequence reads.

Please cite: Yang Liao, Gordon K Smyth and Wei Shi: The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. (eprint) Nucleic Acids Research 47(8):e47 (2019)
Registry entries: Bio.tools  Bioconda 
r-cran-alakazam
Immunoglobulin Clonal Lineage and Diversity Analysis
Versions of package r-cran-alakazam
Release       Version        Architectures
buster        0.2.11-1       amd64,arm64,armhf,i386
bullseye      1.1.0-1        amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
experimental  1.3.0-2~0exp0  amd64,arm64,mips64el,ppc64el,riscv64,s390x
sid           1.3.0-1        amd64,arm64,mips64el,ppc64el,riscv64,s390x
trixie        1.3.0-1        amd64,arm64,mips64el,ppc64el,riscv64,s390x
bookworm      1.2.1-1        amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 5 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Alakazam is part of the Immcantation analysis framework for Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) and provides a set of tools to investigate lymphocyte receptor clonal lineages, diversity, gene usage, and other repertoire level properties, with a focus on high-throughput immunoglobulin (Ig) sequencing.

Alakazam serves five main purposes:

  • Providing core functionality for other R packages in the Immcantation framework. This includes common tasks such as file I/O, basic DNA sequence manipulation, and interacting with V(D)J segment and gene annotations.
  • Providing an R interface for interacting with the output of the pRESTO and Change-O tool suites.
  • Performing lineage reconstruction on clonal populations of Ig sequences and analyzing the topology of the resultant lineage trees.
  • Performing clonal abundance and diversity analysis on lymphocyte repertoires.
  • Performing physicochemical property analyses of lymphocyte receptor sequences.
Please cite: Namita T. Gupta, Jason A. Vander Heiden, Mohamed Uduman, Daniel Gadala-Maria, Gur Yaari and Steven H. Kleinstein: Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. (eprint) Bioinformatics 31(20):3356-3358 (2015)
r-cran-shazam
Immunoglobulin Somatic Hypermutation Analysis
Versions of package r-cran-shazam
Release   Version   Architectures
sid       1.2.0-1   all
bookworm  1.1.2-1   all
buster    0.1.11-1  all
bullseye  1.0.2-1   all
trixie    1.2.0-1   all
Popcon: 5 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Provides a computational framework for Bayesian estimation of antigen-driven selection in immunoglobulin (Ig) sequences, providing an intuitive means of analyzing selection by quantifying the degree of selective pressure. Also provides tools to profile mutations in Ig sequences, build models of somatic hypermutation (SHM) in Ig sequences, and make model-dependent distance comparisons of Ig repertoires.

SHazaM is part of the Immcantation analysis framework for Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) and provides tools for advanced analysis of somatic hypermutation (SHM) in immunoglobulin (Ig) sequences. SHazaM focuses on the following analysis topics:

  • Quantification of mutational load: SHazaM includes methods for determining the rate of observed and expected mutations under various criteria. Mutational profiling criteria include rates under SHM targeting models, mutations specific to CDR and FWR regions, and physicochemical property dependent substitution rates.
  • Statistical models of SHM targeting patterns: Models of SHM may be divided into two independent components: 1) a mutability model that defines where mutations occur and 2) a nucleotide substitution model that defines the resulting mutation. Collectively these two components define an SHM targeting model. SHazaM provides empirically derived SHM 5-mer context mutation models for both humans and mice, as well as tools to build SHM targeting models from data.
  • Analysis of selection pressure using BASELINe: The Bayesian Estimation of Antigen-driven Selection in Ig Sequences (BASELINe) method is a novel method for quantifying antigen-driven selection in high-throughput Ig sequence data. BASELINe uses SHM targeting models to estimate the null distribution of expected mutation frequencies and to provide measures of selection pressure informed by known AID targeting biases.
  • Model-dependent distance calculations: SHazaM provides methods to compute evolutionary distances between sequences or sets of sequences based on SHM targeting models. This information is particularly useful in understanding and defining clonal relationships.
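
One way to picture how the two components of a targeting model fit together is the sketch below. It assumes that the targeting rate for a 5-mer and a resulting nucleotide is the product of the 5-mer's mutability and the substitution probability. That combination rule, the function name `targeting_model` and all numbers are assumptions for illustration and are not taken from the package.

```python
def targeting_model(mutability, substitution):
    """Combine a mutability model and a substitution model into targeting rates.

    mutability:   5-mer -> probability that its central base mutates
    substitution: 5-mer -> {new base -> probability, given that a mutation occurs}
    """
    return {
        kmer: {base: mutability[kmer] * p for base, p in substitution[kmer].items()}
        for kmer in mutability
    }


# Hypothetical values for two 5-mers (central bases G and A).
MUTABILITY = {"AAGCT": 0.002, "TTACA": 0.0005}
SUBSTITUTION = {
    "AAGCT": {"A": 0.5, "C": 0.3, "T": 0.2},
    "TTACA": {"C": 0.25, "G": 0.5, "T": 0.25},
}
```

Under this assumption, the targeting rates for one 5-mer sum back to its mutability, because the substitution probabilities for that 5-mer sum to one.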
Please cite: Namita T. Gupta, Jason A. Vander Heiden, Mohamed Uduman, Daniel Gadala-Maria, Gur Yaari and Steven H. Kleinstein: Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. (PubMed,eprint) Bioinformatics 31(20):3356-3358 (2015)
Registry entries: Bioconda 
r-cran-tcr
Advanced Data Analysis of Immune Receptor Repertoires
Versions of package r-cran-tcr
Release   Version     Architectures
bookworm  2.3.2+ds-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie    2.3.2+ds-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
sid       2.3.2+ds-1  amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bullseye  2.3.2+ds-1  amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster    2.2.3-1     amd64,arm64,armhf,i386
Popcon: 5 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Cells of the immune system are the grand exception to the rule that all cells of an individual have (mostly exact) copies of the same DNA. B cells (which produce antibodies) and T cells (which communicate with cells), however, have a section of their DNA with genes of the groups V, D and J that are reorganised within the genomic DNA to provide the flexibility to deal with yet unknown pathogens.

This package provides a platform for the advanced analysis of T cell receptor repertoire data and its visualisations.

Caveat: This package is soon to be replaced by http://github.com/immunomind/immunarch which is not yet available as a Debian package.

Please cite: Vadim I. Nazarov, Mikhail V. Pogorelyy, Ekaterina A. Komech, Ivan V. Zvyagin, Dmitry A. Bolotin, Mikhail Shugay, Dmitry M. Chudakov, Yury B. Lebedev and Ilgar Z. Mamedov: tcR: an R package for T cell receptor repertoire advanced data analysis. (eprint) BMC Bioinformatics 16:175 (2015)
Registry entries: Bio.tools  Bioconda 
r-cran-tigger
Infers new Immunoglobulin alleles from Rep-Seq Data
Versions of package r-cran-tigger
Release   Version  Architectures
buster    0.3.1-1  all
bookworm  1.0.1-1  all
trixie    1.1.0-1  all
sid       1.1.0-1  all
bullseye  1.0.0-1  all
Popcon: 3 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

Summary: Infers the V genotype of an individual from immunoglobulin (Ig) repertoire-sequencing (Rep-Seq) data, including detection of any novel alleles. This information is then used to correct existing V allele calls from among the sample sequences.

High-throughput sequencing of B cell immunoglobulin receptors is providing unprecedented insight into adaptive immunity. A key step in analyzing these data involves assignment of the germline V, D and J gene segment alleles that comprise each immunoglobulin sequence by matching them against a database of known V(D)J alleles. However, this process will fail for sequences that utilize previously undetected alleles, whose frequency in the population is unclear.

TIgGER is a computational method that significantly improves V(D)J allele assignments by first determining the complete set of gene segments carried by an individual (including novel alleles) from V(D)J-rearranged sequences. TIgGER can then infer a subject’s genotype from these sequences, and use this genotype to correct the initial V(D)J allele assignments.

The application of TIgGER continues to identify a surprisingly high frequency of novel alleles in humans, highlighting the critical need for this approach. TIgGER, however, can and has been used with data from other species.

Core Abilities:

  • Detecting novel alleles
  • Inferring a subject’s genotype
  • Correcting preliminary allele calls

Required Input

  • A table of sequences from a single individual, with columns containing the following:
      • V(D)J-rearranged nucleotide sequence (in IMGT-gapped format)
      • Preliminary V allele calls
      • Preliminary J allele calls
      • Length of the junction region
  • Germline Ig sequences in IMGT-gapped fasta format (e.g., as those downloaded from IMGT/GENE-DB)

The former can be created through the use of IMGT/HighV-QUEST and Change-O.

Please cite: Namita T. Gupta, Jason A. Vander Heiden, Mohamed Uduman, Daniel Gadala-Maria, Gur Yaari and Steven H. Kleinstein: Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. (eprint) Bioinformatics 31(20):3356-3358 (2015)
Registry entries: Bioconda 
rna-star
ultrafast universal RNA-seq aligner
Versions of package rna-star
Release            Version               Architectures
sid                2.7.11b+dfsg-1        amd64,arm64,mips64el,ppc64el
bookworm           2.7.10b+dfsg-2        amd64,arm64,mips64el,ppc64el
trixie             2.7.11b+dfsg-1        amd64,arm64,mips64el,ppc64el
stretch            2.5.2b+dfsg-1         amd64,arm64,mips64el,ppc64el
stretch-backports  2.7.0a+dfsg-1~bpo9+1  amd64,arm64,mips64el,ppc64el
bullseye           2.7.8a+dfsg-2         amd64,arm64,mips64el,ppc64el
buster             2.7.0a+dfsg-1         amd64,arm64
Popcon: 6 users (5 upd.)*
Versions and Archs
License: DFSG free
Git

This is the Spliced Transcripts Alignment to a Reference (STAR) software, based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of more than 50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and fusion transcripts, and it is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, the authors experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy.

The package is enhanced by the following packages: multiqc
Please cite: Alexander Dobin, Carrie A. Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson and Thomas R. Gingeras: STAR: ultrafast universal RNA-seq aligner. (PubMed,eprint) Bioinformatics 29(1):15-21 (2012)
Registry entries: Bio.tools  SciCrunch  Bioconda 
Topics: Sequence analysis
rtax
classification of sequence reads of 16S ribosomal RNA gene
Versions of package rtax
Release   Version  Architectures
jessie    0.984-2  all
bullseye  0.984-7  all
bookworm  0.984-8  all
sid       0.984-8  all
trixie    0.984-8  all
buster    0.984-6  all
stretch   0.984-5  all
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Short-read technologies for bacterial community profiling are increasingly popular, yet previous techniques for assigning taxonomy to paired-end reads do not perform very well. RTAX provides rapid taxonomic assignments of paired-end reads using a consensus algorithm.

Please cite: David A. W. Soergel, Neelendu Dey, Rob Knight and Steven E. Brenner: Selection of primers for optimal taxonomic classification of environmental 16S rRNA gene sequences. (PubMed,eprint) The ISME Journal 6:1440–1444 (2012)
salmon
wicked-fast transcript quantification from RNA-seq data
Versions of package salmon
Release   Version       Architectures
bookworm  1.10.1+ds1-1  amd64,arm64
sid       1.10.2+ds1-1  amd64,arm64
trixie    1.10.2+ds1-1  amd64,arm64
buster    0.12.0+ds1-1  amd64
bullseye  1.4.0+ds1-1   amd64,arm64
stretch   0.7.2+ds1-2   amd64
upstream  1.10.3
Popcon: 3 users (9 upd.)*
Newer upstream!
License: DFSG free
Git

Salmon is a wicked-fast program to produce highly accurate, transcript-level quantification estimates from RNA-seq data. Salmon achieves its accuracy and speed via a number of different innovations, including the use of lightweight alignments (accurate but fast-to-compute proxies for traditional read alignments) and massively-parallel stochastic collapsed variational inference. The result is a versatile tool that fits nicely into many different pipelines. For example, you can choose to make use of the lightweight alignments by providing Salmon with raw sequencing reads, or, if it is more convenient, you can provide Salmon with regular alignments (e.g. computed with your favorite aligner), and it will use the same wicked-fast, state-of-the-art inference algorithm to estimate transcript-level abundances for your experiment.

The package is enhanced by the following packages: multiqc
Please cite: Rob Patro, Geet Duggal, Michael I Love, Rafael A Irizarry and Carl Kingsford: Salmon provides fast and bias-aware quantification of transcript expression. (eprint) Nature Methods 14(4):417-419 (2017)
Registry entries: Bio.tools  SciCrunch  Bioconda 
sambamba
tools for working with SAM/BAM data
Versions of package sambamba
Release   Version       Architectures
bookworm  1.0+dfsg-1    amd64,arm64
bullseye  0.8.0-1       amd64,arm64
sid       1.0.1+dfsg-2  amd64,arm64,riscv64
trixie    1.0.1+dfsg-2  amd64,arm64,riscv64
Popcon: 4 users (11 upd.)*
Versions and Archs
License: DFSG free
Git

Sambamba positions itself as a performant alternative to samtools and provides tools for

  • Powerful filtering with sambamba view --filter
  • Picard-like SAM header merging in the merge tool
  • Optional for operations on whole BAMs
  • Fast copying of a region to a new file with the slice tool
  • Duplicate marking/removal, using the Picard criteria
Please cite: Artem Tarasov, Albert J. Vilella, Edwin Cuppen, Isaac J. Nijman and Pjotr Prins: Sambamba: fast processing of NGS alignment formats. (PubMed,eprint) Bioinformatics 31(12):2032-2034 (2015)
Registry entries: Bio.tools  Bioconda 
samblaster
marks duplicates, extracts discordant/split reads
Versions of package samblaster
Release | Version | Architectures
trixie | 0.1.26-4 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
sid | 0.1.26-4 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster | 0.1.24-2 | amd64,arm64,armhf,i386
bullseye | 0.1.26-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm | 0.1.26-4 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

Current "next-generation" sequencing technologies cannot choose which exact sequence they will read; they take whatever is available. If some sequences are read very often, this needs extra biomedical interpretation: the corresponding part of the genome could, for instance, be duplicated.

samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. When marking duplicates, samblaster will require approximately 20MB of memory per 1M read pairs.
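The memory figure quoted above can be turned into a rough estimate for a given data set. The following is a minimal sketch, assuming that memory use scales linearly with the number of read pairs; the function name and the linear model are illustrative assumptions and not part of samblaster.

```python
def estimate_memory_mb(read_pairs: int, mb_per_million_pairs: float = 20.0) -> float:
    """Scale the quoted ~20 MB per 1M read pairs linearly (an assumption)."""
    return read_pairs / 1_000_000 * mb_per_million_pairs


if __name__ == "__main__":
    # Rough estimate for a hypothetical library of 50 million read pairs.
    print(estimate_memory_mb(50_000_000), "MB")
```

The real footprint depends on the data, so the result should be read as an order-of-magnitude guide only.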

The package is enhanced by the following packages: multiqc
Please cite: Gregory G. Faust and Ira M. Hall: SAMBLASTER: fast duplicate marking and structural variant read extraction. (PubMed,eprint) Bioinformatics 30(17):2503-2505 (2014)
Registry entries: Bio.tools  SciCrunch  Bioconda 
samtools
processing sequence alignments in SAM, BAM and CRAM formats
Versions of package samtools
Release | Version | Architectures
sid | 1.20-3 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bullseye | 1.11-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie | 0.1.19-1 | amd64,armhf,i386
buster | 1.9-4 | amd64,arm64,armhf
stretch | 1.3.1-3 | amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
bookworm | 1.16.1-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie | 1.20-3 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
stretch-backports | 1.7-2~bpo9+1 | amd64,arm64,armel,armhf,mips,mips64el,mipsel,ppc64el,s390x
upstream | 1.21
Debtags of package samtools:
field: biology
interface: commandline
network: client
role: program
scope: utility
uitoolkit: ncurses
use: analysing, calculating, filtering
works-with: biological-sequence
Popcon: 70 users (20 upd.)*
Newer upstream!
License: DFSG free
Git

Samtools is a set of utilities that manipulate nucleotide sequence alignments in the binary BAM format. It can import from and export to the ASCII SAM (Sequence Alignment/Map) and CRAM formats, and it sorts, merges, indexes and retrieves records in any region easily. It is designed to work on a data stream and is able to open a BAM or CRAM (but not SAM) file on a remote HTTP or FTP server.

The package is enhanced by the following packages: libbio-samtools-perl multiqc
Please cite: Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin and 1000 Genome Project Data Processing Subgroup: The Sequence Alignment/Map (SAM) Format and SAMtools. (PubMed,eprint) Bioinformatics 25(16):2078-2079 (2009)
Registry entries: Bio.tools  SciCrunch  Bioconda 
scoary
pangenome-wide association studies
Versions of package scoary
Release | Version | Architectures
sid | 1.6.16-8 | all
stretch-backports | 1.6.16-1~bpo9+1 | all
buster | 1.6.16-1 | all
bullseye | 1.6.16-2 | all
bookworm | 1.6.16-5 | all
trixie | 1.6.16-8 | all
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Scoary is designed to take the gene_presence_absence.csv file from Roary as well as a traits file created by the user and calculate the associations between all genes in the accessory genome and the traits. It reports a list of genes sorted by strength of association per trait.

Please cite: Ola Brynildsrud, Jon Bohlin, Lonneke Scheffer and Vegard Eldholm: Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary. (PubMed,eprint) Genome Biology 17(238) (2016)
Registry entries: Bio.tools  Bioconda 
scythe
Bayesian adapter trimmer for sequencing reads
Versions of package scythe
Release | Version | Architectures
bookworm | 0.994+git20141017.20d3cff-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster | 0.994+git20141017.20d3cff-1 | amd64,arm64,armhf,i386
sid | 0.994+git20141017.20d3cff-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 0.994+git20141017.20d3cff-5 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bullseye | 0.994+git20141017.20d3cff-3 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch | 0.994-4 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Scythe uses a naive Bayesian approach to classify contaminant substrings in sequence reads. It takes quality information into account, which makes it robust when removing 3'-end adapters, which often include poor-quality bases.

Registry entries: SciCrunch 
seqprep
stripping adaptors and/or merging paired reads of DNA sequences with overlap
Versions of package seqprep
Release | Version | Architectures
buster | 1.3.2-3 | amd64,arm64,armhf,i386
stretch | 1.3.2-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid | 1.3.2-9 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 1.3.2-9 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm | 1.3.2-8 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye | 1.3.2-5 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

SeqPrep is a program to merge overlapping paired-end Illumina reads into a single longer read. It may also be used just for its adapter trimming feature, without doing any paired-end overlap. When an adapter sequence is present, the two reads must overlap (in most cases), so they are forcefully merged. When reads do not have adapter sequence, they must be treated with care when merging, so a much more specific approach is taken. The default parameters were chosen with specificity in mind, so that they could be run on libraries where very few reads are expected to overlap. It is always safest, though, to save the overlapping procedure for libraries where you have some prior knowledge that a significant portion of the reads will have some overlap.

Registry entries: Bio.tools  SciCrunch  Bioconda 
seqtk
Fast and lightweight tool for processing sequences in the FASTA or FASTQ format
Versions of package seqtk
Release | Version | Architectures
stretch | 1.2-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie | 1.0-1 | amd64,armel,armhf,i386
trixie | 1.4-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster | 1.3-1 | amd64,arm64,armhf,i386
bookworm | 1.3-4 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 1.4-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bullseye | 1.3-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 7 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

Currently, seqtk supports quality based trimming with the phred algorithm, converting fastq to fasta, reverse complementing sequences, extracting or masking subsequences in regions given in a BED/name list file, and more. It contains a subsampling module to sample exactly n sequences or a fraction of sequences.

Seqtk supports both fasta and fastq input files, which can be optionally gzip compressed.
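Two of the operations listed above, reverse complementing and sampling exactly n sequences, can be illustrated in a few lines. This is a sketch of the general techniques only (a translation table and reservoir sampling); it is not seqtk's implementation, and the function names and the fixed seed are assumptions made for the example.

```python
import random

# Complement table for the four bases plus N, in both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sample_exactly(records, n: int, seed: int = 11):
    """Reservoir sampling: keep exactly n records from a stream of unknown
    length (or all of them if the stream holds fewer than n)."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir


if __name__ == "__main__":
    print(reverse_complement("AAGCCTA"))
    print(sample_exactly((f"read{i}" for i in range(1000)), 3))
```

Reservoir sampling needs only one pass over the input and memory for n records, which is why it suits large FASTQ streams.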

Registry entries: Bio.tools  Bioconda 
sga
de novo genome assembler that uses string graphs
Versions of package sga
Release | Version | Architectures
trixie | 0.10.15-7 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
bullseye | 0.10.15-5 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
sid | 0.10.15-7 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
buster | 0.10.15-4 | amd64,arm64
stretch | 0.10.15-2 | amd64,arm64,mips64el,ppc64el
bookworm | 0.10.15-7 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads.

SGA is a de novo assembler for DNA sequence reads. It is based on Gene Myers' string graph formulation of assembly and uses the FM-index/Burrows-Wheeler transform to efficiently find overlaps between sequence reads.
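To give an idea of the Burrows-Wheeler transform and FM-index style searching mentioned above, here is a toy sketch. It builds the transform naively by sorting all rotations and counts pattern occurrences with a backward search; SGA itself relies on far more efficient construction and compressed data structures, so this only illustrates the principle. The function names are chosen for this example.

```python
def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform: sort all rotations of text + '$'
    and take the last column. Quadratic memory, for illustration only."""
    if "$" in text:
        raise ValueError("text must not contain the sentinel '$'")
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Backward search over the BWT, as done with an FM-index.
    Rank queries are computed by plain counting here instead of a
    precomputed occurrence table."""
    first = {}  # first row of the sorted column where each character appears
    for i, c in enumerate(sorted(bwt_str)):
        first.setdefault(c, i)
    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + bwt_str[:lo].count(c)
        hi = first[c] + bwt_str[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("ACGTACGT")
    print(transformed, count_occurrences(transformed, "CGT"))
```

An overlap between two reads is a suffix of one read that is a prefix of another, which is why fast substring queries of this kind help an assembler.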

Please cite: Jared T. Simpson and Richard Durbin: Efficient de novo assembly of large genomes using compressed data structures. (PubMed,eprint) Genome Res 22(3):549-555 (2012)
Registry entries: Bio.tools  SciCrunch  Bioconda 
sickle
windowed adaptive trimming tool for FASTQ files using quality
Versions of package sickle
Release | Version | Architectures
trixie | 1.33+git20150314.f3d6ae3-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster | 1.33+git20150314.f3d6ae3-1 | amd64,arm64,armhf,i386
bookworm | 1.33+git20150314.f3d6ae3-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 1.33+git20150314.f3d6ae3-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
stretch | 1.33-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye | 1.33+git20150314.f3d6ae3-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Most sequencing technologies produce reads whose quality deteriorates towards the 3'-end. Incorrectly called bases have a negative impact on assemblies, mappings and downstream bioinformatics analyses.

Sickle is a tool that uses sliding windows along with quality and length thresholds to determine when quality is low enough to trim the 3'-end of reads. It also discards reads based on the length threshold. It takes the quality values and slides a window across them whose length is 0.1 times the length of the read. If this length is less than 1, the window is set to be equal to the length of the read. Otherwise, the window slides along the quality values until the average quality in the window drops below the threshold. At that point the algorithm determines where in the window the drop occurs and cuts both the read and the quality strings there. However, if the cut point is less than the minimum length threshold, the read is discarded entirely.

Sickle supports four types of quality values: Illumina, Solexa, Phred and Sanger. Note that the Solexa quality setting is an approximation (the actual conversion is a non-linear transformation). The end approximation is close.

Sickle also supports compressed input files.
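The sliding-window procedure described above can be sketched as follows. This is a simplified reading of the description and not Sickle's source code: it assumes qualities are already decoded to integers, takes "where the drop occurs" to mean the first base in the window below the quality threshold, and uses 20 as the default for both thresholds purely for illustration.

```python
def trim_3prime(seq, quals, qual_threshold=20, length_threshold=20):
    """Return (trimmed_seq, trimmed_quals), or None if the read is discarded.

    quals is a list of integer quality scores, one per base of seq.
    """
    n = len(seq)
    if n != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if n == 0:
        return None
    window = n // 10  # 0.1 times the read length, rounded down
    if window < 1:
        window = n    # very short read: one window over the whole read
    cut = n           # default: keep the whole read
    for start in range(n - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < qual_threshold:
            # A mean below the threshold implies at least one base below it.
            cut = next(start + k for k, q in enumerate(win) if q < qual_threshold)
            break
    if cut < length_threshold:
        return None   # remaining read is too short, drop it
    return seq[:cut], quals[:cut]


if __name__ == "__main__":
    read = "A" * 30
    print(trim_3prime(read, [30] * 20 + [5] * 10))
```

In the usage example, the last ten bases have low quality, so the read is cut back to its first 20 bases; had fewer than 20 bases remained, the function would return None.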

The package is enhanced by the following packages: multiqc
Registry entries: Bio.tools  SciCrunch  Bioconda 
smalt
Sequence Mapping and Alignment Tool
Versions of package smalt
Release | Version | Architectures
buster | 0.7.6-8 | amd64,arm64,armhf
bullseye | 0.7.6-9 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie | 0.7.6-4 | amd64,armhf,i386
bookworm | 0.7.6-12 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid | 0.7.6-13 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
stretch | 0.7.6-6 | amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
trixie | 0.7.6-13 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

SMALT efficiently aligns DNA sequencing reads with a reference genome. Reads from a wide range of sequencing platforms, for example Illumina, Roche-454, Ion Torrent, PacBio or ABI-Sanger, can be processed including paired reads.

The software employs a perfect hash index of short words (< 20 nucleotides long), sampled at equidistant steps along the genomic reference sequences.

For each read, potentially matching segments in the reference are identified from seed matches in the index and subsequently aligned with the read using a banded Smith-Waterman algorithm.
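The indexing and seeding steps described above can be sketched with a plain dictionary. This is an illustration of the idea only, not SMALT's data structure: a Python dict stands in for the perfect hash, the word length and step are toy values, and the banded Smith-Waterman alignment that would follow is left out. All names are assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_index(reference: str, word_len: int = 4, step: int = 2):
    """Index words of length word_len sampled every `step` bases."""
    index = defaultdict(list)
    for pos in range(0, len(reference) - word_len + 1, step):
        index[reference[pos:pos + word_len]].append(pos)
    return index


def candidate_offsets(read: str, index, word_len: int = 4):
    """Look up every word of the read and vote for the reference offset
    (reference position minus read position) each seed match implies.
    Returns (offset, votes) pairs, best supported first."""
    votes = Counter()
    for i in range(len(read) - word_len + 1):
        for ref_pos in index.get(read[i:i + word_len], ()):
            votes[ref_pos - i] += 1
    return votes.most_common()


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGTT"
    idx = build_index(reference)
    print(candidate_offsets(reference[6:16], idx))
```

A larger step shrinks the index but produces fewer seed matches per read, which is the sensitivity versus speed trade-off the description refers to.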

The best gapped alignment of each read is reported, including a score for the reliability of the best mapping. The user can adjust the trade-off between sensitivity and speed by tuning the length and spacing of the hashed words.

A mode for the detection of split (chimeric) reads is provided. Multi-threaded program execution is supported.

Registry entries: Bio.tools  SciCrunch  Bioconda 
Remark of Debian Med team: This can be regarded as the successor of ssaha2

This program is from the same author as ssaha2 and is, according to its author, faster and more precise than ssaha2 (except for sequences > 2000bp).

smrtanalysis
software suite for single molecule, real-time sequencing
Versions of package smrtanalysis
Release | Version | Architectures
bullseye | 0~20210111 | all
bookworm | 0~20210112 | all
stretch | 0~20161126 | all
sid | 0~20210112 | all
Popcon: 0 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

SMRT® Analysis is a powerful, open-source bioinformatics software suite available for analysis of DNA sequencing data from Pacific Biosciences’ SMRT technology. Users can choose from a variety of analysis protocols that utilize PacBio® and third-party tools. Analysis protocols include de novo genome assembly, cDNA mapping, DNA base-modification detection, and long-amplicon analysis to determine phased consensus sequences.

This is a metapackage that depends on the components of SMRT Analysis.

Registry entries: Bio.tools  SciCrunch 
snap-aligner
Scalable Nucleotide Alignment Program
Versions of package snap-aligner
Release | Version | Architectures
buster | 1.0~beta.18+dfsg-3 | amd64,arm64
trixie | 2.0.3+dfsg-2 | amd64,arm64,mips64el,ppc64el,riscv64
sid | 2.0.3+dfsg-2 | amd64,arm64,mips64el,ppc64el,riscv64
bullseye | 1.0.0+dfsg-2 | amd64,arm64,mips64el,ppc64el
stretch | 1.0~beta.18+dfsg-1 | amd64,arm64,mips64el,ppc64el
bookworm | 2.0.2+dfsg-1 | amd64,arm64,mips64el,ppc64el
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

SNAP is a new sequence aligner that is 3-20x faster and just as accurate as existing tools like BWA-mem, Bowtie2 and Novoalign. It runs on commodity x86 processors, and supports a rich error model that lets it cheaply match reads with more differences from the reference than other tools. This gives SNAP up to 2x lower error rates than existing tools (in some cases) and lets it match larger mutations that they may miss. SNAP also natively reads BAM, FASTQ, or gzipped FASTQ, and natively writes SAM or BAM, with built-in sorting, duplicate marking, and BAM indexing.

Please cite: Matei Zaharia, William J. Bolosky, Kristal Curtis, Armando Fox, David Patterson, Scott Shenker, Ion Stoica, Richard M. Karp and Taylor Sittler: Faster and More Accurate Sequence Alignment with SNAP. (eprint) arXiv preprint arXiv:1111.5572 (2011)
Registry entries: SciCrunch 
sniffles
structural variation caller using third-generation sequencing
Versions of package sniffles
Release | Version | Architectures
bookworm | 2.0.7-1 | all
buster | 1.0.11+ds-1 | amd64,arm64,armhf,i386
stretch | 1.0.2+ds-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
sid | 2.2-1 | all
bullseye | 1.0.12b+ds-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie | 2.2-1 | all
upstream | 2.4
Popcon: 1 users (1 upd.)*
Newer upstream!
License: DFSG free
Git

Sniffles is a structural variation (SV) caller using third-generation sequencing data such as those from Pacific Biosciences or Oxford Nanopore platforms. It detects all types of SVs using evidence from split-read alignments, high-mismatch regions, and coverage analysis.

Please cite: Fritz J. Sedlazeck, Philipp Rescheneder, Moritz Smolka, Han Fang, Maria Nattestad, Arndt von Haeseler and Michael Schatz: Accurate detection of complex structural variations using single molecule sequencing. (eprint) bioRxiv (2017)
Registry entries: Bio.tools  Bioconda 
snp-sites
binary code for the package snp-sites
Versions of package snp-sites
Release | Version | Architectures
bullseye | 2.5.1-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm | 2.5.1-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie | 1.5.0-1 | amd64,armel,armhf,i386
sid | 2.5.1-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 2.5.1-2 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster | 2.4.1-1 | amd64,arm64,armhf,i386
stretch | 2.3.2-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

This program finds single nucleotide polymorphism (SNP) sites in input files in multi-FASTA alignment format (which may be compressed). Its output can be in various widely used formats (Multi Fasta Alignment, VCF, phylip).

This software was developed at the Wellcome Trust Sanger Institute.

A single nucleotide polymorphism (SNP, pronounced snip; plural snips) is a DNA sequence variation occurring when a single nucleotide (A, T, C or G) in the genome (or other shared sequence) differs between members of a biological species or between paired chromosomes. For example, two sequenced DNA fragments from two different individuals, AAGCCTA and AAGCTTA, contain a difference in a single nucleotide. In this case there are two alleles. Most common SNPs have only two alleles.
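The example above (AAGCCTA versus AAGCTTA) can be checked with a short sketch that scans the columns of an alignment for positions carrying more than one base. It illustrates the definition of a SNP site only and is not the algorithm used by snp-sites; the function name, the 0-based coordinates and the decision to ignore non-ACGT characters are assumptions of this example.

```python
def snp_sites(aligned_seqs):
    """Return (column, alleles) for each alignment column with more than
    one distinct base among A, C, G and T. Columns are 0-based; gaps and
    ambiguity codes are ignored."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    sites = []
    for col in range(length):
        alleles = {s[col].upper() for s in aligned_seqs} & set("ACGT")
        if len(alleles) > 1:
            sites.append((col, sorted(alleles)))
    return sites


if __name__ == "__main__":
    print(snp_sites(["AAGCCTA", "AAGCTTA"]))
```

For the two fragments from the text, the only variable column is the fifth base, with the two alleles C and T.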

Please cite: Andrew J. Page, Ben Taylor, Aidan J. Delaney, Jorge Soares, Torsten Seemann, Jacqueline A. Keane and Simon R. Harris: SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments. (eprint) Microbial Genomics 2(4) (2016)
Topics: Genetic variation
snpomatic
fast, stringent short-read mapping software
Versions of package snpomatic
Release | Version | Architectures
sid | 1.0-7 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie | 1.0-7 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
stretch | 1.0-3 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster | 1.0-4 | amd64,arm64,armhf,i386
bullseye | 1.0-5 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm | 1.0-6 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

High-throughput sequencing technologies generate large amounts of short reads. Mapping these to a reference sequence consumes large amounts of processing time and memory, and read mapping errors can lead to noisy or incorrect alignments.

SNP-o-matic is a stringent short-read mapping software. It supports a large number of output types and formats, for uses in read filtering, alignments, sequence-based genotyping calls, assisted reassembly of contigs, etc.

Please cite: Heinrich Magnus Manske and Dominic P. Kwiatkowski: SNP-o-matic. (PubMed,eprint) Bioinformatics 25(18):2434-2435 (2009)
Registry entries: Bio.tools  Bioconda 
Topics: Genetic variation; Mapping
soapdenovo
short-read assembly method to build a de novo draft assembly
Versions of package soapdenovo
Release | Version | Architectures
bookworm | 1.05-6 | amd64
sid | 1.05-6 | amd64
trixie | 1.05-6 | amd64
jessie | 1.05-2 | amd64
stretch | 1.05-3 | amd64
buster | 1.05-5 | amd64
bullseye | 1.05-6 | amd64
Popcon: 2 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble short reads from the Illumina Genome Analyzer.

It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

This version is no longer maintained; consider using soapdenovo2 instead.

Please cite: Ruiqiang Li, Hongmei Zhu, Jue Ruan, Wubin Qian, Xiaodong Fang, Zhongbin Shi, Yingrui Li, Shengting Li, Gao Shan, Karsten Kristiansen, Songgang Li, Huanming Yang, Jian Wang and Jun Wang: De novo assembly of human genomes with massively parallel short read sequencing. (PubMed,eprint) Genome Research 20(2):265-72 (2009)
Registry entries: Bio.tools  SciCrunch 
soapdenovo2
short-read assembly method to build a de novo draft assembly
Versions of package soapdenovo2
Release | Version | Architectures
buster | 241+dfsg-3 | amd64
sid | 242+dfsg-4 | amd64
trixie | 242+dfsg-4 | amd64
jessie | 240+dfsg-2 | amd64
stretch | 240+dfsg1-2 | amd64
bookworm | 242+dfsg-3 | amd64
bullseye | 242+dfsg-1 | amd64
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble short reads from the Illumina Genome Analyzer IIx.

It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

Please cite: Ruibang Luo, Binghang Liu, Yinlong Xie, Zhenyu Li, Weihua Huang, Jianying Yuan, Guangzhu He, Yanxiang Chen, Qi Pan, Yunjie Liu, Jingbo Tang, Gengxiong Wu, Hao Zhang, Yujian Shi, Yong Liu, Chang Yu, Bo Wang, Yao Lu, Changlei Han, David W Cheung, Siu-Ming Yiu, Shaoliang Peng, Zhu Xiaoqian, Guangming Liu, Xiangke Liao, Yingrui Li, Huanming Yang, Jian Wang, Tak-Wah Lam and Jun Wang: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Giga Science 1(1):18 (2012)
Registry entries: Bio.tools  SciCrunch  Bioconda 
sortmerna
tool for filtering, mapping and OTU-picking NGS reads
Versions of package sortmerna
Release | Version | Architectures
bookworm | 4.3.6-2 | amd64,i386
bullseye | 2.1-5 | amd64,i386
buster | 2.1-3 | amd64,i386
stretch | 2.1-1 | amd64,i386
sid | 4.3.7-1 | amd64,i386
trixie | 4.3.7-1 | amd64,i386
Popcon: 1 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

SortMeRNA is a biological sequence analysis tool for filtering, mapping and OTU-picking NGS reads. The core algorithm is based on approximate seeds and allows for fast and sensitive analyses of nucleotide sequences. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. Additional applications include OTU-picking and taxonomy assignation available through QIIME v1.9+ (http://qiime.org - v1.9.0-rc1). SortMeRNA takes as input a file of reads (fasta or fastq format) and one or multiple rRNA database file(s), and sorts apart rRNA and rejected reads into two files specified by the user. Optionally, it can provide high quality local alignments of rRNA reads against the rRNA database. SortMeRNA works with Illumina, 454, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

The package is enhanced by the following packages: multiqc
Please cite: Evguenia Kopylova, Laurent Noé and Hélène Touzet: SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. (PubMed,eprint) Bioinformatics 28(24):3211-3217 (2012)
Registry entries: Bio.tools  SciCrunch  Bioconda 
spades
genome assembler for single-cell and isolate data sets
Versions of package spades
Release | Version | Architectures
stretch-backports | 3.12.0+dfsg-1~bpo9+1 | amd64
sid | 3.15.5+dfsg-7 | amd64
trixie | 3.15.5+dfsg-7 | amd64
bookworm | 3.15.5+dfsg-2 | amd64
bullseye | 3.13.1+dfsg-2 | amd64
stretch-backports-sloppy | 3.13.1+dfsg-2~bpo9+1 | amd64
buster | 3.13.0+dfsg2-2 | amd64
stretch | 3.9.1+dfsg-1 | amd64
experimental | 4.0.0+dfsg1-1 | amd64
upstream | 4.0.0
Popcon: 5 users (2 upd.)*
Newer upstream!
License: DFSG free
Git

SPAdes (St. Petersburg genome assembler) is intended for both standard isolate and single-cell MDA bacterial assemblies. It works with Illumina or IonTorrent reads and can provide hybrid assemblies using PacBio or Sanger reads. Additional contigs can be supplied to be used as long reads.

This package also provides the following additional pipelines:

 – metaSPAdes, a pipeline for metagenomic data sets;
 – plasmidSPAdes, a pipeline for extracting and assembling
   plasmids from WGS data sets;
 – metaplasmidSPAdes, a pipeline for extracting and assembling
   plasmids from metagenomic data sets;
 – rnaSPAdes, a de novo transcriptome assembler for RNA-Seq
   data;
 – truSPAdes, a module for TruSeq barcode assembly;
 – biosyntheticSPAdes, a module for biosynthetic gene cluster
   assembly with paired-end reads.

SPAdes provides several stand-alone executables with a relatively simple command-line interface: k-mer counting (spades-kmercounter), assembly graph construction (spades-gbuilder) and long-read to graph alignment (spades-gmapper).

Please cite: Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey A. Gurevich, Mikhail Dvorkin, Alexander S. Kulikov, Valery M. Lesin, Sergey I. Nikolenko, Son Pham, Andrey D. Prjibelski, Alexey V. Pyshkin, Alexander V. Sirotkin, Nikolay Vyahhi, Glenn Tesler, Max A. Alekseyev and Pavel A. Pevzner: SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. (PubMed,eprint) Journal of Computational Biology 19(5):455-477 (2012)
Registry entries: Bio.tools  SciCrunch  Bioconda 
sprai
single-pass sequencing read accuracy improver
Versions of package sprai
Release | Version | Architectures
bullseye | 0.9.9.23+dfsg1-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm | 0.9.9.23+dfsg1-2 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch | 0.9.9.22+dfsg-1 | amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
trixie | 0.9.9.23+dfsg1-3 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
sid | 0.9.9.23+dfsg1-3 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster | 0.9.9.23+dfsg-2 | amd64,arm64,armhf,i386
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Sprai is a tool to correct sequencing errors in single-pass reads for de novo assembly. It was originally designed for correcting sequencing errors in single-molecule DNA sequencing reads, especially in Continuous Long Reads (CLRs) generated by PacBio RS sequencers. The goal of Sprai is not to maximize the accuracy of error-corrected reads. Instead, Sprai aims at maximizing the continuity (i.e., N50 contig length) of assembled contigs after error correction.
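For readers unfamiliar with the N50 contig length used above as the measure of continuity, the following sketch computes it with the common definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name is chosen for this example and is not part of Sprai.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # reached half of the assembly
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A higher N50 means that more of the assembly sits in long contigs, which is the property Sprai tries to improve.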

sra-toolkit
utilities for the NCBI Sequence Read Archive
Versions of package sra-toolkit
Release | Version | Architectures
experimental | 3.0.9+dfsg-6 | amd64,arm64
sid | 3.0.3+dfsg-9 | amd64,arm64
trixie | 3.0.3+dfsg-9 | amd64,arm64
jessie | 2.3.5-2+dfsg-1 | amd64,i386
bullseye | 2.10.9+dfsg-2 | amd64
buster | 2.9.3+dfsg-1 | amd64
bookworm | 3.0.3+dfsg-6~deb12u1 | amd64,arm64
stretch | 2.8.1-2+dfsg-2 | amd64,i386
upstream | 3.1.1
Popcon: 9 users (1 upd.)*
Newer upstream!
License: DFSG free
Git

Tools for reading the SRA archive, generally by converting individual runs into some commonly used format such as fastq.

The textual dumpers "sra-dump" and "vdb-dump" are provided in this release as an aid in visual inspection. It is likely that their actual output formatting will be changed in the near future to a stricter, more formalized representation[s]. PLEASE DO NOT RELY UPON THE OUTPUT FORMAT SEEN IN THIS RELEASE.

Other tools distributed in this package are:

 abi-dump, abi-load
 align-info
 bam-load
 cache-mgr
 cg-load
 copycat
 fasterq-dump
 fastq-dump, fastq-load
 helicos-load
 illumina-dump, illumina-load
 kar
 kdbmeta
 latf-load
 pacbio-load
 prefetch
 rcexplain
 remote-fuser
 sff-dump, sff-load
 sra-pileup, sra-sort, sra-stat, srapath
 srf-load
 test-sra
 vdb-config, vdb-copy, vdb-decrypt, vdb-encrypt, vdb-get, vdb-lock,
 vdb-passwd, vdb-unlock, vdb-validate

The "help" information will be improved in near future releases, and the tool options will become standardized across the set. More documentation will also be provided on the NCBI web site.

Tool options may change in the next release. Version 1 tool options will remain supported wherever possible in order to preserve operation of any existing scripts.

Please cite: Rasko Leinonen, Ruth Akhtar, Ewan Birney, James Bonfield, Lawrence Bower, Matt Corbett, Ying Cheng, Fehmi Demiralp, Nadeem Faruque, Neil Goodgame, Richard Gibson, Gemma Hoad, Christopher Hunter, Mikyung Jang, Steven Leonard, Quan Lin, Rodrigo Lopez, Michael Maguire, Hamish McWilliam, Sheila Plaister, Rajesh Radhakrishnan, Siamak Sobhany, Guy Slater, Petra Ten Hoopen, Franck Valentin, Robert Vaughan, Vadim Zalunin, Daniel Zerbino and Guy Cochrane: Improvements to services at the European Nucleotide Archive. (PubMed,eprint) Nucleic Acids Research 38(Database issue):D39-45 (2010)
Registry entries: Bio.tools  Bioconda 
srst2
Short Read Sequence Typing for Bacterial Pathogens
Versions of package srst2
Release | Version | Architectures
bullseye | 0.2.0-8 | amd64,arm64,mips64el,ppc64el
trixie | 0.2.0-12 | amd64,arm64,mips64el,ppc64el,riscv64
stretch | 0.2.0-4 | amd64
buster | 0.2.0-6 | amd64
sid | 0.2.0-12 | amd64,arm64,mips64el,ppc64el,riscv64
bookworm | 0.2.0-9 | amd64,arm64,mips64el,ppc64el
Popcon: 3 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

This program is designed to take Illumina sequence data, an MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes, etc.) and report the presence of STs and/or reference genes.

Please cite: Michael Inouye, Harriet Dashnow, Lesley-Ann Raven, Mark B Schultz, Bernard J Pope, Takehiro Tomita, Justin Zobel and Kathryn E Holt: SRST2: Rapid genomic surveillance for public health and hospital microbiology labs. (PubMed,eprint) Genome Medicine 6(11):90 (2014)
Registry entries: Bioconda 
ssake
genomics application for assembling millions of very short DNA sequences
Versions of package ssake
Release | Version | Architectures
buster | 4.0-2 | all
bullseye | 4.0-3 | all
bookworm | 4.0.1-1 | all
stretch | 3.8.4-1 | all
trixie | 4.0.1-2 | all
sid | 4.0.1-2 | all
jessie | 3.8.2-1 | all
Debtags of package ssake:
biology: nuceleic-acids
field: biology
interface: shell
role: program
scope: utility
use: analysing
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

The Short Sequence Assembly by K-mer search and 3′ read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for the 3′-most k-mers using a DNA prefix tree. SSAKE is designed to help leverage the information from short sequence reads by stringently clustering them into contigs that can be used to characterize novel sequencing targets.

Please cite: Rene L. Warren, Granger G. Sutton, Steven J. M. Jones and Robert A. Holt: Assembling millions of short DNA sequences using SSAKE. (PubMed,eprint) Bioinformatics 23(4):500-501 (2007)
Registry entries: Bio.tools  SciCrunch  Bioconda 
Topics: Sequence assembly
stacks
pipeline for building loci from short-read DNA sequences
Versions of package stacks
Release | Version | Architectures
sid | 2.68+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
stretch | 1.44-2 | amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
buster | 2.2+dfsg-1 | amd64,arm64,armhf
bullseye | 2.55+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm | 2.62+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie | 2.68+dfsg-1 | amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Popcon: 2 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.

Note that this package installs Stacks such that all commands must be run as: $ stacks

The package is enhanced by the following packages: multiqc
Please cite: Julian Catchen, Paul A. Hohenlohe, Susan Bassham, Angel Amores and William A. Cresko: Stacks: an analysis tool set for population genomics. (PubMed) Molecular Ecology 22(11):3124-40 (2013)
Registry entries: Bio.tools  SciCrunch  Bioconda 
stringtie
assemble short RNAseq reads to transcripts
Versions of package stringtie
ReleaseVersionArchitectures
bullseye2.1.4+ds-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm2.2.1+ds-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid2.2.1+ds-3amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie2.2.1+ds-3amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
upstream2.2.3
Popcon: 1 users (2 upd.)*
Newer upstream!
License: DFSG free
Git

The abundance of transcripts in a human tissue sample can be determined by RNA sequencing. The exact sequence sampled may be random, depending on the technology used, and it may be short, i.e. shorter than the transcript. At some point, many shorter reads need to be assembled to model the complete transcripts.

StringTie assembles RNA-Seq reads into potential transcripts without the need for a reference genome and also provides a quantification of the splice variants.

Please cite: Mihaela Pertea, Geo M. Pertea, Corina M. Antonescu, Tsung-Cheng Chang, Joshua T. Mendell and Steven L. Salzberg: StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33:290–295 (2015)
Registry entries: Bio.tools  SciCrunch  Bioconda 
subread
toolkit for processing next-generation sequencing data
Versions of package subread
ReleaseVersionArchitectures
bookworm2.0.3+dfsg-1amd64,arm64,armel,armhf,i386,ppc64el
buster-backports2.0.0+dfsg-1~bpo10+1amd64,arm64,armel,armhf,i386,ppc64el
bullseye2.0.1+dfsg-1amd64,arm64,armel,armhf,i386,ppc64el
sid2.0.7+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
trixie2.0.7+dfsg-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64
stretch1.5.1+dfsg-4amd64,arm64,armel,armhf,i386,ppc64el
buster1.6.3+dfsg-1amd64,arm64,armhf,i386
Popcon: 7 users (6 upd.)*
Versions and Archs
License: DFSG free
Git

The Subread aligner can be used to align both gDNA-seq and RNA-seq reads. The Subjunc aligner was specifically designed for the detection of exon-exon junctions. For the mapping of RNA-seq reads, Subread performs local alignments and Subjunc performs global alignments.

Please cite: Yang Lian, Gordon K. Smyth and Wei Shi: The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. (PubMed) Nucleic Acids Research 47(8):e47-e47 (2019)
Registry entries: Bio.tools  SciCrunch  Bioconda 
sumaclust
fast and exact clustering of genomic sequences
Versions of package sumaclust
ReleaseVersionArchitectures
bookworm1.0.36+ds-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.0.36+ds-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.0.20-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster1.0.31-2amd64,arm64,armhf,i386
sid1.0.36+ds-2amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

With the development of next-generation sequencing, efficient tools are needed to handle millions of sequences in reasonable amounts of time. Sumaclust is a program developed by the LECA. Sumaclust aims to cluster sequences in a way that is fast and exact at the same time. This tool has been developed to be adapted to the type of data generated by DNA metabarcoding, i.e. entirely sequenced, short markers. Sumaclust clusters sequences using the same clustering algorithm as UCLUST and CD-HIT. This algorithm is mainly useful to detect the 'erroneous' sequences created during amplification and sequencing protocols, deriving from 'true' sequences.
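As a rough illustration of greedy centroid-based clustering in the style of UCLUST and CD-HIT, the sketch below assigns each sequence to the first centroid it resembles closely enough. The `similarity` function is a placeholder built on Python's `difflib` and is an assumption, not the score Sumaclust computes; the function names and threshold are likewise hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Placeholder similarity in [0, 1]; not the score Sumaclust computes."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(sequences, threshold):
    """Assign each sequence to the first matching centroid, else start a new cluster."""
    clusters = {}  # centroid -> members (the centroid itself comes first)
    for seq in sequences:
        for centroid in clusters:
            if similarity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No existing centroid was close enough: seq becomes a centroid.
            clusters[seq] = [seq]
    return clusters


seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"]
for centroid, members in greedy_cluster(seqs, threshold=0.85).items():
    print(centroid, members)
```

In this sketch the input order decides which sequences become centroids, so the caller would normally sort the input (for example by abundance) beforehand.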

Registry entries: Bioconda 
sumatra
fast and exact comparison and clustering of sequences
Versions of package sumatra
ReleaseVersionArchitectures
bullseye1.0.36+ds-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.0.20-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster1.0.31-2amd64,arm64,armhf,i386
bookworm1.0.36+ds-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.0.36+ds-2amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

With the development of next-generation sequencing, efficient tools are needed to handle millions of sequences in reasonable amounts of time. Sumatra is a program developed by the LECA. Sumatra aims to compare sequences in a way that is fast and exact at the same time. This tool has been developed to be adapted to the type of data generated by DNA metabarcoding, i.e. entirely sequenced, short markers. Sumatra computes the pairwise alignment scores from one dataset or between two datasets, with the possibility to specify a similarity threshold under which pairs of sequences that have a lower similarity are not reported. The output can then go through a classification process with programs such as MCL or MOTHUR.
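The thresholded pairwise comparison described above can be sketched as follows. The scoring function is a stand-in based on Python's `difflib` and is an assumption, not Sumatra's alignment score; `pairwise_above_threshold` and the record identifiers are hypothetical names.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_above_threshold(records, threshold):
    """Return (id_a, id_b, score) for every pair whose score reaches the threshold."""
    hits = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(records.items(), 2):
        # Placeholder score in [0, 1]; a real tool would use an alignment score.
        score = SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()
        if score >= threshold:
            hits.append((id_a, id_b, round(score, 3)))
    return hits


records = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "TTTTGGGGCC"}
print(pairwise_above_threshold(records, threshold=0.85))
```

Pairs below the threshold are simply never emitted, which keeps the output small enough to hand to a downstream clustering step.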

Registry entries: SciCrunch 
tabix
generic indexer for TAB-delimited genome position files
Versions of package tabix
ReleaseVersionArchitectures
jessie1.1-1amd64,armel,i386
stretch1.3.2-2amd64,arm64,armel,i386,mips64el,mipsel,ppc64el
jessie0.2.6-2armhf
sid1.20+ds-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie1.20+ds-1amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm1.16+ds-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bullseye1.11-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.9-12~deb10u1amd64,arm64,armhf,i386
stretch-backports1.7-2~bpo9+1amd64,arm64,armel,armhf,mips,mips64el,mipsel,ppc64el,s390x
upstream1.21
Debtags of package tabix:
roleprogram
works-with-formathtml
Popcon: 27 users (7 upd.)*
Newer upstream!
License: DFSG free
Git

Tabix indexes files where some columns indicate sequence coordinates: name (usually a chromosome), start and end. The input data file must be sorted by position and compressed with bgzip (provided in this package), which has an interface similar to that of gzip. After indexing, tabix can quickly retrieve data lines by chromosomal coordinates. Fast data retrieval also works over the network if a URI is given as the file name.

This package has been built from the HTSlib source code and provides the bgzip, htsfile and tabix tools.
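The sketch below shows why position-sorted input makes retrieval by coordinates fast. It is a deliberate simplification that uses a per-chromosome binary search over start positions, not the index format tabix actually writes; `build_index` and `query` are hypothetical names.

```python
import bisect


def build_index(rows):
    """Group (chrom, start, end, line) rows by chromosome; rows must be sorted by start."""
    index = {}
    for chrom, start, end, line in rows:
        starts, records = index.setdefault(chrom, ([], []))
        starts.append(start)
        records.append((start, end, line))
    return index


def query(index, chrom, begin, end):
    """Return the lines on `chrom` that overlap the closed interval [begin, end]."""
    starts, records = index.get(chrom, ([], []))
    # Binary search: records starting after `end` cannot overlap the query.
    upper = bisect.bisect_right(starts, end)
    return [line for start, stop, line in records[:upper] if stop >= begin]


rows = [
    ("chr1", 100, 200, "a"),
    ("chr1", 150, 300, "b"),
    ("chr1", 400, 500, "c"),
    ("chr2", 10, 20, "d"),
]
index = build_index(rows)
print(query(index, "chr1", 250, 450))  # prints ['b', 'c']
```

The binary search only bounds the right-hand side of the scan; a production index would also bound the left-hand side so that long chromosomes are not scanned from the start.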

Please cite: Heng Li: Tabix: fast retrieval of sequence features from generic TAB-delimited files. (PubMed,eprint) Bioinformatics 27(5):718-719 (2011)
Registry entries: Bio.tools  Bioconda 
transrate-tools
helper for transrate
Versions of package transrate-tools
ReleaseVersionArchitectures
stretch1.0.0-1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bullseye1.0.0-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm1.0.0-5amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie1.0.0-5amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
sid1.0.0-5amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster1.0.0-2amd64,arm64,armhf,i386
Popcon: 2 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Transrate is a library and command-line tool for quality assessment of de-novo transcriptome assemblies.

This package provides command line tools used by transrate to process BAM files.

Please cite: Richard Smith-Unna, Chris Boursnell, Rob Patro, Julian M. Hibberd and Steven Kelly: TransRate: reference-free quality assessment of de novo transcriptome assemblies. (PubMed,eprint) Genome Research 26(8):1134-1144 (2016)
Registry entries: Bioconda 
trimmomatic
flexible read trimming tool for Illumina NGS data
Versions of package trimmomatic
ReleaseVersionArchitectures
buster0.38+dfsg-1all
stretch0.36+dfsg-1all
bullseye0.39+dfsg-2all
sid0.39+dfsg-2all
bookworm0.39+dfsg-2all
jessie0.32+dfsg-4all
trixie0.39+dfsg-2all
Popcon: 10 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters are supplied on the command line.

The current trimming steps are:

  • ILLUMINACLIP: Cut adapter and other Illumina-specific sequences from the read.
  • SLIDINGWINDOW: Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold.
  • LEADING: Cut bases off the start of a read, if below a threshold quality
  • TRAILING: Cut bases off the end of a read, if below a threshold quality
  • CROP: Cut the read to a specified length
  • HEADCROP: Cut the specified number of bases from the start of the read
  • MINLEN: Drop the read if it is below a specified length
  • TOPHRED33: Convert quality scores to Phred-33
  • TOPHRED64: Convert quality scores to Phred-64

It works with FASTQ (using phred + 33 or phred + 64 quality scores, depending on the Illumina pipeline used), either uncompressed or gzipped. Use of gzip format is determined based on the .gz extension.
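To make the sliding-window step concrete, here is a minimal sketch of the idea, assuming Phred+33 quality strings. It is a simplification and not Trimmomatic's own code; the function names, window size and threshold are illustrative assumptions.

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(sequence, quality_string, window, min_avg):
    """Cut the read at the first window whose mean quality is below min_avg."""
    scores = phred33_scores(quality_string)
    for start in range(0, max(len(scores) - window + 1, 0)):
        chunk = scores[start:start + window]
        if sum(chunk) / window < min_avg:
            # Keep only the bases before the failing window.
            return sequence[:start], quality_string[:start]
    return sequence, quality_string


# 'I' encodes quality 40 and '#' encodes quality 2 in Phred+33.
print(sliding_window_trim("ACGTACGT", "IIIII###", window=4, min_avg=20))
# prints ('ACGT', 'IIII')
```

Reads shorter than the window are returned unchanged in this sketch; a length filter would then decide whether to keep them.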
The package is enhanced by the following packages: multiqc
Please cite: A.M. Bolger, M. Lohse and B. Usadel: Trimmomatic: a flexible trimmer for Illumina sequence data. (PubMed,eprint) Bioinformatics 30(15):2114-2120 (2014)
Registry entries: Bio.tools  SciCrunch  Bioconda 
Topics: Sequencing
trinityrnaseq
RNA-Seq De novo Assembly
Versions of package trinityrnaseq
ReleaseVersionArchitectures
trixie2.15.2+dfsg-1amd64,arm64,ppc64el,riscv64
buster2.6.6+dfsg-6amd64
stretch2.2.0+dfsg-2amd64
bullseye2.11.0+dfsg-6amd64,arm64
sid2.15.2+dfsg-1amd64,arm64,ppc64el,riscv64
Popcon: 0 users (0 upd.)*
Versions and Archs
License: DFSG free
Git

Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.

Please cite: Manfred G Grabherr, Brian J Haas, Moran Yassour, Joshua Z Levin, Dawn A Thompson, Ido Amit, Xian Adiconis, Lin Fan, Raktima Raychowdhury, Qiandong Zeng, Zehua Chen, Evan Mauceli, Nir Hacohen, Andreas Gnirke, Nicholas Rhind, Federica di Palma, Bruce W Birren, Chad Nusbaum, Kerstin Lindblad-Toh, Nir Friedman and Aviv Regev: Full-length transcriptome assembly from RNA-Seq data without a reference genome. (PubMed) Nature Biotechnology 29(7):644-652 (2011)
Registry entries: Bio.tools  SciCrunch  Bioconda 
uc-echo
error correction algorithm designed for short-reads from NGS
Versions of package uc-echo
ReleaseVersionArchitectures
trixie1.12-19amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
jessie1.12-7amd64,armel,armhf,i386
stretch1.12-9amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster1.12-11amd64,arm64,armhf,i386
bullseye1.12-15amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
bookworm1.12-18amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.12-19amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

ECHO is an error correction algorithm designed for short-reads from next-generation sequencing platforms such as Illumina's Genome Analyzer II. The algorithm uses a Bayesian framework to improve the quality of the reads in a given data set by employing maximum a posteriori estimation.

Please cite: W.-C. Kao, A.H. Chan and Y.S. Song: ECHO: A reference-free short-read error correction algorithm. (PubMed,eprint) Genome Research 21:1181-1192 (2011)
Registry entries: Bio.tools  SciCrunch 
Topics: Data management; Sequencing
vcftools
Collection of tools to work with VCF files
Versions of package vcftools
ReleaseVersionArchitectures
stretch0.1.14+dfsg-4+deb9u1amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
buster0.1.16-1amd64,arm64,armhf,i386
bookworm0.1.16-3amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
jessie0.1.12+dfsg-1amd64,armel,armhf,i386
bullseye0.1.16-2amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid0.1.16-3amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
jessie-security0.1.12+dfsg-1+deb8u1amd64,armel,armhf,i386
trixie0.1.16-3amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Debtags of package vcftools:
roleprogram
Popcon: 19 users (3 upd.)*
Versions and Archs
License: DFSG free
Git

VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide methods for working with VCF files: validating, merging, comparing and calculating some basic population genetic statistics.
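One example of a basic population genetic statistic is the alternate allele frequency at a site. The sketch below computes it from VCF-style genotype strings; it does not use VCFtools itself, and the function name is a hypothetical choice for illustration.

```python
def allele_frequency(genotypes):
    """Return the ALT allele frequency from VCF-style GT strings such as '0/1' or '1|1'."""
    alt = total = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":  # missing call, skipped
                continue
            total += 1
            alt += allele != "0"  # any non-reference allele counts as ALT
    return alt / total if total else None


print(allele_frequency(["0/0", "0/1", "1|1", "./."]))  # prints 0.5
```

Multi-allelic sites are lumped into a single non-reference count here, which is an assumption of this sketch rather than a statement about how VCFtools reports them.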

The package is enhanced by the following packages: multiqc
Please cite: Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert E. Handsaker, Gerton Lunter, Gabor T. Marth, Stephen T. Sherry, Gilean McVean and Richard Durbin: The variant call format and VCFtools. (PubMed,eprint) Bioinformatics 27(15):2156-8 (2011)
Registry entries: Bio.tools  SciCrunch  Bioconda 
velvet
Nucleic acid sequence assembler for very short reads
Versions of package velvet
ReleaseVersionArchitectures
trixie1.2.10+dfsg1-9amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm1.2.10+dfsg1-8amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch1.2.10+dfsg1-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
jessie1.2.10+dfsg1-1amd64,armel,armhf,i386
sid1.2.10+dfsg1-9amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bullseye1.2.10+dfsg1-7amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.2.10+dfsg1-5amd64,arm64,armhf,i386
Debtags of package velvet:
biologynuceleic-acids
fieldbiology, biology:bioinformatics
interfacecommandline
roleprogram
useanalysing
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

Velvet currently takes in short read sequences, removes errors then produces high quality unique contigs. It then uses paired read information, if available, to retrieve the repeated areas between contigs.

Please cite: Daniel R. Zerbino and Ewan Birney: Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. (PubMed,eprint) Genome Research 18(5):821-829 (2008)
Registry entries: Bio.tools  SciCrunch  Bioconda 
velvet-long
Nucleic acid sequence assembler for very short reads, long version
Versions of package velvet-long
ReleaseVersionArchitectures
jessie1.2.10+dfsg1-1amd64,armel,armhf,i386
bullseye1.2.10+dfsg1-7amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
buster1.2.10+dfsg1-5amd64,arm64,armhf,i386
stretch1.2.10+dfsg1-3amd64,arm64,armel,armhf,i386,mips,mips64el,mipsel,ppc64el,s390x
bookworm1.2.10+dfsg1-8amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.2.10+dfsg1-9amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
trixie1.2.10+dfsg1-9amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Popcon: 0 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

Velvet currently takes in short read sequences, removes errors then produces high quality unique contigs. It then uses paired read information, if available, to retrieve the repeated areas between contigs.

This package installs special long-mode versions of Velvet, as recommended in the Velvet tutorials.

Please cite: Daniel R. Zerbino and Ewan Birney: Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. (PubMed,eprint) Genome Research 18(5):821-829 (2008)
Registry entries: Bio.tools  SciCrunch  Bioconda 
velvetoptimiser
automatically optimise Velvet de novo assembly parameters
Versions of package velvetoptimiser
ReleaseVersionArchitectures
buster2.2.6-2all
stretch2.2.5-5all
jessie2.2.5-2all
sid2.2.6-5all
trixie2.2.6-5all
bookworm2.2.6-5all
bullseye2.2.6-3all
Popcon: 3 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

VelvetOptimiser is a multi-threaded Perl script for automatically optimising the three primary parameter options (K, -exp_cov, -cov_cutoff) for the Velvet de novo sequence assembler.

Registry entries: Bio.tools  Bioconda 
vsearch
tool for processing metagenomic sequences
Versions of package vsearch
ReleaseVersionArchitectures
bullseye2.15.2-3amd64,arm64,ppc64el
stretch2.3.4-1amd64
sid2.29.0-1amd64,arm64,mips64el,ppc64el,riscv64
buster2.10.4-1amd64
trixie2.29.0-1amd64,arm64,mips64el,ppc64el,riscv64
bookworm2.22.1-1amd64,arm64,ppc64el
Popcon: 8 users (4 upd.)*
Versions and Archs
License: DFSG free
Git

Versatile 64-bit multithreaded tool for processing metagenomic sequences, including searching, clustering, chimera detection, dereplication, sorting, masking and shuffling.
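Of the operations listed, dereplication is the simplest to illustrate: identical sequences are collapsed into one record with an abundance count. The sketch below shows the idea only; it is not vsearch's implementation, and the case folding and sort order are assumptions.

```python
from collections import Counter


def dereplicate(sequences):
    """Collapse identical sequences into (sequence, abundance), most abundant first."""
    counts = Counter(seq.upper() for seq in sequences)
    # Sort by decreasing abundance, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


reads = ["ACGT", "acgt", "TTTT", "ACGT", "GGGG", "TTTT"]
print(dereplicate(reads))  # prints [('ACGT', 3), ('TTTT', 2), ('GGGG', 1)]
```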

The aim of this project is to create an alternative to the USEARCH tool developed by Robert C. Edgar (2010). The new tool should:

  • have a 64-bit design that handles very large databases and much more than 4GB of memory
  • be as accurate or more accurate than usearch
  • be as fast or faster than usearch
The package is enhanced by the following packages: vsearch-examples
Please cite: Torbjørn Rognes, Tomáš Flouri, Ben Nichols, Christopher Quince and Frédéric Mahé: VSEARCH: a versatile open source tool for metagenomics. (eprint) PeerJ 4:e2584
Registry entries: Bio.tools  Bioconda 
wham-align
Wisconsin's High-Throughput Alignment Method
Versions of package wham-align
ReleaseVersionArchitectures
sid0.1.5-8amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bullseye0.1.5-8amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie0.1.5-8amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bookworm0.1.5-8amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (2 upd.)*
Versions and Archs
License: DFSG free
Git

This package provides functionality analogous to BWA or bowtie in aligning reads from next-generation DNA sequencing machines against a reference genome.

Please cite: Yinan Li, Allie Terrell and Jignesh M. Patel: WHAM: A High-throughput Sequence Alignment Method (eprint) Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece (2011)
Registry entries: Bio.tools  SciCrunch  Bioconda 
wigeon
reimplementation of the Pintail 16S DNA anomaly detection utility
Versions of package wigeon
ReleaseVersionArchitectures
trixie20101212+dfsg1-6all
buster20101212+dfsg1-2all
bullseye20101212+dfsg1-4all
bookworm20101212+dfsg1-5all
stretch20101212+dfsg1-1all
sid20101212+dfsg1-6all
jessie20101212+dfsg-1all
Popcon: 4 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

WigeoN examines the sequence conservation between a query and a trusted reference sequence, both in NAST alignment format. Based on the sequence identity between the query and the reference sequence, there is an expected amount of variation among the alignment. If the observed variation is greater than the 95% quantile of the distribution of variation observed between non-anomalous sequences, then it is flagged as an anomaly.
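The final decision rule, comparing an observed variation with the 95% quantile of a reference distribution, can be sketched as below. How WigeoN derives the variation values is not shown; the function name and the sample numbers are made up for illustration.

```python
import statistics


def flag_anomaly(observed_variation, reference_variations, percentile=95):
    """Flag a query whose variation exceeds the given percentile of the reference values."""
    # With n=100, statistics.quantiles returns the 1st..99th percentile cut points.
    cut_points = statistics.quantiles(reference_variations, n=100, method="inclusive")
    threshold = cut_points[percentile - 1]
    return observed_variation > threshold, threshold


# Hypothetical variation values observed between non-anomalous sequences.
reference = list(range(101))
print(flag_anomaly(96, reference))  # prints (True, 95.0)
print(flag_anomaly(95, reference))  # prints (False, 95.0)
```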

WigeoN is a flexible command-line based reimplementation of the Pintail algorithm (Appl Environ Microbiol. 2005 Dec;71(12):7724-36).

WigeoN is useful for flagging chimeras and anomalies only in near full-length 16S rRNA sequences. WigeoN lacks sensitivity with sequences less than 1000 bp.

To run WigeoN, you need NAST-formatted sequences generated by the nast-ier utility.

WigeoN is part of the microbiomeutil suite.

The package is enhanced by the following packages: microbiomeutil-data
Please cite: Brian J. Haas, Dirk Gevers, Ashlee M. Earl, Mike Feldgarden, Doyle V. Ward, Georgia Giannoukos, Dawn Ciulla, Diana Tabbaa, Sarah K. Highlander, Erica Sodergren, Barbara Methé, Todd Z. DeSantis, The Human Microbiome Consortium, Joseph F. Petrosino, Rob Knight and Bruce W. Birren: Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. (PubMed,eprint) Genome Research 21(3):494-504 (2011)
Registry entries: SciCrunch 

Official Debian packages with lower relevance

nanolyse
remove lambda phage reads from a fastq file
Versions of package nanolyse
ReleaseVersionArchitectures
bookworm1.2.0-4amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid1.2.0-4amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bullseye1.2.0-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
trixie1.2.0-4amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

NanoLyse is a tool for rapid removal of contaminant DNA, using the Minimap2 aligner through the mappy Python binding. A typical application would be the removal of the lambda phage control DNA fragment supplied by ONT, for which the reference sequence is included in the package. However, this approach may lead to unwanted loss of reads from regions highly homologous to the lambda phage genome.

Please cite: Wouter De Coster, Svenn D’Hert, Darrin T Schultz, Marc Cruts and Christine Van Broeckhoven: NanoPack: visualizing and processing long-read sequencing data. (PubMed,eprint) Bioinformatics 34(15):2666-2669 (2018)
Registry entries: Bioconda 
python3-anndata
annotated gene by sample numpy matrix
Versions of package python3-anndata
ReleaseVersionArchitectures
bookworm0.8.0-4all
sid0.10.6-1all
bullseye0.7.5+ds-3all
upstream0.10.9
Popcon: 0 users (2 upd.)*
Newer upstream!
License: DFSG free
Git

AnnData provides a scalable way of keeping track of data together with learned annotations. It is used within Scanpy, for which it was initially developed. Both packages have been introduced in Genome Biology (2018).

Please cite: F. Alexander Wolf, Philipp Angerer and Fabian J. Theis: SCANPY: large-scale single-cell gene expression data analysis. (PubMed) Genome Biol. 19:15 (2018)
Registry entries: Bioconda 
r-bioc-isoformswitchanalyzer
Identify, Annotate and Visualize Alternative Splicing and Isoform Switches
Versions of package r-bioc-isoformswitchanalyzer
ReleaseVersionArchitectures
trixie2.4.0+ds-1amd64,arm64,mips64el,ppc64el,riscv64,s390x
bookworm1.20.0+ds-1amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
sid2.4.0+ds-1amd64,arm64,mips64el,ppc64el,riscv64,s390x
Popcon: 1 users (1 upd.)*
Versions and Archs
License: DFSG free
Git

Identify, annotate and visualize alternative splicing and isoform switches with functional consequences from both short- and long-read RNA-seq data. The package analyses alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains) from the quantification of all types of RNA-seq data by tools such as Kallisto, Salmon, StringTie and Cufflinks/Cuffdiff.

Registry entries: Bio.tools  Bioconda 

Debian packages in contrib or non-free

bcbio
toolkit for analysing high-throughput sequencing data
Versions of package bcbio
ReleaseVersionArchitectures
sid1.2.9-2 (contrib)all
bullseye1.2.5-1 (contrib)all
bookworm1.2.9-2 (contrib)all
buster1.1.2-3all
Popcon: 0 users (2 upd.)*
Versions and Archs
License: DFSG free, but needs non-free components
Git

This package installs the command line tools of the bcbio-nextgen toolkit implementing best-practice pipelines for fully automated high throughput sequencing analysis.

A high-level configuration file specifies inputs and analysis parameters to drive a parallel pipeline that handles distributed execution, idempotent processing restarts and safe transactional steps. The project contributes a shared community resource that handles the data processing component of sequencing analysis, providing researchers with more time to focus on the downstream biology.

This package builds, and having it in Debian unstable helps the Debian developers to synchronize their efforts. But unless a series of external dependencies is installed manually, the functionality of bcbio in Debian is only a shadow of itself. Please use the official distribution of bcbio for the time being, which means "use conda". The TODO file in the Debian directory should give an overview of the progress of the Debian packaging.

Registry entries: Bio.tools  Bioconda 
cufflinks
Transcript assembly, differential expression and regulation for RNA-Seq
Versions of package cufflinks
ReleaseVersionArchitectures
bookworm2.2.1+dfsg.1-9 (non-free)amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
stretch2.2.1-3 (non-free)amd64
jessie2.2.1-1 (non-free)amd64
trixie2.2.1+dfsg.1-10 (non-free)amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
buster2.2.1+dfsg.1-3 (non-free)amd64,arm64,armhf,i386
sid2.2.1+dfsg.1-10 (non-free)amd64,arm64,armel,armhf,i386,mips64el,ppc64el,riscv64,s390x
bullseye2.2.1+dfsg.1-8 (non-free)amd64,arm64,armel,armhf,i386,mips64el,mipsel,ppc64el,s390x
Popcon: 1 users (2 upd.)*
Versions and Archs
License: non-free
Git

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

This package provides the binary of cufflinks and associated tools, i.e. compress_gtf, cuffcompare, cuffdiff, cuffmerge, cuffnorm, cuffquant and gtf_to_sam.

Please cite: Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold and Lior Pachter: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. (PubMed) Nature Biotechnology 28(5):511-515 (2010)
Registry entries: Bio.tools  SciCrunch  Bioconda 
vdjtools
framework for post-analysis of B/T cell repertoires
Versions of package vdjtools
ReleaseVersionArchitectures
trixie1.2.1+git20190311+repack-2 (non-free)all
bookworm1.2.1+git20190311+repack-1 (non-free)all
bullseye1.2.1+git20190311-5 (non-free)all
sid1.2.1+git20190311+repack-2 (non-free)all
Popcon: 0 users (1 upd.)*
Versions and Archs
License: non-free
Git

VDJtools is an open-source Java/Groovy-based framework designed to facilitate analysis of immune repertoire sequencing (RepSeq) data. VDJtools computes a wide set of statistics and is able to perform various forms of cross-sample analysis. Both comprehensive tabular output and publication-ready plots are provided.

The main aims of the VDJtools Project are:

  • To ensure consistency between post-analysis methods and results
  • To save the time of bioinformaticians analyzing RepSeq data
  • To create an API framework facilitating development of new RepSeq analysis applications
  • To provide a simple enough command line tool so it could be used by immunologists and biologists with little computational background
Please cite: M Shugay, D.V. Bagaev, M.A. Turchaninova, D.A. Bolotin, O.V. Britanova, E.V. Putintseva, M.V. Pogorelyy, V.I. Nazarov, I.V. Zvyagin, V.I. Kirgizova, K.I. Kirgizov, E.V. Skorobogatova and D.M. Chudakov: VDJtools: Unifying Post-analysis of T Cell Receptor Repertoires. (PubMed,eprint) PLoS Comput Biol. 11(11):e1004503 (2015)

Packaging has started and developers might try the packaging code in VCS

graphmap2
highly sensitive and accurate mapper for long, error-prone reads
Versions of package graphmap2
ReleaseVersionArchitectures
VCS0.6.4-1all
Versions and Archs
License: MIT
Debian package not available
Git
Version: 0.6.4-1

GraphMap2 is a highly sensitive and accurate mapper for long, error-prone reads. The mapping algorithm is designed to analyse nanopore sequencing reads: it progressively refines candidate alignments to robustly handle potentially high error rates and uses a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10–80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads.

Please cite: Ivan Sović, Mile Šikić, Andreas Wilm, Shannon Nicole Fenlon, Swaine Chen and Niranjan Nagarajan: Fast and sensitive mapping of nanopore sequencing reads with GraphMap. (PubMed,eprint) Nature Communications 7(11307) (2016)
Registry entries: Bioconda 
mosaik-aligner
reference-guided aligner for next-generation sequencing
Versions of package mosaik-aligner
ReleaseVersionArchitectures
VCS2.2.30+20140627-1all
Versions and Archs
License: MIT
Debian package not available
Git
Version: 2.2.30+20140627-1

MosaikBuild converts various sequence formats into Mosaik’s native read format. MosaikAligner pairwise aligns each read to a specified series of reference sequences. MosaikSort resolves paired-end reads and sorts the alignments by the reference sequence coordinates. Finally, MosaikText converts alignments to different text-based formats.

At this time, the workflow consists of supplying sequences in FASTA, FASTQ, Illumina Bustard & Gerald, or SRF file formats and producing results in the BLAT axt, the BAM/SAM, the UCSC Genome Browser bed, or the Illumina ELAND formats.

nanoplot
plotting scripts for long read sequencing data
Versions of package nanoplot
ReleaseVersionArchitectures
VCS1.36.2-1all
Versions and Archs
License: MIT
Debian package not available
Git
Version: 1.36.2-1

NanoPlot provides plotting scripts for long read sequencing data.

These scripts perform data extraction from Oxford Nanopore sequencing data in the following formats:

  • fastq files (optionally compressed)
  • fastq files generated by albacore, guppy or MinKNOW containing additional information (optionally compressed)
  • sorted bam files
  • sequencing_summary.txt output table generated by albacore, guppy or MinKNOW basecalling (optionally compressed)
  • fasta files (optionally compressed)
  • multiple files of the same type can be offered simultaneously
Please cite: Wouter De Coster, Svenn D'Hert, Darrin T Schultz, Marc Cruts and Christine Van Broeckhoven: NanoPack: visualizing and processing long-read sequencing data. (PubMed,eprint) Bioinformatics 34(15):2666-2669 (2018)
Registry entries: Bioconda 
r-bioc-mofa2
Multi-Omics Factor Analysis v2
Versions of package r-bioc-mofa2
Release  Version     Architectures
VCS      1.2.2+ds-1  all
Versions and Archs
License: GPL-2+
Debian package not available
Git
Version: 1.2.2+ds-1

The MOFA2 package contains a collection of tools for training and analysing Multi-Omics Factor Analysis (MOFA) models. MOFA is a probabilistic factor model that aims to identify principal axes of variation from data sets that can comprise multiple omic layers and/or groups of samples. Additional time or space information on the samples can be incorporated using the MEFISTO framework, which is part of MOFA2. Downstream analysis functions are available to inspect the molecular features underlying each factor, and for visualisation, imputation, etc.

Registry entries: Bio.tools  Bioconda 
umap
quantify genome and methylome mappability
Versions of package umap
Release  Version  Architectures
VCS      1.0.0-1  all
Versions and Archs
License: GPL-3.0
Debian package not available
Git
Version: 1.0.0-1

Umap identifies uniquely mappable regions of any genome. Its Bismap extension identifies mappability of the bisulfite converted genome (methylome).
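To make the notion of unique mappability concrete, here is a toy Python sketch. It treats a position as uniquely mappable when the k-mer starting there occurs exactly once in the sequence, and it models bisulfite conversion as replacing every C with T on the forward strand. Both are simplifying assumptions for illustration; the function names are hypothetical and this is not how Umap or Bismap are implemented.

```python
from collections import Counter


def unique_kmer_starts(genome, k):
    """Return start positions whose k-mer occurs exactly once in genome."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [i for i, kmer in enumerate(kmers) if counts[kmer] == 1]


def bisulfite_convert(genome):
    """Model full bisulfite conversion of the forward strand (C -> T)."""
    return genome.replace("C", "T")


# Made-up toy sequence, for illustration only.
toy_genome = "ACGTATGTTTGA"
plain = unique_kmer_starts(toy_genome, 4)
converted = unique_kmer_starts(bisulfite_convert(toy_genome), 4)
```

In this toy sequence every 4-mer is unique before conversion, while after conversion the 4-mers at positions 0 and 4 become identical, so fewer positions remain uniquely mappable.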

Please cite: Mehran Karimzadeh, Carl Ernst, Anshul Kundaje and Michael M. Hoffman: Umap and Bismap: quantifying genome and methylome mappability. (PubMed,eprint) Nucleic Acids Res. 46(20):e120 (2018)
Registry entries: Bioconda 

No known packages available

annovar
annotate genetic variants detected from diverse genomes
License: Open Source for non-profit
Debian package not available

ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, as well as mouse, worm, fly, yeast and many others). Given a list of variants with chromosome, start position, end position, reference nucleotide and observed nucleotides, ANNOVAR can perform:

 1. Gene-based annotation: identify whether SNPs or CNVs cause protein coding
    changes and the amino acids that are affected. Users can flexibly use RefSeq
    genes, UCSC genes, ENSEMBL genes, GENCODE genes, or many other gene definition
    systems.
 2. Region-based annotations: identify variants in specific genomic regions,
    for example, conserved regions among 44 species, predicted transcription
    factor binding sites, segmental duplication regions, GWAS hits, database
    of genomic variants, DNAse I hypersensitivity sites, ENCODE
    H3K4Me1/H3K4Me3/H3K27Ac/CTCF sites, ChIP-Seq peaks, RNA-Seq peaks, or many
    other annotations on genomic intervals.
 3. Filter-based annotation: identify variants that are reported in dbSNP,
    or identify the subset of common SNPs (MAF>1%) in the 1000 Genomes Project,
    or identify the subset of non-synonymous SNPs with SIFT score>0.05, or many
    other annotations on specific mutations.
 4. Other functionalities: retrieve the nucleotide sequence at any
    user-specified genomic positions in batch, identify a candidate gene list
    for Mendelian diseases from exome data, identify a list of SNPs from
    1000 Genomes that are in strong LD with a GWAS hit, and many other
    creative utilities.
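
The region-based and filter-based annotation ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ANNOVAR's code or file format: the `Variant` and `Region` types, the function names, the 1-based inclusive coordinates and all example data are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive
    ref: str
    obs: str


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    label: str


def region_annotations(variant, regions):
    """Return the labels of all regions that overlap the variant."""
    return [
        region.label
        for region in regions
        if region.chrom == variant.chrom
        and region.start <= variant.end
        and variant.start <= region.end
    ]


def is_common(variant, allele_freqs, threshold=0.01):
    """True if the variant's known allele frequency exceeds threshold."""
    return allele_freqs.get(variant, 0.0) > threshold


# Made-up example data, for illustration only.
regions = [
    Region("chr1", 100, 200, "conserved_region"),
    Region("chr1", 150, 160, "tfbs"),
    Region("chr2", 100, 200, "segmental_duplication"),
]
allele_freqs = {Variant("chr1", 155, 155, "A", "G"): 0.12}

snv = Variant("chr1", 155, 155, "A", "G")
rare = Variant("chr1", 300, 300, "C", "T")
```

Here `region_annotations(snv, regions)` reports both overlapping chr1 regions, and `is_common` applies the MAF>1% style threshold to a lookup table of known frequencies.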

On a modern desktop computer (3 GHz Intel Xeon CPU, 8 GB memory), for 4.7 million variants, ANNOVAR requires ~4 minutes to perform gene-based functional annotation, or ~15 minutes to perform the stepwise "variants reduction" procedure, making it practical to handle hundreds of human genomes in a day.
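
The throughput implied by these figures can be checked with simple arithmetic. The sketch below assumes that one genome contributes about 4.7 million variants and that runs execute one after another on a single machine; both are assumptions made for illustration, not statements from the ANNOVAR authors.

```python
VARIANTS = 4_700_000       # variants per run, from the text above
GENE_BASED_MIN = 4         # approximate minutes for gene-based annotation
REDUCTION_MIN = 15         # approximate minutes for "variants reduction"
MINUTES_PER_DAY = 24 * 60

# Variants processed per second during gene-based annotation.
variants_per_second = VARIANTS / (GENE_BASED_MIN * 60)

# Sequential runs that fit into one day for each procedure.
gene_based_runs_per_day = MINUTES_PER_DAY // GENE_BASED_MIN
reduction_runs_per_day = MINUTES_PER_DAY // REDUCTION_MIN
```

Under these assumptions, gene-based annotation runs at roughly 19,600 variants per second, allowing about 360 runs per day, while the stepwise procedure allows about 96.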

forge
genome assembler for mixed read types
License: Apache 2.0
Debian package not available

Forge Genome Assembler is a parallel, MPI based genome assembler for mixed read types.

Forge is a classic "overlap-layout-consensus" genome assembler written by Darren Platt and Dirk Evers. Implemented in C++ and using the parallel MPI library, it runs on one or more machines in a network and can scale to very large numbers of reads, provided there is enough collective memory on the machines used. It generates a full consensus alignment of all reads and can handle mixtures of Sanger, 454 and Illumina reads. There is some support for SOLiD color space, and it includes built-in tools for vector trimming and contamination screening.
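
For readers unfamiliar with the overlap-layout-consensus idea, the following toy Python sketch shows the overlap step (longest exact suffix-prefix match) and a greedy layout that repeatedly merges the best-overlapping pair of reads. It is a single-process illustration with hypothetical function names, it omits the consensus step and any error tolerance, and it is not Forge's implementation.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of left equal to a prefix of right."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def best_overlap(reads, min_overlap=3):
    """Return (length, i, j) for the best-overlapping ordered read pair."""
    best = (0, -1, -1)
    for i, left in enumerate(reads):
        for j, right in enumerate(reads):
            if i != j:
                size = overlap_length(left, right, min_overlap)
                if size > best[0]:
                    best = (size, i, j)
    return best


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the best-overlapping pair until none remain."""
    contigs = list(reads)
    while len(contigs) > 1:
        size, i, j = best_overlap(contigs, min_overlap)
        if size == 0:
            break  # no remaining pair overlaps by at least min_overlap
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs


# Made-up error-free reads, for illustration only.
toy_reads = ["ACGTAC", "TACGGA", "GGATTC"]
contigs = greedy_assemble(toy_reads)
```

With these three reads, each overlapping the next by three bases, the greedy layout yields a single contig.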

Forge was originally developed at Exelixis, which has kindly agreed to place the software, which underwent much subsequent development outside Exelixis, into the public domain. Forge works with most of the common MPI implementations.

Remark of Debian Med team: Competitor to MIRA2 and wgs-assembler

This package was requested by William Spooner whs@eaglegenomics.com as a competitor to MIRA2 and wgs-assembler.

*Popularity contest results: number of people who use this package regularly (number of people who upgraded this package recently) out of 243344