
Commit 5f2f15d

add bioinformatics page
1 parent e9ef9c6 commit 5f2f15d

3 files changed

Lines changed: 250 additions & 0 deletions


docs/_layout/pgwrap.html

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@
2121
</ul>
2222
<hr>
2323
<ul class="menu-list">
24+
{{metasection comparisons/bioinformatics "Bioinformatics"}}
2425
{{metasection comparisons/data_structures "Data Structures"}}
2526
{{metasection comparisons/performance_enhancement "Performance Enhancement"}}
2627
{{metasection comparisons/math Math}}
Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@
1+
+++
2+
title = "Bioinformatics"
3+
+++
4+
5+
# Bioinformatics
6+
7+
The [BioJulia](https://github.com/biojulia) organization collects a lot of great packages related to bioinformatics.
8+
9+
## Bio.jl is Deprecated
10+
11+
Note that the [Bio.jl](https://github.com/BioJulia/Bio.jl) package is deprecated.
12+
In [this blogpost](https://biojulia.dev/posts/biojl/), the main developer of Bio.jl describes where the functionality has gone:
13+
14+
15+
* Bio.Seq became [BioSequences.jl](https://github.com/BioJulia/BioSequences.jl/)
16+
* Bio.Align became [BioAlignments.jl](https://github.com/BioJulia/BioAlignments.jl/)
17+
* Bio.Intervals became [GenomicFeatures.jl](https://github.com/BioJulia/GenomicFeatures.jl/)
18+
* Bio.Structure became [BioStructures.jl](https://github.com/BioJulia/BioStructures.jl/)
19+
* Bio.Var became [GeneticVariation.jl](https://github.com/BioJulia/GeneticVariation.jl/)
20+
* Bio.Phylo became [Phylogenies.jl](https://github.com/BioJulia/Phylogenies.jl/)
21+
* Bio.Services became [BioServices.jl](https://github.com/BioJulia/BioServices.jl/)
22+
* Bio.Tools became [BioTools.jl](https://github.com/BioJulia/BioTools.jl/) (now archived)
23+
24+
# File Parsers
25+
26+
An important task in bioinformatics is parsing files in various standard formats.
27+
Here we list some file formats and packages with parsers:
28+
29+
* [FASTA](https://en.wikipedia.org/wiki/FASTA_format) (.fas, .fasta, .fa): DNA or protein sequences without annotations
30+
- [FASTX](https://github.com/BioJulia/FASTX.jl)
31+
* [FASTQ](https://en.wikipedia.org/wiki/FASTQ_format) (.fq, .fastq): DNA sequences with quality information
32+
- [FASTX](https://github.com/BioJulia/FASTX.jl)
33+
* [GenBank](https://en.wikipedia.org/wiki/GenBank) (.gb, .gbk): DNA or protein sequences with annotations
34+
- [GenomicAnnotations.jl](https://github.com/BioJulia/GenomicAnnotations.jl)
35+
* EMBL (.embl): DNA or protein sequences with annotations
36+
- [GenomicAnnotations.jl](https://github.com/BioJulia/GenomicAnnotations.jl)
37+
* GFF3, GFF2/GTF (.gff): Annotated genomes
38+
- [GenomicAnnotations.jl](https://github.com/BioJulia/GenomicAnnotations.jl)
39+
* [SAM](https://en.wikipedia.org/wiki/SAM_(file_format)) (.sam): Aligned DNA sequences (typically from read mapping). Text based.
40+
- [XAM.jl](https://github.com/BioJulia/XAM.jl)
41+
* [BAM](https://en.wikipedia.org/wiki/BAM_(file_format)) (.bam): Aligned DNA sequences (typically from read mapping). Binary.
42+
- [XAM.jl](https://github.com/BioJulia/XAM.jl)
43+
* PDB (.pdb): Protein 3D structure.
44+
- [BioStructures.jl](https://github.com/BioJulia/BioStructures.jl)
45+
- [MIToS](https://github.com/diegozea/MIToS.jl)
46+
* [mmCIF](https://en.wikipedia.org/wiki/Macromolecular_Crystallographic_Information_File): Macromolecular Crystallographic Information File, also known as PDBx/mmCIF, is a standard text file format for representing macromolecular structure data.
47+
- [BioStructures.jl](https://github.com/BioJulia/BioStructures.jl)
48+
- [MIToS](https://github.com/diegozea/MIToS.jl)
49+
* [MMTF](https://github.com/rcsb/mmtf): MacroMolecular Transmission Format (MMTF) is a binary encoding of biological structures.
50+
- [BioStructures.jl](https://github.com/BioJulia/BioStructures.jl)
51+
* [DSSP](https://github.com/PDB-REDO/dssp): Protein secondary structure
52+
- [ProteinSecondaryStructures.jl](https://github.com/BioJulia/ProteinSecondaryStructures.jl)
53+
* [STRIDE](https://webclu.bio.wzw.tum.de/stride/): Protein secondary structure
54+
- [ProteinSecondaryStructures.jl](https://github.com/BioJulia/ProteinSecondaryStructures.jl)
55+
* [PAF](https://github.com/slimsuite/pafscaff) (.paf): Pairwise mApping Format
56+
- [PairwiseMappingFormat.jl](https://github.com/BioJulia/PairwiseMappingFormat.jl)
57+
* [Stockholm](https://en.wikipedia.org/wiki/Stockholm_format) (.sto, .stk, .stockholm): Multiple sequence alignment format used by Pfam, Rfam and Dfam
58+
- [MIToS.jl](https://github.com/diegozea/MIToS.jl)
59+
* [A3M](https://en.wikipedia.org/wiki/FASTA_format): A2M/A3M are a family of FASTA-derived formats used for sequence alignments
60+
- [MIToS.jl](https://github.com/diegozea/MIToS.jl)
61+
* [PIR](https://www.bioinformatics.nl/tools/crab_pir.html): Multiple sequence alignment format
62+
- [MIToS.jl](https://github.com/diegozea/MIToS.jl)
63+
64+
# Data Structures
65+
66+
The basic data structures for representing DNA, RNA and protein sequences are in [BioSequences.jl](https://github.com/BioJulia/BioSequences.jl).
67+
68+
# Pairwise Sequence Alignments
69+
70+
A core task in bioinformatics is aligning sequences.
71+
This can be done with [BioAlignments.jl](https://github.com/BioJulia/BioAlignments.jl) which includes algorithms for the following pairwise alignment types:
72+
73+
* GlobalAlignment: global-to-global alignment
74+
* SemiGlobalAlignment: local-to-global alignment
75+
* LocalAlignment: local-to-local alignment
76+
* OverlapAlignment: end-free alignment
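
As a minimal sketch of how these alignment types are used, the snippet below aligns two made-up DNA sequences globally and locally. It assumes recent versions of BioSequences.jl and BioAlignments.jl; the sequences and the gap penalties are illustrative choices, not recommendations.

```julia
using BioSequences
using BioAlignments

# Two short DNA sequences to align (made-up example data).
s1 = dna"CCTAGGAGGG"
s2 = dna"ACCTGGTATGATAGCG"

# Affine gap penalties on top of the EDNAFULL substitution matrix.
# The penalty values are arbitrary and chosen only for illustration.
scoremodel = AffineGapScoreModel(EDNAFULL, gap_open=-5, gap_extend=-1)

# The first argument selects the alignment type from the list above.
global_result = pairalign(GlobalAlignment(), s1, s2, scoremodel)
local_result = pairalign(LocalAlignment(), s1, s2, scoremodel)

println("global score: ", score(global_result))
println("local score:  ", score(local_result))
println(alignment(local_result))
```

Swapping in `SemiGlobalAlignment()` or `OverlapAlignment()` should change only how end gaps are treated, while the scoring model stays the same.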
77+
78+
# Multiple Sequence Alignment (MSA)
79+
80+
I'm not aware of Julia tools that compute multiple sequence alignments, but [MIToS.jl](https://github.com/diegozea/MIToS.jl) can read the most common MSA formats: Stockholm, FASTA, A3M, A2M, PIR, and Raw.
81+
82+
# Package descriptions
83+
84+
## BioSequences.jl
85+
{{badge BioSequences}}
86+
> Biological sequences for the julia language
87+
[BioSequences.jl](https://github.com/BioJulia/BioSequences.jl/) provides data types and methods for common operations with biological sequences, including DNA, RNA, and amino acid sequences.
88+
89+
It can do sequence search and pattern matching in sequences, and compute simple sequence statistics.
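
The following is a small sketch of a few such operations, assuming a recent BioSequences.jl release. The sequences are made up for illustration.

```julia
using BioSequences

# String macros build typed sequences (made-up example data).
dna_seq = dna"ATGGCCTAA"
rna_seq = rna"AUGGCCUAA"

# Reverse complement of the DNA sequence.
rc = reverse_complement(dna_seq)

# Translate the RNA sequence with the standard genetic code:
# AUG -> Met, GCC -> Ala, UAA -> stop.
protein = translate(rna_seq)

# A simple sequence statistic: the number of G or C bases.
gc = count(nt -> nt == DNA_G || nt == DNA_C, dna_seq)

println(rc)
println(protein)
println("GC count: ", gc)
```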
90+
91+
## BioAlignments.jl
92+
{{badge BioAlignments}}
93+
> Sequence alignment tools
94+
[BioAlignments.jl](https://github.com/BioJulia/BioAlignments.jl/) provides sequence alignment algorithms and data structures.
95+
It includes algorithms for the following pairwise alignment types:
96+
97+
* GlobalAlignment: global-to-global alignment
98+
* SemiGlobalAlignment: local-to-global alignment
99+
* LocalAlignment: local-to-local alignment
100+
* OverlapAlignment: end-free alignment
101+
102+
## GenomicFeatures.jl
103+
{{badge GenomicFeatures}}
104+
> Tools for genomic features in Julia.
105+
[GenomicFeatures.jl](https://github.com/BioJulia/GenomicFeatures.jl/)
106+
107+
## GenomicAnnotations.jl
108+
{{badge GenomicAnnotations}}
109+
> GenomicAnnotations is a package for reading, modifying, and writing genomic annotations in the GenBank, GFF3, GFF2/GTF, and EMBL file formats.
110+
[GenomicAnnotations.jl](https://github.com/BioJulia/GenomicAnnotations.jl)
111+
112+
## BioStructures.jl
113+
{{badge BioStructures}}
114+
> A Julia package to read, write and manipulate macromolecular structures
115+
[BioStructures.jl](https://github.com/BioJulia/BioStructures.jl/)
116+
117+
From the package README:
118+
119+
BioStructures provides functionality to read, write and manipulate
120+
macromolecular structures, in particular proteins.
121+
[Protein Data Bank](https://www.rcsb.org/pdb/home/home.do) (PDB), mmCIF and MMTF
122+
format files can be read in to a hierarchical data structure. Spatial
123+
calculations and functions to access the PDB are also provided.
124+
It compares favourably in terms of performance to other PDB parsers -
125+
see some [benchmarks online](https://github.com/jgreener64/pdb-benchmarks) - and
126+
should be lightweight enough to build other packages on top of.
127+
128+
## GeneticVariation.jl
129+
{{badge GeneticVariation}}
130+
> Datastructures and algorithms for working with genetic variation
131+
[GeneticVariation.jl](https://github.com/BioJulia/GeneticVariation.jl/)
132+
133+
From the package README:
134+
135+
GeneticVariation provides types and methods for working with datasets of genetic variation. It provides a VCF and BCF parser, as well as methods for working with variation in sequences such as evolutionary distance computation, and counting different mutation types.
136+
137+
## Phylogenies.jl
138+
{{badge Phylogenies}}
139+
> The BioJulia package for working with phylogenetic trees and genealogies.
140+
[Phylogenies.jl](https://github.com/BioJulia/Phylogenies.jl/)
141+
142+
The repository appears to be inactive.
143+
144+
From the package README:
145+
146+
A julia package providing an abstract type and interface for phylogenies, a concrete phylogeny type implementation, and higher-level methods for working with phylogenies.
147+
148+
In development.
149+
150+
151+
## GenomeGraphs.jl
152+
{{badge GenomeGraphs}}
153+
> A modern genomics framework for julia
154+
[GenomeGraphs.jl](https://github.com/BioJulia/GenomeGraphs.jl)
155+
156+
From the package README:
157+
158+
GenomeGraphs provides a representation of sequence graphs. Such graphs represent genome assemblies and population graphs of genotypes/haplotypes and variation.
159+
160+
161+
## BioServices.jl
162+
{{badge BioServices}}
163+
> Julia interface to APIs for various bio-related web services
164+
[BioServices.jl](https://github.com/BioJulia/BioServices.jl/)
165+
166+
## NCBIBlast.jl
167+
{{badge NCBIBlast}}
168+
> Thin wrapper around NCBI's BLAST+ CLI https://www.ncbi.nlm.nih.gov/books/NBK569856/
169+
[NCBIBlast.jl](https://github.com/BioJulia/NCBIBlast.jl/)
170+
171+
From the package README:
172+
173+
This package is a thin wrapper around the Basic Local Alignment Search Tool CLI, better known as BLAST, developed by the National Center for Biotechnology Information (NCBI).
174+
175+
For now, this uses CondaPkg.jl to install BLAST+.
176+
177+
## FASTX.jl
178+
{{badge FASTX}}
179+
> Parse and process FASTA and FASTQ formatted files of biological sequences.
180+
[FASTX](https://github.com/BioJulia/FASTX.jl)
181+
182+
FASTX provides I/O and utilities for manipulating FASTA and FASTQ formatted sequence data files.
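
As a hedged sketch of reading FASTA records, the snippet below parses a small in-memory example. It assumes FASTX.jl version 2 (where `FASTAReader`, `identifier` and `sequence` are exported); the record contents are made up, and a real workflow would pass an opened file instead of an `IOBuffer`.

```julia
using FASTX

# Made-up FASTA data held in memory; replace IOBuffer(data)
# with open("path/to/file.fasta") to read from disk.
data = ">seq1 first example\nACGTACGT\n>seq2 second example\nGGCC\n"

reader = FASTAReader(IOBuffer(data))

# Map each record identifier to the length of its sequence.
lengths = Dict{String,Int}()
for record in reader
    lengths[String(identifier(record))] = length(sequence(String, record))
end
close(reader)

println(lengths)
```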
183+
184+
## XAM.jl
185+
{{badge XAM}}
186+
> Parse and process SAM and BAM formatted files.
187+
[XAM.jl](https://github.com/BioJulia/XAM.jl)
188+
189+
XAM provides I/O and utilities for manipulating SAM and BAM formatted alignment map files.
190+
191+
## PairwiseMappingFormat.jl
192+
{{badge PairwiseMappingFormat}}
193+
> Parser for the PAF format in bioinformatics
194+
[PairwiseMappingFormat.jl](https://github.com/BioJulia/PairwiseMappingFormat.jl)
195+
196+
PairwiseMappingFormat.jl provides a parser for Pairwise Mapping Format (PAF) files. PAF is a simple, tab-delimited format created by programs such as minimap2.
197+
198+
## ProteinSecondaryStructures.jl
199+
{{badge ProteinSecondaryStructures}}
200+
> Wrapper to protein secondary structure calculation packages
201+
[ProteinSecondaryStructures.jl](https://github.com/BioJulia/ProteinSecondaryStructures.jl)
202+
203+
From the package README:
204+
205+
This package parses [STRIDE](http://webclu.bio.wzw.tum.de/stride/) and [DSSP](https://github.com/PDB-REDO/dssp) secondary structure prediction outputs, to make them convenient to use from Julia, particularly for the analysis of MD simulations.
206+
207+
208+
## BioMakie.jl
209+
{{badge BioMakie}}
210+
> Plotting and interface tools for biology.
211+
[BioMakie.jl](https://github.com/BioJulia/BioMakie.jl)
212+
213+
[BioMakie.jl](https://github.com/BioJulia/BioMakie.jl) has functions to visualize
214+
215+
* Protein 3D structures
216+
* Multiple Sequence Alignments
217+
218+
## MIToS.jl
219+
{{badge MIToS}}
220+
> A Julia package to analyze protein sequences, structures, and evolutionary information
221+
[MIToS](https://github.com/diegozea/MIToS.jl)
222+
223+
From the package README:
224+
225+
MIToS provides a comprehensive suite of tools for the analysis of protein sequences and structures.
226+
It allows working with **Multiple Sequence Alignments (MSAs)** to obtain evolutionary information in the Julia language [1].
227+
In particular, it eases the analysis of coevolving positions in an MSA using **Mutual Information (MI)**, a measure of covariation.
228+
MI-derived scores are good predictors of inter-residue contacts in a protein structure and functional sites in proteins [2,3].
229+
To allow such analysis, MIToS also implements several useful tools for working with protein structures, such as those available in the **Protein Data Bank (PDB)** or predicted by AlphaFold 2.
230+
231+
# Star History
232+
{{star_history BioSequences BioAlignments GenomicFeatures BioStructures GeneticVariation Phylogenies GenomeGraphs BioServices NCBIBlast FASTX XAM PairwiseMappingFormat ProteinSecondaryStructures BioMakie MIToS}}

docs/utils.jl

Lines changed: 17 additions & 0 deletions
@@ -160,6 +160,23 @@ const PKGINFOS = [
160160
PkgInfo(pkgname="Pluto", username="fonsp", branch="main", docslink="https://plutojl.org/", codecovlink=nothing),
161161
PkgInfo(pkgname="Neptune", username="compleathorseplayer", branch="master", docslink=nothing),
162162
PkgInfo(pkgname="BonitoBook", username="SimonDanisch", branch="main", docslink="https://bonitobook.org/website/"),
163+
PkgInfo(pkgname="BioSequences", username="BioJulia", branch="master"),
164+
PkgInfo(pkgname="BioAlignments", username="BioJulia", branch="master"),
165+
PkgInfo(pkgname="GenomicFeatures", username="BioJulia", branch="master"),
166+
PkgInfo(pkgname="GenomicAnnotations", username="BioJulia", branch="master"),
167+
PkgInfo(pkgname="BioStructures", username="BioJulia", branch="master"),
168+
PkgInfo(pkgname="GeneticVariation", username="BioJulia", branch="master"),
169+
PkgInfo(pkgname="Phylogenies", username="BioJulia", branch="master"),
170+
PkgInfo(pkgname="GenomeGraphs", username="BioJulia", branch="master"),
171+
PkgInfo(pkgname="BioServices", username="BioJulia", branch="master"),
172+
PkgInfo(pkgname="NCBIBlast", username="BioJulia", branch="master", docslink=nothing),
173+
PkgInfo(pkgname="FASTX", username="BioJulia", branch="master"),
174+
PkgInfo(pkgname="XAM", username="BioJulia", branch="master"),
175+
PkgInfo(pkgname="PairwiseMappingFormat", username="BioJulia", branch="master"),
176+
PkgInfo(pkgname="ProteinSecondaryStructures", username="BioJulia", branch="master"),
177+
PkgInfo(pkgname="BioMakie", username="BioJulia", branch="master", docslink="https://biojulia.dev/BioMakie.jl/dev/"),
178+
PkgInfo(pkgname="MIToS", username="diegozea", branch="master"),
179+
163180
]
164181

165182
function get_pkginfo(pkgname)
