Commit 9946c38

Merge pull request #138 from tp2750/main
Add Bioinformatics page
2 parents e9ef9c6 + 3e5efce commit 9946c38

3 files changed

Lines changed: 233 additions & 0 deletions

docs/_layout/pgwrap.html

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@
2121
</ul>
2222
<hr>
2323
<ul class="menu-list">
24+
{{metasection comparisons/bioinformatics "Bioinformatics"}}
2425
{{metasection comparisons/data_structures "Data Structures"}}
2526
{{metasection comparisons/performance_enhancement "Performance Enhancement"}}
2627
{{metasection comparisons/math Math}}
Lines changed: 215 additions & 0 deletions
@@ -0,0 +1,215 @@
1+
+++
2+
title = "Bioinformatics"
3+
+++
4+
5+
# Bioinformatics
6+
7+
The [BioJulia](https://github.com/biojulia) organization collects a lot of great packages related to bioinformatics.
8+
9+
## Bio.jl is Deprecated
10+
11+
Note that the [Bio.jl](https://github.com/BioJulia/Bio.jl) package is deprecated.
12+
In [this blog post](https://biojulia.dev/posts/biojl/), the main developer of Bio.jl describes where the functionality has gone:
13+
14+
15+
* Bio.Seq became [BioSequences.jl](https://github.com/BioJulia/BioSequences.jl/)
16+
* Bio.Align became [BioAlignments.jl](https://github.com/BioJulia/BioAlignments.jl/)
17+
* Bio.Intervals became [GenomicFeatures.jl](https://github.com/BioJulia/GenomicFeatures.jl/)
18+
* Bio.Structure became [BioStructures.jl](https://github.com/BioJulia/BioStructures.jl/)
19+
* Bio.Var became [GeneticVariation.jl](https://github.com/BioJulia/GeneticVariation.jl/)
20+
* Bio.Phylo became [Phylogenies.jl](https://github.com/BioJulia/Phylogenies.jl/)
21+
* Bio.Services became [BioServices.jl](https://github.com/BioJulia/BioServices.jl/)
22+
* Bio.Tools became [BioTools.jl](https://github.com/BioJulia/BioTools.jl/) (now archived)
23+
24+
# File Parsers
25+
26+
An important task in bioinformatics is parsing files in various standard formats.
27+
Here we list some file formats and packages with parsers:
28+
29+
| Format | Extensions | Description | Packages |
30+
|-----------------------------------------------------------------------------------------|------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
31+
| [FASTA](https://en.wikipedia.org/wiki/FASTA_format) | .fas, .fasta, .fa | DNA or protein sequences without annotations | [FASTX](https://github.com/BioJulia/FASTX.jl) |
32+
| [FASTQ](https://en.wikipedia.org/wiki/FASTQ_format) | .fq, .fastq | DNA sequences with quality information | [FASTX](https://github.com/BioJulia/FASTX.jl) |
33+
| [GENBANK](https://en.wikipedia.org/wiki/GenBank) | .gb, .gbk | DNA or protein sequences with annotations | [GenomicAnnotations.jl](https://github.com/BioJulia/GenomicAnnotations.jl) |
34+
| EMBL | .embl | DNA or protein sequences with annotations | [GenomicAnnotations.jl](https://github.com/BioJulia/GenomicAnnotations.jl) |
35+
| [SAM](https://en.wikipedia.org/wiki/SAM_(file_format)) | .sam | Aligned DNA sequences (typically from read mapping). Text based. | [XAM.jl](https://github.com/BioJulia/XAM.jl) |
36+
| [BAM](https://en.wikipedia.org/wiki/BAM_(file_format)) | .bam | Aligned DNA sequences (typically from read mapping). Binary. | [XAM.jl](https://github.com/BioJulia/XAM.jl) |
37+
| [PDB](https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format)) | .pdb | Protein 3D structure. | [BioStructures.jl](https://github.com/BioJulia/BioStructures.jl), [MIToS](https://github.com/diegozea/MIToS.jl) |
38+
| [mmCIF](https://en.wikipedia.org/wiki/Macromolecular_Crystallographic_Information_File) | | Macromolecular Crystallographic Information File (mmCIF) also known as PDBx/mmCIF is a standard text file format for representing macromolecular structure data | [BioStructures.jl](https://github.com/BioJulia/BioStructures.jl), [MIToS](https://github.com/diegozea/MIToS.jl) |
39+
| [MMTF](https://github.com/rcsb/mmtf) | | MacroMolecular Transmission Format (MMTF) is a binary encoding of biological structures. | [BioStructures.jl](https://github.com/BioJulia/BioStructures.jl) |
40+
| [DSSP](https://github.com/PDB-REDO/dssp) | | Protein Secondary Structure | [ProteinSecondaryStructures.jl](https://github.com/BioJulia/ProteinSecondaryStructures.jl) |
41+
| [STRIDE](https://webclu.bio.wzw.tum.de/stride/) | | Protein Secondary Structure | [ProteinSecondaryStructures.jl](https://github.com/BioJulia/ProteinSecondaryStructures.jl) |
42+
| [PAF](https://github.com/slimsuite/pafscaff) | .paf | Pairwise mApping Format. | [PairwiseMappingFormat.jl](https://github.com/BioJulia/PairwiseMappingFormat.jl) |
43+
| [Stockholm](https://en.wikipedia.org/wiki/Stockholm_format) | .sto, .stk, .stockholm | Stockholm format is a multiple sequence alignment format used by Pfam, Rfam and Dfam | [MIToS.jl](https://github.com/diegozea/MIToS.jl) |
44+
| [A3M](https://en.wikipedia.org/wiki/FASTA_format) | .fas | A2M/A3M are a family of FASTA-derived formats used for sequence alignments | [MIToS.jl](https://github.com/diegozea/MIToS.jl) |
45+
| [PIR](https://www.bioinformatics.nl/tools/crab_pir.html) | .pir | Multiple sequence alignment format | [MIToS.jl](https://github.com/diegozea/MIToS.jl) |
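
As a rough illustration of what using one of these parsers looks like, the sketch below reads FASTA records with FASTX.jl. It assumes FASTX v2, where `identifier` and `sequence` are exported. The sample data is made up and held in memory, so no file is needed.

```julia
using FASTX  # assumes FASTX.jl v2 is installed

# Two made-up records; a real file would be wrapped as FASTAReader(open(path)).
data = ">seq1 first example\nACGT\n>seq2\nTTGCA\n"

reader = FASTAReader(IOBuffer(data))
for record in reader
    # identifier is the header text up to the first whitespace
    println(identifier(record), " => ", sequence(record))
end
close(reader)
```

The other parser packages in the table have their own reader types and accessors, so check each package's documentation rather than assuming the same function names.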
46+
47+
# Data Structures
48+
49+
The basic data structures for representing DNA, RNA and protein sequences are in [BioSequences.jl](https://github.com/BioJulia/BioSequences.jl).
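
A minimal sketch of these sequence types, assuming BioSequences.jl is installed. The sequences themselves are made-up examples.

```julia
using BioSequences  # assumes BioSequences.jl is installed

# String macros build typed biological sequences rather than plain Strings.
dna_seq = dna"ATGGCCTAA"
rna_seq = rna"AUGGCCUAA"

println(length(dna_seq))              # number of nucleotides
println(reverse_complement(dna_seq))  # reverse complement as a new sequence
println(translate(rna_seq))           # amino acid sequence from the standard genetic code
```

Because DNA, RNA and amino acid sequences are distinct types, operations such as `translate` are only defined where they make biological sense.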
50+
51+
# Pairwise Sequence Alignments
52+
53+
A core task in bioinformatics is aligning sequences.
54+
This can be done with [BioAlignments.jl](https://github.com/BioJulia/BioAlignments.jl) which includes algorithms for the following pairwise alignment types:
55+
56+
* GlobalAlignment: global-to-global alignment
57+
* SemiGlobalAlignment: local-to-global alignment
58+
* LocalAlignment: local-to-local alignment
59+
* OverlapAlignment: end-free alignment
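
The four alignment types above can be compared on the same pair of sequences. This is a sketch only: the sequences and gap penalties are arbitrary choices for illustration, and it assumes both BioSequences.jl and BioAlignments.jl are installed.

```julia
using BioSequences, BioAlignments

# Affine gap model on top of the EDNAFULL substitution matrix.
# The gap penalties here are arbitrary example values.
scoremodel = AffineGapScoreModel(EDNAFULL, gap_open=-5, gap_extend=-1)

a = dna"CCTAGGAGGG"
b = dna"ACCTGGTATGATAGCG"

# Run each pairwise alignment type on the same input and report its score.
for mode in (GlobalAlignment(), SemiGlobalAlignment(), LocalAlignment(), OverlapAlignment())
    res = pairalign(mode, a, b, scoremodel)
    println(typeof(mode), ": score = ", score(res))
end

# alignment(res) gives access to the aligned sequences themselves.
println(alignment(pairalign(LocalAlignment(), a, b, scoremodel)))
```

A local alignment may drop poorly matching ends, so its score is never lower than the global score under the same model.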
60+
61+
# Multiple Sequence Alignment (MSA)
62+
63+
I'm not aware of tools in Julia that compute multiple sequence alignments, but [MIToS.jl](https://github.com/diegozea/MIToS.jl) can read the most common MSA formats: Stockholm, FASTA, A3M, A2M, PIR and Raw.
64+
65+
# Package descriptions
66+
67+
## BioSequences.jl
68+
{{badge BioSequences}}
69+
> Biological sequences for the julia language
70+
[BioSequences.jl](https://github.com/BioJulia/BioSequences.jl/) provides data types and methods for common operations with biological sequences, including DNA, RNA, and amino acid sequences.
71+
72+
It can do sequence search and pattern matching in sequences, and compute simple sequence statistics.
73+
74+
## BioAlignments.jl
75+
{{badge BioAlignments}}
76+
> Sequence alignment tools
77+
[BioAlignments.jl](https://github.com/BioJulia/BioAlignments.jl/) provides sequence alignment algorithms and data structures.
78+
It includes algorithms for the following pairwise alignment types:
79+
80+
* GlobalAlignment: global-to-global alignment
81+
* SemiGlobalAlignment: local-to-global alignment
82+
* LocalAlignment: local-to-local alignment
83+
* OverlapAlignment: end-free alignment
84+
85+
## GenomicFeatures.jl
86+
{{badge GenomicFeatures}}
87+
> Tools for genomic features in Julia.
88+
[GenomicFeatures.jl](https://github.com/BioJulia/GenomicFeatures.jl/)
89+
90+
## GenomicAnnotations.jl
91+
{{badge GenomicAnnotations}}
92+
> GenomicAnnotations is a package for reading, modifying, and writing genomic annotations in the GenBank, GFF3, GFF2/GTF, and EMBL file formats.
93+
[GenomicAnnotations.jl](https://github.com/BioJulia/GenomicAnnotations.jl)
94+
95+
## BioStructures.jl
96+
{{badge BioStructures}}
97+
> A Julia package to read, write and manipulate macromolecular structures
98+
[BioStructures.jl](https://github.com/BioJulia/BioStructures.jl/)
99+
100+
From the package README:
101+
102+
BioStructures provides functionality to read, write and manipulate
103+
macromolecular structures, in particular proteins.
104+
[Protein Data Bank](https://www.rcsb.org/pdb/home/home.do) (PDB), mmCIF and MMTF
105+
format files can be read in to a hierarchical data structure. Spatial
106+
calculations and functions to access the PDB are also provided.
107+
It compares favourably in terms of performance to other PDB parsers -
108+
see some [benchmarks online](https://github.com/jgreener64/pdb-benchmarks) - and
109+
should be lightweight enough to build other packages on top of.
110+
111+
## GeneticVariation.jl
112+
{{badge GeneticVariation}}
113+
> Datastructures and algorithms for working with genetic variation
114+
[GeneticVariation.jl](https://github.com/BioJulia/GeneticVariation.jl/)
115+
116+
From the package README:
117+
118+
GeneticVariation provides types and methods for working with datasets of genetic variation. It provides a VCF and BCF parser, as well as methods for working with variation in sequences such as evolutionary distance computation, and counting different mutation types.
119+
120+
## Phylogenies.jl
121+
{{badge Phylogenies}}
122+
> The BioJulia package for working with phylogenetic trees and geneologies.
123+
[Phylogenies.jl](https://github.com/BioJulia/Phylogenies.jl/)
124+
125+
The repository looks stale.
126+
127+
From the package README:
128+
129+
A julia package providing an abstract type and interface for phylogenies, a concrete phylogeny type implementation, and higher-level methods for working with phylogenies.
130+
131+
In development.
132+
133+
134+
## GenomeGraphs.jl
135+
{{badge GenomeGraphs}}
136+
> A modern genomics framework for julia
137+
[GenomeGraphs.jl](https://github.com/BioJulia/GenomeGraphs.jl)
138+
139+
From the package README:
140+
141+
GenomeGraphs provides a representation of sequence graphs. Such graphs represent genome assemblies and population graphs of genotypes/haplotypes and variation.
142+
143+
144+
## BioServices.jl
145+
{{badge BioServices}}
146+
> Julia interface to APIs for various bio-related web services
147+
[BioServices.jl](https://github.com/BioJulia/BioServices.jl/)
148+
149+
## NCBIBlast.jl
150+
{{badge NCBIBlast}}
151+
> Thin wrapper around NCBI's BLAST+ CLI https://www.ncbi.nlm.nih.gov/books/NBK569856/
152+
[NCBIBlast.jl](https://github.com/BioJulia/NCBIBlast.jl/)
153+
154+
From the package README:
155+
156+
This package is a thin wrapper around the Basic Local Alignment Search Tool CLI, better known as BLAST, developed by the National Center for Biotechnology Information (NCBI).
157+
158+
For now, this uses CondaPkg.jl to install BLAST+.
159+
160+
## FASTX.jl
161+
{{badge FASTX}}
162+
> Parse and process FASTA and FASTQ formatted files of biological sequences.
163+
[FASTX](https://github.com/BioJulia/FASTX.jl)
164+
165+
FASTX provides I/O and utilities for manipulating FASTA and FASTQ formatted sequence data files.
166+
167+
## XAM.jl
168+
{{badge XAM}}
169+
> Parse and process SAM and BAM formatted files.
170+
[XAM.jl](https://github.com/BioJulia/XAM.jl)
171+
172+
XAM provides I/O and utilities for manipulating SAM and BAM formatted alignment map files.
173+
174+
## PairwiseMappingFormat.jl
175+
{{badge PairwiseMappingFormat}}
176+
> Parser for the PAF format in bioinformatics
177+
[PairwiseMappingFormat.jl](https://github.com/BioJulia/PairwiseMappingFormat.jl)
178+
179+
PairwiseMappingFormat.jl provides a parser for Pairwise Mapping Format (PAF) files. PAF is a simple, tab-delimited format created by programs such as minimap2.
180+
181+
## ProteinSecondaryStructures.jl
182+
{{badge ProteinSecondaryStructures}}
183+
> Wrapper to protein secondary structure calculation packages
184+
[ProteinSecondaryStructures.jl](https://github.com/BioJulia/ProteinSecondaryStructures.jl)
185+
186+
From the package README:
187+
188+
This package parses [STRIDE](http://webclu.bio.wzw.tum.de/stride/) and [DSSP](https://github.com/PDB-REDO/dssp) secondary structure prediction outputs, to make them convenient to use from Julia, particularly for the analysis of MD simulations.
189+
190+
191+
## BioMakie.jl
192+
{{badge BioMakie}}
193+
> Plotting and interface tools for biology.
194+
[BioMakie.jl](https://github.com/BioJulia/BioMakie.jl)
195+
196+
[BioMakie.jl](https://github.com/BioJulia/BioMakie.jl) has functions to visualize
197+
198+
* Protein 3D structures
199+
* Multiple Sequence Alignments
200+
201+
## MIToS.jl
202+
{{badge MIToS}}
203+
> A Julia package to analyze protein sequences, structures, and evolutionary information
204+
[MIToS](https://github.com/diegozea/MIToS.jl)
205+
206+
From the package README:
207+
208+
MIToS provides a comprehensive suite of tools for the analysis of protein sequences and structures.
209+
It allows working with **Multiple Sequence Alignments (MSAs)** to obtain evolutionary information in the Julia language [1].
210+
In particular, it eases the analysis of coevolving positions in an MSA using **Mutual Information (MI)**, a measure of covariation.
211+
MI-derived scores are good predictors of inter-residue contacts in a protein structure and functional sites in proteins [2,3].
212+
To allow such analysis, MIToS also implements several useful tools for working with protein structures, such as those available in the **Protein Data Bank (PDB)** or predicted by AlphaFold 2.
213+
214+
# Star History
215+
{{star_history BioSequences BioAlignments GenomicFeatures BioStructures GeneticVariation Phylogenies GenomeGraphs BioServices NCBIBlast FASTX XAM PairwiseMappingFormat ProteinSecondaryStructures BioMakie MIToS}}

docs/utils.jl

Lines changed: 17 additions & 0 deletions
@@ -160,6 +160,23 @@ const PKGINFOS = [
160160
PkgInfo(pkgname="Pluto", username="fonsp", branch="main", docslink="https://plutojl.org/", codecovlink=nothing),
161161
PkgInfo(pkgname="Neptune", username="compleathorseplayer", branch="master", docslink=nothing),
162162
PkgInfo(pkgname="BonitoBook", username="SimonDanisch", branch="main", docslink="https://bonitobook.org/website/"),
163+
PkgInfo(pkgname="BioSequences", username="BioJulia", branch="master"),
164+
PkgInfo(pkgname="BioAlignments", username="BioJulia", branch="master"),
165+
PkgInfo(pkgname="GenomicFeatures", username="BioJulia", branch="master"),
166+
PkgInfo(pkgname="GenomicAnnotations", username="BioJulia", branch="master"),
167+
PkgInfo(pkgname="BioStructures", username="BioJulia", branch="master"),
168+
PkgInfo(pkgname="GeneticVariation", username="BioJulia", branch="master"),
169+
PkgInfo(pkgname="Phylogenies", username="BioJulia", branch="master"),
170+
PkgInfo(pkgname="GenomeGraphs", username="BioJulia", branch="master"),
171+
PkgInfo(pkgname="BioServices", username="BioJulia", branch="master"),
172+
PkgInfo(pkgname="NCBIBlast", username="BioJulia", branch="master", docslink=nothing),
173+
PkgInfo(pkgname="FASTX", username="BioJulia", branch="master"),
174+
PkgInfo(pkgname="XAM", username="BioJulia", branch="master"),
175+
PkgInfo(pkgname="PairwiseMappingFormat", username="BioJulia", branch="master"),
176+
PkgInfo(pkgname="ProteinSecondaryStructures", username="BioJulia", branch="master"),
177+
PkgInfo(pkgname="BioMakie", username="BioJulia", branch="master", docslink="https://biojulia.dev/BioMakie.jl/dev/"),
178+
PkgInfo(pkgname="MIToS", username="diegozea", branch="master"),
179+
163180
]
164181

165182
function get_pkginfo(pkgname)
