CAJUN
2024-04-24
Bio.jl monorepo: @dcjones, @blahah, @bicycle1885, @transgirlcodes
@dcjones: fast / accurate parsing
@bicycle1885 replaces Ragel with an all-Julia state machine generator (Automa.jl)
Bio.Seq becomes BioSequences.jl (effort driven by @transgirlcodes and @bicycle1885)
Move from REQUIRE to Project.toml for Julia v1.0
Bio.jl officially deprecated / archived
Project Admins
@jakni - Jakob Nybo Nissen, University of Copenhagen
BioSequences.jl, Automa.jl, Kmers.jl
@kescobo - Senior Research Scientist at Wellesley College
Microbiome.jl and BiobakeryUtils.jl moved to EcoJulia 🤦
Other major early contributors
@jgreener64: BioStructures.jl
@prcastro
@kdm9
DNA
Protein
import Base: summarysize
using BioSequences, Random
seq = randseq(DNAAlphabet{2}(), 512);  # random 512-base DNA sequence, 2 bits per base
str = String(seq);                      # the same bases as text, 1 byte per character
summarysize(seq) # 184
summarysize(str) # 520
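The size gap comes from bit packing: with `DNAAlphabet{2}` each nucleotide takes 2 bits, so 512 bases need 1024 bits (128 bytes), while the `String` spends a full byte per character. Below is a minimal pure-Julia sketch of the packing idea. `pack2bit` and `unpack2bit` are hypothetical helpers, not BioSequences' internals, and they only handle uppercase A/C/G/T.

```julia
# Hypothetical helpers illustrating 2-bit nucleotide packing (not the BioSequences API).
const CODE = Dict('A' => 0x0, 'C' => 0x1, 'G' => 0x2, 'T' => 0x3)
const BASES = ('A', 'C', 'G', 'T')

# Pack an ACGT string into UInt64 words; 32 bases fit in one 64-bit word.
function pack2bit(s::AbstractString)
    n = length(s)
    words = zeros(UInt64, cld(n, 32))
    for (i, c) in enumerate(s)
        w, slot = divrem(i - 1, 32)               # which word, which 2-bit slot
        words[w + 1] |= UInt64(CODE[c]) << (2 * slot)
    end
    return words, n
end

# Decode n bases back from the packed words.
function unpack2bit(words::Vector{UInt64}, n::Integer)
    io = IOBuffer()
    for i in 1:n
        w, slot = divrem(i - 1, 32)
        code = (words[w + 1] >> (2 * slot)) & 0x3  # extract the 2-bit code
        print(io, BASES[Int(code) + 1])
    end
    return String(take!(io))
end
```

For a 512-base sequence this yields 16 `UInt64` words, i.e. 128 bytes of payload; the rest of the 184 bytes reported above is presumably object and array overhead.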
FASTA
FASTQ
SAM
@HD VN:1.0 SO:unsorted
@SQ SN:1455__A0A0C2TZA5__A3781_04875 LN:1008
@SQ SN:1455__A0A0C2XRW0__A3781_18225 LN:804
@SQ SN:1455__A0A0C2XXQ4__A3781_14565 LN:867
...
VH01194:15:AAAWT2VHV:1:1101:49456:1398:N:0:GAACTGAGCG+CGCTCCACGA#0/1__1.101 16 1134687__A0A378ENW4__cobJ 67 3 150M * 0 0 CTGCAGGCGGCGGAAATCGTCGTCGGTTATAAAACTTACACCCATCTGGTGAAGGCTTTTACCGGCGACAAGCAGGTGATCAAAACCGGGATGTGCAAAGAGATTGAACGCTGTCAGGCGGCGATTGAACTGGCGCAGGCCGGGCACAAC CCCCCCCCCCCCCCCC;CCCCCCCCCCCC;CCCCCC;CCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC AS:i:-59 XN:i:0 XM:i:12 XO:i:0 XG:i:0 NM:i:12 MD:Z:23T2C8C5G2C8A2C11T38G5C14G17T3 YT:Z:UU
VH01194:15:AAAWT2VHV:1:1101:31259:2098:N:0:GAACTGAGCG+CGCTCCACGA#0/1__1.441 16 73098__A0A1B7K6J8__M989_00754 328 40 150M * 0 0 CCAGCCGATTTCAGGAAATTAGGCCGTGATGCCGCGGCGACGCTGTTGTCGGTATCTAACGTAACGCTCTGGAATTCCATCGACTATTTCAGCCCCAGCGCCGAGCATAATCCTTTATTGATGACCTGGTCATTGGGCGTGGAAGAACAG CC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCC AS:i:-18 XN:i:0 XM:i:4 XO:i:0 XG:i:0 NM:i:4 MD:Z:2C23C18C16T87 YT:Z:UU
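Each SAM alignment line has 11 mandatory tab-separated columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional `TAG:TYPE:VALUE` fields; the tabs show up as spaces in the text above. A minimal pure-Julia sketch of splitting one line, assuming well-formed input; `SamRecord`, `parse_sam_line`, and `isreverse` are hypothetical names, and a real parser (e.g. one built with Automa) would validate each field.

```julia
# Hypothetical minimal record covering a subset of SAM's mandatory columns.
struct SamRecord
    qname::String
    flag::UInt16
    rname::String
    pos::Int            # 1-based leftmost mapping position
    mapq::UInt8
    cigar::String
    seq::String
    qual::String
    tags::Vector{String}  # optional TAG:TYPE:VALUE fields, kept raw
end

function parse_sam_line(line::AbstractString)
    f = split(line, '\t')
    length(f) >= 11 || error("SAM record needs at least 11 tab-separated fields")
    return SamRecord(String(f[1]), parse(UInt16, f[2]), String(f[3]),
                     parse(Int, f[4]), parse(UInt8, f[5]), String(f[6]),
                     String(f[10]), String(f[11]), String.(f[12:end]))
end

# FLAG bit 0x10: SEQ is reverse complemented (read maps to the reverse strand).
isreverse(r::SamRecord) = (r.flag & 0x10) != 0
```

Both example records above carry FLAG 16, so `isreverse` would return `true` for them.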
The development of every bioinformatics tool begins with the definition of a new file format, incompatible with all previous formats.
- Charles Darwin
Automa makes Deterministic Finite Automata
using Automa
# Regex for a simplified FASTA: lowercase headers, ACGT-only sequence lines
fasta_regex = let
    header = re"[a-z]+"
    seqline = re"[ACGT]+"
    record = '>' * header * '\n' * rep1(seqline * '\n')
    rep(record)
end
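To make "Deterministic Finite Automaton" concrete, here is a hand-written DFA that accepts the same language as `fasta_regex`. This is an illustrative sketch only: Automa derives its machine from the regex automatically, and its generated code differs in detail. `dfa_step` and `validate_fasta_dfa` are hypothetical names.

```julia
# Hand-written DFA for the simplified FASTA grammar above (illustration only).
# States: 1 start/between records, 2 after '>', 3 in header,
# 4 after header newline, 5 in sequence line, 6 after sequence newline; 0 = error.
function dfa_step(state::Int, b::UInt8)::Int
    isheader = UInt8('a') <= b <= UInt8('z')
    isbase = b in (UInt8('A'), UInt8('C'), UInt8('G'), UInt8('T'))
    if state == 1
        b == UInt8('>') && return 2
    elseif state == 2
        isheader && return 3
    elseif state == 3
        isheader && return 3
        b == UInt8('\n') && return 4
    elseif state == 4          # at least one sequence line is required
        isbase && return 5
    elseif state == 5
        isbase && return 5
        b == UInt8('\n') && return 6
    elseif state == 6          # another sequence line or a new record
        isbase && return 5
        b == UInt8('>') && return 2
    end
    return 0                   # no valid transition
end

const ACCEPTING = (1, 6)

# Run the DFA over the raw bytes; valid iff it never errors and ends accepting.
function validate_fasta_dfa(data::AbstractString)::Bool
    state = 1
    for b in codeunits(data)
        state = dfa_step(state, b)
        state == 0 && return false
    end
    return state in ACCEPTING
end
```

Each input byte costs one table-like transition with no backtracking, which is why DFA-based parsers are fast.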
# Same grammar, with named actions fired on entering/exiting each part,
# compiled into a finite-state machine
machine = let
    header = onexit!(onenter!(re"[a-z]+", :mark_pos), :header)
    seqline = onexit!(onenter!(re"[ACGT]+", :mark_pos), :seqline)
    record = onexit!(re">" * header * '\n' * rep1(seqline * '\n'), :record)
    compile(rep(record))
end
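The action names only become a parser once they are bound to Julia code and spliced into a function. A sketch assuming Automa v1's `generate_code(machine, actions)`, which expects the input in a variable named `data` and tracks the current byte position in `p`; the `FastaRecord` type and the `parse_fasta` name are hypothetical choices for this example.

```julia
using Automa

# Hypothetical output type for this sketch.
struct FastaRecord
    name::String
    seq::String
end

machine = let
    header = onexit!(onenter!(re"[a-z]+", :mark_pos), :header)
    seqline = onexit!(onenter!(re"[ACGT]+", :mark_pos), :seqline)
    record = onexit!(re">" * header * '\n' * rep1(seqline * '\n'), :record)
    compile(rep(record))
end

# Code run on each action; exit actions fire with `p` one past the matched span.
actions = Dict(
    :mark_pos => :(pos = p),
    :header => :(header = String(data[pos:p-1])),
    :seqline => :(append!(buffer, data[pos:p-1])),
    # String(buffer) takes ownership of the bytes and empties buffer for the next record
    :record => :(push!(records, FastaRecord(header, String(buffer)))),
)

@eval function parse_fasta(data)
    pos = 0
    buffer = UInt8[]
    records = FastaRecord[]
    header = ""
    $(generate_code(machine, actions))
    return records
end
```

Multi-line sequences are concatenated, so `">abc\nTAGA\nAAGA\n"` yields one record named `abc` with sequence `TAGAAAGA`.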
using BioMakie
using GLMakie
using BioStructures
struc = retrievepdb("2vb1") |> Observable  # download from the PDB
# or, from a local file:
struc = read("2vb1.pdb", BioStructures.PDBFormat) |> Observable
fig = Figure()
plotstruc!(fig, struc; plottype = :ballandstick, gridposition = (1,1), atomcolors = aquacolors)
plotstruc!(fig, struc; plottype = :covalent, gridposition = (1,2))
(conda) $ humann --input some_reads.fastq.gz \
    --taxonomic-profile some_profile.tsv --output ./ \
    --threads 32 --remove-temp-output --search-mode uniref90 \
    --output-basename some
vs
julia> humann(; input="some_reads.fastq.gz",
taxonomic_profile="some_profile.tsv", output="./",
threads=32, remove_temp_output=true, search_mode="uniref90",
output_basename="some")
Installation and deps managed by Conda.jl
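The two calls line up mechanically: underscores in keyword names become dashes, `true` becomes a bare switch, and everything else becomes a flag followed by its value. A minimal sketch of that mapping; `kwargs_to_flags` and `humann_cmd` are hypothetical helpers, not BiobakeryUtils.jl's actual code, which presumably does something similar before running the tool.

```julia
# Hypothetical helper: map Julia keyword arguments onto CLI flags.
function kwargs_to_flags(; kwargs...)
    flags = String[]
    for (k, v) in kwargs
        flag = "--" * replace(String(k), '_' => '-')
        if v === true
            push!(flags, flag)              # boolean switch, e.g. --remove-temp-output
        elseif v === false
            continue                        # switch left off
        else
            push!(flags, flag, string(v))   # flag followed by its value
        end
    end
    return flags
end

# Build (but do not run) the command; interpolating a vector splats it into arguments.
humann_cmd(; kwargs...) = `humann $(kwargs_to_flags(; kwargs...))`
```

Building a `Cmd` rather than a shell string means each value is passed as a separate argument, so file names with spaces need no quoting.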
SingleCellProjections.jl: https://live.juliacon.org/talk/NPADF7
BioMakie.jl: https://www.youtube.com/watch?v=-C7Zbh6UTgk
Questions?