14 points by sparkbioinfo 4 months ago | 11 comments
datawhiz123 4 months ago
Interesting! I've been looking to get into the bioinformatics field, and using Spark to analyze datasets seems like a great way to do it. Any idea what kind of data the company is working with?
sparkguru2k 4 months ago
From what I can gather from their job listing, they work with a mix of genomic, clinical, and imaging data. Should be a fascinating challenge!
epigenetics21 4 months ago
For those interested in the company, I highly recommend checking out their GitHub repo (<https://github.com/biotech-company>). Lots of cool projects to look through.
machinegenes 4 months ago
Does anyone know if they're using Spark's MLlib library or using custom machine learning algorithms?
dataprodigy 4 months ago
According to their GitHub repo, they're making use of MLlib quite extensively. I'm particularly fond of their deep learning implementation for gene sequencing.
codebioinf 4 months ago
Have any of you experimented with Spark for bioinformatics before? If so, how was your experience? Seems like a match made in heaven.
biostatistics99 4 months ago
I've used Spark for processing clinical trial data, and I found it made data cleaning and preprocessing much easier. Hoping to use it for genomic analysis soon.
bigdatabio 4 months ago
I wonder if they're considering using other big data tools like Hadoop or Flink. Would love to see some comparisons of performance and ease of use!
healthtechist 4 months ago
Hadoop might be overkill for bioinformatics, but I think Flink can give Spark a run for its money. I'd be curious to know if they've looked into Flink as an alternative.
geneticalgo 4 months ago
One potential issue with using big data tools like Spark for bioinformatics: a lot of the tools aren't optimized for biology-specific data formats like BAM or FASTQ. Curious to hear how they've approached this.
bioconductor 4 months ago
That's a known challenge, but there are tools like SparkSeq and SeqPig (the latter built on Apache Pig rather than Spark) that help with reading those biological formats on a cluster.
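To make the FASTQ part of the problem concrete: a FASTQ record spans four lines (header, sequence, "+" separator, quality), while line-oriented input like Spark's `textFile` hands you single lines. You also can't just split on lines starting with "@", because "@" is a valid quality character. Below is a rough pure-Python sketch of one workaround, keying each line by `index // 4`, which is roughly what you'd do after `RDD.zipWithIndex()` in PySpark. It assumes plain, uncompressed, four-line records (no wrapped sequences). The names `FastqRecord`, `key_lines`, and `group_records` are hypothetical, and this is not how SparkSeq or SeqPig actually do it.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    """One FASTQ entry (hypothetical type): read ID, bases, per-base qualities."""
    read_id: str
    sequence: str
    quality: str


def key_lines(lines: Iterable[str]) -> List[Tuple[int, Tuple[int, str]]]:
    """Tag each line with (record number, position within the record).

    Assumes strict 4-line records, so line index // 4 identifies the record.
    Positional keying avoids mistaking a quality line that starts with '@'
    for a new header.
    """
    return [(i // 4, (i % 4, line)) for i, line in enumerate(lines)]


def group_records(keyed: Iterable[Tuple[int, Tuple[int, str]]]) -> List[FastqRecord]:
    """Reassemble keyed lines into records (a stand-in for groupByKey).

    Works even if the keyed lines arrive out of order, as they might
    after a shuffle; raises ValueError on truncated or malformed records.
    """
    groups: Dict[int, Dict[int, str]] = {}
    for rec_no, (pos, line) in keyed:
        groups.setdefault(rec_no, {})[pos] = line

    records = []
    for rec_no in sorted(groups):
        parts = groups[rec_no]
        if len(parts) != 4:
            raise ValueError(f"record {rec_no} is truncated")
        header, seq, plus, qual = (parts[k] for k in range(4))
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {rec_no} is malformed")
        if len(seq) != len(qual):
            raise ValueError(f"record {rec_no}: sequence/quality length mismatch")
        records.append(FastqRecord(header[1:], seq, qual))
    return records


# Usage: the second record's quality line starts with '@' on purpose.
example = """@read1
GATTACA
+
IIIIIII
@read2
ACGT
+
@@II"""
records = group_records(key_lines(example.splitlines()))
print(records[1].read_id)  # read2
```

BAM is a different story since it's a compressed binary format, which is where the dedicated libraries earn their keep.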