14 points by sparkbioinfo 10 months ago | 11 comments
datawhiz123 10 months ago
Interesting! I've been looking to get into the bioinformatics field and using Spark to analyze datasets is a great way to do it. Any idea what kind of data the company is working with?
sparkguru2k 10 months ago
From what I can gather from their job listing, they work with a mix of genomic, clinical, and imaging data. Should be a fascinating challenge!
epigenetics21 10 months ago
For those interested in the company, I highly recommend checking out their GitHub repo (<https://github.com/biotech-company>). Lots of cool projects to look through.
machinegenes 10 months ago
Does anyone know if they're using Spark's MLlib or their own custom machine learning algorithms?
dataprodigy 10 months ago
According to their GitHub repo, they're making use of MLlib quite extensively. I'm particularly fond of their deep learning implementation for gene sequencing.
codebioinf 10 months ago
Have any of you experimented with Spark for bioinformatics before? If so, how was your experience? Seems like a match made in heaven.
biostatistics99 10 months ago
I've used Spark for processing clinical trial data, and I found it made data cleaning and preprocessing much easier. Hoping to use it for genomic analysis soon.
bigdatabio 10 months ago
I wonder if they're considering using other big data tools like Hadoop or Flink. Would love to see some comparisons of performance and ease of use!
healthtechist 10 months ago
Hadoop might be overkill for bioinformatics, but I think Flink can give Spark a run for its money. I'd be curious to know if they've looked into Flink as an alternative.
geneticalgo 10 months ago
One potential issue with using big data tools like Spark for bioinformatics: a lot of the tools aren't optimized for biology-specific data formats like BAM or FASTQ. Curious to hear how they've approached this.
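To make the format problem concrete: a FASTQ record spans four lines (header, sequence, separator, quality), so a reader that treats every line as an independent row loses the record boundaries. Here's a minimal pure-Python sketch of the grouping step, not Spark code and not anything from the company's repo. The `FastqRecord` and `parse_fastq` names and the sample reads are made up for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    """One read; a hypothetical container for this sketch."""
    read_id: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group a stream of lines into 4-line FASTQ records."""
    it = iter(lines)
    while True:
        # Pull the next four lines; an empty chunk means end of input.
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if (len(chunk) != 4
                or not chunk[0].startswith("@")
                or not chunk[2].startswith("+")):
            raise ValueError(f"malformed FASTQ record: {chunk!r}")
        header, sequence, _separator, quality = chunk
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Made-up sample data. Note the second quality string starts with "@",
# so scanning for "@" alone cannot reliably find record starts.
sample = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "GGCA", "+", "@II#",
]
records = list(parse_fastq(sample))
```

The same issue shows up when a distributed reader splits a file at arbitrary byte offsets: a split can start mid-record, and because "@" is also a valid quality character, finding the next record start takes more than a simple line scan. BAM is harder still, since it is a compressed binary format.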
bioconductor 10 months ago
That's a known challenge, but there are tools out there that help with reading those biological formats: SparkSeq on the Spark side, and SeqPig, which does the same job for Apache Pig on Hadoop.