89 points by functionalfun 5 months ago | 18 comments
haskell_scraper 5 months ago
Excited to share my new project, a functional web scraping library in Haskell! I've found that my web scrapers are now more modular, more testable, and easier to read. #haskell #functionalprogramming #webscraping
functional_coder 5 months ago
That's really cool! As a Haskeller myself, I'm always interested in new libraries for the language #haskellforlife 😄. Quick question though: how do you manage HTTP requests in your library? Does it use an existing package or a custom solution?
haskell_scraper 5 months ago
Thanks for asking! I'm using the popular "http-client" package for making HTTP requests. It's well documented and provides the functionality needed to fetch web pages #haskellhttp
web_scraping_lover 5 months ago
Interesting! I've always wanted to learn Haskell for a functional approach. Will definitely check this out. Are there any best practices for error handling in projects like these?
haskell_scraper 5 months ago
Yes, definitely! Error handling in FP can be more explicit than in other paradigms. I'd suggest using the "Either" type to model potential errors and adding validation checks where necessary #fpandme
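A minimal sketch of what that could look like, using only `base`. The `ScrapeError` type and the `validateResponse` and `firstLine` functions are hypothetical names made up for illustration, not the library's actual API:

```haskell
import Data.Char (isSpace)

-- Hypothetical error type; the library's real error cases are not shown in this thread.
data ScrapeError
  = HttpStatus Int   -- the server answered with a non-2xx status code
  | EmptyBody        -- the response body was empty or whitespace only
  deriving (Show, Eq)

-- Validate a (status code, body) pair and return the body on success.
validateResponse :: Int -> String -> Either ScrapeError String
validateResponse status body
  | status < 200 || status >= 300 = Left (HttpStatus status)
  | all isSpace body              = Left EmptyBody
  | otherwise                     = Right body

-- Chain a second step onto the validation; the first Left short-circuits the rest.
firstLine :: Int -> String -> Either ScrapeError String
firstLine status body = do
  ok <- validateResponse status body
  case lines ok of
    (l : _) -> Right l
    []      -> Left EmptyBody
```

Because every failure is an ordinary value, callers have to pattern match on `Left` and `Right` and cannot silently ignore the error case.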
data_lover_91 5 months ago
Ever considered making this usable with third-party sites that have strict scraping restrictions or captchas? Some sort of proxy configuration, perhaps?
haskell_scraper 5 months ago
Great idea! I've been brainstorming ways to support proxies and rotating IP addresses. Stay tuned for updates!
anonymous_user 5 months ago
This looks awesome! I've started learning the basics of Haskell and I'm liking it. Do you suggest any other tools that complement web scraping in FP style?
functional_coder 5 months ago
There's a fantastic workshop on using FP for data scraping with Haskell here: <workshop-url>. You'll learn a lot about the concepts introduced in this project #resource
haskell_scraper 5 months ago
I'll second that workshop recommendation from @functional_coder. You may also want to consider using tools like "aeson" for JSON parsing to complement your web scraping #aesonFTW
learner 5 months ago
What's the advantage of using a functional approach for web scraping compared to the traditional OO or procedural approaches?
haskell_scraper 5 months ago
In FP, we favor composing small, chainable functions over inheritance. This leads to more modular and reusable code that is easier to test and more declarative to read.
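As a small illustration of composition, here is a sketch of a link-cleaning pipeline built from three independent steps. The function names (`trim`, `isHttpLink`, `cleanLinks`) are invented for this example and are not taken from the library:

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, isPrefixOf, nub)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Keep only links that use the http or https scheme.
isHttpLink :: String -> Bool
isHttpLink url = "http://" `isPrefixOf` url || "https://" `isPrefixOf` url

-- The pipeline reads right to left: trim each link, drop non-HTTP links,
-- then remove duplicates while keeping the first occurrence.
cleanLinks :: [String] -> [String]
cleanLinks = nub . filter isHttpLink . map trim
```

Each step is a pure function, so it can be tested on its own and reused in other pipelines without touching the network.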
curiousgeorge 5 months ago
I know there is some hype around web scraping using advanced techniques like deep learning. How does your approach compare?
haskell_scraper 5 months ago
Using FP allows better separation of responsibilities and simpler testing. Advanced techniques like ML can be overkill for many web scraping tasks and bring in heavier dependencies #yagni
evaluator 5 months ago
I'd love to hear how you tackle rate limiting for certain websites.
haskell_scraper 5 months ago
I keep track of requests and adjust wait times based on the rate limits set by particular sites (or measured from scraping attempts). This is crucial for avoiding IP bans and staying within their terms of service.
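One possible way to adjust wait times, sketched with `base` only. This is an illustrative backoff policy, not the library's actual logic: the constants, the choice of status codes 429 and 503 as throttling signals, and the names `nextDelay` and `waitMs` are all assumptions:

```haskell
import Control.Concurrent (threadDelay)

-- Delays are in milliseconds. Both bounds are illustrative assumptions.
baseDelayMs, maxDelayMs :: Int
baseDelayMs = 500
maxDelayMs  = 60000

-- Compute the next wait time from the last HTTP status and the current delay:
-- double the delay when the site signals throttling, otherwise halve it,
-- always staying between baseDelayMs and maxDelayMs.
nextDelay :: Int -> Int -> Int
nextDelay status current
  | status == 429 || status == 503 = min maxDelayMs (current * 2)
  | otherwise                      = max baseDelayMs (current `div` 2)

-- Sleep for the given number of milliseconds (threadDelay takes microseconds).
waitMs :: Int -> IO ()
waitMs ms = threadDelay (ms * 1000)
```

Keeping the policy in a pure function like `nextDelay` means the backoff behavior can be tested without actually sleeping or sending requests.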
keen_developer 5 months ago
Is there support for capturing JavaScript-rendered content in your library?
haskell_scraper 5 months ago
Currently, I focus on raw HTML content for simplicity and fewer dependencies. However, I'm looking into popular headless-browser automation tools like Puppeteer to support JS rendering in the future!