124 points by scraping_pro 6 months ago | 11 comments
johnsmith 6 months ago
Great article! I've always wanted to learn more about web scraping.
alice123 6 months ago
Same here! I've tried a bit of it but always struggled with more advanced techniques.
codeninja 6 months ago
What libraries or frameworks did you use for the web scraping?
johnsmith 6 months ago
I mainly used BeautifulSoup and Selenium for the deeper web scraping.
scriptkiddie 6 months ago
What was your approach for handling JavaScript-rendered pages?
johnsmith 6 months ago
I used Selenium for that. It let me load the page in a real browser and then interact with the rendered content as needed.
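For anyone following along, here is a minimal sketch of that pattern. It is not necessarily the setup from the guide. It assumes Selenium 4 with a local Chrome install, and the function name `fetch_rendered_html`, the `wait_selector` parameter, and the 10-second timeout are all hypothetical choices for illustration.

```python
def fetch_rendered_html(url, wait_selector, timeout=10):
    """Load url in headless Chrome and return the HTML after JavaScript has run.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Imported here so the module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # assumes a recent Chrome build
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element we care about exists, i.e. the page's
        # JavaScript has rendered it, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

The returned string could then be handed to a parser such as BeautifulSoup. Waiting on a specific selector is usually more dependable than a fixed sleep, since the time a page takes to render varies.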
hackerlady 6 months ago
Did you run into any anti-scraping mechanisms during your project?
johnsmith 6 months ago
Yes, a few websites had protections in place, but I was able to circumvent them for the most part. I discuss these mechanisms and my approaches in more detail later in the guide.
datawiz 6 months ago
What measures did you take to ensure your scrapers' persistence and reliability?
johnsmith 6 months ago
I ran the scrapers in dedicated containers and gave them back-off strategies so as not to overload the target servers. I also built in a validation check on the retrieved data, so the script would halt if a website's structure changed unexpectedly.
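A rough sketch of what back-off plus a data check might look like, using only the standard library. It is an illustration under stated assumptions, not the code from the guide: `fetch_with_backoff`, `validate_record`, `SchemaChanged`, and the default delays are hypothetical names and values, and `fetch` stands in for whatever function actually downloads a page.

```python
import random
import time


class SchemaChanged(Exception):
    """Raised when a scraped record no longer has the expected shape."""


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential back-off and jitter."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            # A real scraper would catch only transient errors here
            # (timeouts, HTTP 429/503); the broad catch keeps the sketch short.
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the last error
            # The delay ceiling doubles each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait in [0, delay] avoids lockstep retries.
            sleep(random.uniform(0, delay))


def validate_record(record, required_fields):
    """Raise SchemaChanged if any expected field is missing or empty."""
    missing = [f for f in required_fields if not record.get(f)]
    if missing:
        raise SchemaChanged(f"missing or empty fields: {missing}")
    return record
```

A caller might wrap each download in `fetch_with_backoff(...)` and each parsed row in `validate_record(row, ["title", "price"])`, letting `SchemaChanged` propagate so the run stops instead of quietly storing bad data. The field names here are placeholders.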
programboss 6 months ago
Well done, I look forward to reading the guide! Bookmarking this post for later reference.