14 points by lambdasysadmin 7 months ago | 15 comments
user1 7 months ago
This is really interesting! I've been looking for a way to build a web crawler without managing servers. I'll definitely give this a try. Thanks for sharing!
user2 7 months ago
Great tutorial! I followed it and was able to build a working serverless web crawler using AWS Lambda. Is it possible to scale this for larger crawls?
user1 7 months ago
Yes, it's definitely possible to scale this up. You can raise the function's memory allocation (CPU scales with it) and increase its concurrency to crawl more pages in parallel. Just keep in mind that both will increase costs.
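For instance, a quick boto3 sketch like this (the function name is a placeholder) raises the memory allocation and caps reserved concurrency so a big crawl can't run away with your bill:

    import boto3

    lambda_client = boto3.client("lambda")
    FUNCTION_NAME = "serverless-web-crawler"  # placeholder; use your function's name

    # More memory also means proportionally more CPU on Lambda.
    lambda_client.update_function_configuration(
        FunctionName=FUNCTION_NAME,
        MemorySize=1024,  # MB, up from the 128 MB default
    )

    # Cap how many copies can run at once so a large crawl can't
    # exhaust your account-wide concurrency limit (or your budget).
    lambda_client.put_function_concurrency(
        FunctionName=FUNCTION_NAME,
        ReservedConcurrentExecutions=50,
    )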
user6 7 months ago
Thanks, I'll check that out. I'm also curious if there's a way to schedule the web crawler so it runs at set times instead of manually triggering it?
user9 7 months ago
You can use Amazon EventBridge to schedule your web crawler to run at specific times. Just create a rule to trigger your Lambda function at the desired interval.
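Something like this boto3 sketch works, for example (the names and ARNs are placeholders). The add_permission call is the step people tend to forget; EventBridge needs explicit permission to invoke the function:

    import boto3

    events = boto3.client("events")
    lambda_client = boto3.client("lambda")

    # Placeholders -- substitute your own function name and ARN.
    FUNCTION_NAME = "serverless-web-crawler"
    FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:serverless-web-crawler"

    # Fire every 6 hours; cron expressions work here too,
    # e.g. "cron(0 3 * * ? *)" for 03:00 UTC daily.
    rule = events.put_rule(
        Name="crawler-schedule",
        ScheduleExpression="rate(6 hours)",
    )

    # Point the rule at the crawler function.
    events.put_targets(
        Rule="crawler-schedule",
        Targets=[{"Id": "crawler", "Arn": FUNCTION_ARN}],
    )

    # Grant EventBridge permission to invoke the function.
    lambda_client.add_permission(
        FunctionName=FUNCTION_NAME,
        StatementId="allow-eventbridge-schedule",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )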
user9 7 months ago
You can also use the AWS Serverless Application Model (SAM) to build, test, and deploy your Lambda function. The SAM CLI's 'sam local' commands simulate the AWS Lambda environment on your local machine (in a Docker container).
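For example, with a hypothetical handler like the one below as your function's code, 'sam local invoke CrawlerFunction --event event.json' (where CrawlerFunction is the logical ID in your template) runs it inside a Lambda-like container. The __main__ block at the bottom is just a SAM-free smoke test:

    # app.py -- a deliberately tiny, hypothetical crawler handler,
    # just enough to have something to invoke locally.
    import json
    import urllib.request

    def lambda_handler(event, context):
        # Expects an event like {"url": "https://example.com"}
        url = event["url"]
        with urllib.request.urlopen(url, timeout=10) as resp:
            page = resp.read()
        return {"url": url, "bytes_fetched": len(page)}

    if __name__ == "__main__":
        # Quick local smoke test: call the handler directly.
        print(json.dumps(lambda_handler({"url": "https://example.com"}, None)))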
user8 7 months ago
If you're using DynamoDB to store your crawled data, consider using AWS Step Functions to coordinate the various AWS Lambda functions and DynamoDB operations.
user5 7 months ago
AWS Step Functions can help coordinate multiple AWS Lambda functions and data flows. You can also set up error handling and retries in case of failures.
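As a rough sketch, here's a two-state machine in Amazon States Language, created with boto3 (the ARNs and table name are placeholders, and it assumes the crawl function returns a url and status). It retries the crawl Lambda on transient errors, then writes the result straight to DynamoDB via the Step Functions SDK integration, so no extra Lambda is needed for the save step:

    import json
    import boto3

    # Placeholders -- substitute your own Lambda and IAM role ARNs.
    CRAWL_FN_ARN = "arn:aws:lambda:us-east-1:123456789012:function:crawl-page"
    ROLE_ARN = "arn:aws:iam::123456789012:role/crawler-sfn-role"

    definition = {
        "StartAt": "CrawlPage",
        "States": {
            "CrawlPage": {
                "Type": "Task",
                "Resource": CRAWL_FN_ARN,
                # Retry transient Lambda failures with exponential backoff.
                "Retry": [{
                    "ErrorEquals": ["Lambda.ServiceException",
                                    "Lambda.TooManyRequestsException"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }],
                "Next": "SaveResult",
            },
            # Write directly to DynamoDB through the service integration.
            "SaveResult": {
                "Type": "Task",
                "Resource": "arn:aws:states:::dynamodb:putItem",
                "Parameters": {
                    "TableName": "crawl-results",
                    "Item": {
                        "url": {"S.$": "$.url"},
                        "status": {"S.$": "$.status"},
                    },
                },
                "End": True,
            },
        },
    }

    boto3.client("stepfunctions").create_state_machine(
        name="crawler-pipeline",
        definition=json.dumps(definition),
        roleArn=ROLE_ARN,
    )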
user4 7 months ago
I've been looking at other options like Google Cloud Functions for web crawling, but I like how AWS Lambda tightly integrates with other AWS tools like DynamoDB and Step Functions. Do you have experience using GCF?
user1 7 months ago
I've used GCF a few times and found it to be relatively easy to use. However, I find AWS Lambda's integration with other AWS tools and services to be a big advantage.
user3 7 months ago
Hi, I'm new to serverless computing and AWS. Are there any free resources from AWS that I can use to test this?
user5 7 months ago
AWS has a free tier you can use for this. For Lambda it includes 1M requests and 400,000 GB-seconds of compute time per month, and unlike the 12-month offers it doesn't expire. That's more than enough to test your serverless web crawler.
user3 7 months ago
Thanks! I just signed up for the free tier and created a Lambda function using the serverless web crawler tutorial. Do you have any tips for testing this locally?
user7 7 months ago
Very cool! I've been trying to find a way to scrape websites for data without managing servers or VMs. This is perfect.
user10 7 months ago
Absolutely fascinating! I work as a data engineer and we're often looking for cost-effective and scalable ways to handle large volumes of data. This looks very promising.