Scraping Amazon Product Pages (PDP) Without Writing Code in 3 Steps
I will walk you through scraping Amazon product detail pages without setting up servers or writing a single line of code, using Integromat and Scrapezone.
Nowadays, when eCommerce is booming, scraping eCommerce websites for “alternative data” has become essential to staying afloat in this competitive game. Some apply AI and text analysis to this data to extract consumer insights and competitive intelligence, while others rely on it to optimize their pricing.
In most cases, web scraping requires setting up a headless browser like Puppeteer or Selenium and configuring it to fetch the right content from the required pages. In one of my previous articles, I covered several Puppeteer tricks to avoid detection. This time, I will cover how to collect eCommerce data without coding.
To cover this basic exercise, we will build a little program whose purpose is to scrape a list of Amazon product URLs daily and send the results to an email address of our choosing.
Step 1: Create Accounts
There are two fundamental tools for this process, Integromat and Scrapezone; both are free to sign up for and can run the first few preliminary jobs at no cost.
Start by creating a free Integromat account here.
The next step is to create a free Scrapezone account. Register a new account here and copy your username/password details that appear in the API Information tab on the dashboard’s Home page.
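You will use these credentials for the API requests later in this tutorial. If you ever script against the API yourself, it is good practice not to hard-code them; here is a minimal Python sketch, with hypothetical environment variable names:

import os

# Read the Scrapezone credentials from environment variables instead of
# hard-coding them in the script; the variable names are just a suggestion.
SCRAPEZONE_USERNAME = os.environ["SCRAPEZONE_USERNAME"]
SCRAPEZONE_PASSWORD = os.environ["SCRAPEZONE_PASSWORD"]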
Step 2: Create the Scrape Receiving Scenario
Now that you have an account in both tools, the next step is to create a scenario in Integromat. Log in to Integromat and select ‘Create a new scenario’:
Search for ‘Webhooks’ and click ‘Continue.’
Click on ‘Custom webhook,’ add a new webhook, and copy the hook address to the clipboard.
This webhook will receive the scrape results from ScrapeZone.
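Integromat creates and hosts this endpoint for you, so there is nothing to implement; but if you are curious what a webhook receiver looks like under the hood, here is a purely illustrative Python/Flask sketch (the route name and port are made up):

from flask import Flask, request

app = Flask(__name__)

# A webhook is just an HTTP endpoint that the scraping service POSTs the
# results to when the job finishes. Integromat hosts this for you; this
# sketch only illustrates the mechanism.
@app.route("/scrape-results", methods=["POST"])
def scrape_results():
    payload = request.get_json(force=True)
    print("Received scrape results:", payload)
    return "", 200

if __name__ == "__main__":
    app.run(port=8000)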
To allow Integromat to define the incoming data structure, let’s send a sample scrape request using this webhook.
Open Terminal and type the following (make sure to paste the webhook URL into the “callback_url” field):
curl --user username:MyScrapingPassword \
--header "Content-Type: application/json" \
--request POST \
--data '{"query":["https://amazon.com/dp/B08J65DST5","https://amazon.com/dp/B08J65DST5"],"scraper_name":"amazon_product_display", “callback_url”: <Paste the webhook url here>”}' \
https://api.scrapezone.com/scrape
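If you prefer Python to curl, the same request can be sent with the requests library. This mirrors the curl command above; WEBHOOK_URL stands for the address you copied from Integromat:

import requests

WEBHOOK_URL = "<paste the webhook URL here>"

# The same request as the curl command above: HTTP Basic Auth with your
# Scrapezone username/password and a JSON body describing the scrape job.
response = requests.post(
    "https://api.scrapezone.com/scrape",
    auth=("username", "MyScrapingPassword"),
    json={
        "query": [
            "https://amazon.com/dp/B08J65DST5",
            "https://amazon.com/dp/B08J65DST5",
        ],
        "scraper_name": "amazon_product_display",
        "callback_url": WEBHOOK_URL,
    },
)
print(response.status_code, response.text)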
Wait 30-60 seconds for the results to be sent back to Integromat. The webhook’s status should change to “Successfully determined.”
Now let’s send the results to our email address.
Click on ‘Add another module’ and select Email -> Send an Email.
You will be required to configure your email address as a connection; this is very straightforward.
Select your preferred email address for ‘To,’ and “Scrape Results” as the subject.
For the content, select ‘parsed_results_csv’ to receive a CSV file with the scrape results, or ‘parsed_results_json’ if you prefer the results as a JSON file.
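Once the email arrives, the attached CSV is easy to inspect programmatically. A minimal Python sketch, assuming you saved the attachment as scrape_results.csv (a placeholder name; the exact columns depend on the scraper):

import csv

# 'scrape_results.csv' is a placeholder for the attachment saved from the
# email; the actual column names depend on the scraper you used.
with open("scrape_results.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row)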
Click ‘Ok,’ rename the scenario to ‘Scrape Results,’ and click ‘Save.’
Now you can test the scenario by re-sending the curl request.
Step 3: Create the Scheduled Scraping Scenario
Since we want to create a daily scrape, we will create a scenario that sends an HTTP request to ScrapeZone to initiate a scraping task.
Select ‘HTTP’ from the menu and then ‘Make a Basic Auth request.’
For the credentials, click ‘Add’ and enter your scraping username and password from the Scrapezone dashboard.
Fill in the following details:
URL: https://api.scrapezone.com/scrape
Body Type: Raw
Content Type: JSON (application/json)
Request Content:
{"query":["https://amazon.com/dp/B08J65DST5","https://amazon.com/dp/B08J65DST5"],"scraper_name":"amazon_product_display", “callback_url”: <Paste the webhook url here>”}
To schedule the task, click the clock icon and select your preferred schedule.
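Integromat’s scheduler is all you need here, but for comparison, the same daily trigger could be reproduced in Python with the schedule library; send_scrape_request is a hypothetical wrapper around the requests call from the earlier sketch:

import time

import schedule  # third-party: pip install schedule

def send_scrape_request():
    # Place the requests.post() call from the earlier sketch here.
    print("Sending the daily scrape request to Scrapezone...")

# Trigger the scrape once a day; the time of day is arbitrary.
schedule.every().day.at("09:00").do(send_scrape_request)

while True:
    schedule.run_pending()
    time.sleep(60)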
Getting the Data and Conclusion
That’s all! To test everything, make sure both scenarios are turned on, then go to the request-sending scenario and click the blue ‘Play’ button.
You can follow the scrape’s progress in the Scrapezone dashboard.
As soon as the scrape is done, the results are sent to your email address and are also available to download from the Scrapezone dashboard.
This guide is an example of a pretty straightforward web scraping scenario, but the same approach can easily be applied to more complex procedures and periodic crawls. I hope it gives you some ideas of what can be done with this combination of tools.