How to work with aScraper

You can choose from two options based on your needs:

  • HTTP REST API
  • Proxy-mode

This walkthrough uses the HTTP REST API with Python 3.
You can find an example Python 3 script on our GitHub page (subscribe for future updates).
The showcase is based on amazon.com.

TIP: use real cookies with the cookie and session parameters

Open the main page of amazon.com in Google Chrome and navigate from there to the product page. This helps you obtain a real cookie for your crawler.

https://www.amazon.com/Ghost-Tsushima-PlayStation-4/dp/B08BSKT43L/

Open Developer Tools in Google Chrome and copy the cookie string:

s_fid=55D43447DB60C570-04FAC678B605D512; aws-ubid-main=215-0082816-1412834; regStatus=registered; s_vn=1636380985932%26vn%3D1; s_invisit=true; aws-target-data=%7B%22support%22%3A%221%22%7D; aws-target-visitor-id=1604844986155-864802.37_0; aws-session-id=084-4677677-9244132; aws-session-id-time=1604855620l; aws-analysis-id=084-4677677-9244132; s_dslv_s=Less%20than%201%20day; s_depth=24; s_dslv=1606083241286; s_nr=1606083241648-Repeat; aws-userInfo-signed=eyJ0eXAiOiJKV1MiLCJrZXlSZWdpb24iOiJ1cy1lYXN0LTEiLCJhbGciOiJFUzM4NCIsImtpZCI6IjU5OWJkZmRiLWM4NTUtNGM4NS04Nzc5LTQzZWM4ZGU5ZWZjNCJ9.eyJzdWIiOiIiLCJzaWduaW5UeXBlIjoiUFVCTElDIiwiaXNzIjoiaHR0cDpcL1wvc2lnbmluLmF3cy5hbWF6b24uY29tXC9zaWduaW4iLCJrZXliYXNlIjoiM0M1aUJVV1g1WklKQW5JU2NYSHdFRHQrZG96dzR6bkZvdGE3K0pIZ0xmOD0iLCJhcm4iOiJhcm46YXdzOmlhbTo6MjQ1NjQ5MjEwMjg3OnJvb3QiLCJ1c2VybmFtZSI6ImFzY3JhcGVyIn0.TbsDm4WBy4QITWqYsci0AARJkuas8Ypi06DQU0n5xTMu0LI6P1TJOcXT_dwuLXILvnqSQ_QMGSQ_cYIZeg8OIX8Ws6JH6CBlEOfaRhSqenSSgTA0vmFgNQX4u1cZgvai; aws-userInfo=%7B%22arn%22%3A%22arn%3Aaws%3Aiam%3A%3A245649210287%3Aroot%22%2C%22alias%22%3A%22%22%2C%22username%22%3A%22ascraper%22%2C%22keybase%22%3A%223C5iBUWX5ZIJAnIScXHwEDt%2Bdozw4znFota7%2BJHgLf8%5Cu003d%22%2C%22issuer%22%3A%22http%3A%2F%2Fsignin.aws.amazon.com%2Fsignin%22%2C%22signinType%22%3A%22PUBLIC%22%7D; session-id=133-1430485-8574235; session-id-time=2082787201l; i18n-prefs=USD; sp-cdn="L5Z9:NL"; skin=noskin; ubid-main=131-6601250-7821213; session-token=9LLSGvpWunZTyQWGO6gdfIq6x/crJIVjVU1vEfPcYNx5No2/iJXXLLVHY8nSF4uPjXTc9JF//G3owx8mVI61RxeXepgx8UvQKUjuzJT51XpF4qZO3mQOkX+zvNeVrYcF/hkCJlYxV0KyjmlYz54xT+rV4KnW5ojZaegxXYCzGmBpcq8ON7rQCl0Ms9Vzi2ZFmAgU4wO5ItKQZC5MXL0+bMkGxxjSRXmIp0ig5Vrrkg7mrgoFmAZaclD30MrIVEKB; csm-hit=tb:s-XTPCWX39EC48D8HNBNH1|1606822472130&t:1606822473805&adb:adblk_no
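Before passing a copied cookie string on, it can help to check that it survived the copy intact. The following is a minimal sketch and not part of the aScraper script; the helper name parse_cookie_string is hypothetical, and the sample is a shortened version of the string above.

```python
def parse_cookie_string(cookie_string):
    """Split a 'name=value; name=value' cookie string into a dict."""
    cookies = {}
    for pair in cookie_string.split(";"):
        # partition splits at the first "=", so values containing "=" stay whole
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Shortened sample taken from the cookie string above
sample = (
    "s_fid=55D43447DB60C570-04FAC678B605D512; "
    "aws-ubid-main=215-0082816-1412834; "
    'sp-cdn="L5Z9:NL"; skin=noskin'
)

cookies = parse_cookie_string(sample)
for name, value in cookies.items():
    print(f"{name} -> {value}")
```

If an expected cookie name is missing from the result, the string was probably cut off while copying.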


Pass the URL and cookie string to the Python script:

python ascraper.py --url https://www.amazon.com/Ghost-Tsushima-PlayStation-4/dp/B08BSKT43L/ --cookie "s_fid=55D43447DB60C570-04FAC678B605D512; aws-ubid-main=215-0082816-1412834;" --session mySessionId --selector h1 --user API_KEY

IMPORTANT: wrap the cookie string in double quotes so the shell does not split it at spaces
IMPORTANT: pass a valid API_KEY to the script
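To see why the quotes matter, the sketch below uses Python's standard shlex module to mimic how a POSIX shell splits a command line into arguments. It only models whitespace splitting; a real shell would also treat an unquoted ";" as a command separator, which makes the unquoted form even more fragile.

```python
import shlex

cookie = "s_fid=55D43447DB60C570-04FAC678B605D512; aws-ubid-main=215-0082816-1412834;"

# Without quotes the cookie is split at the space into two arguments
unquoted = shlex.split(f"python ascraper.py --cookie {cookie} --session mySessionId")

# With double quotes the cookie arrives as a single argument
quoted = shlex.split(f'python ascraper.py --cookie "{cookie}" --session mySessionId')

print(len(unquoted), unquoted[3:5])
print(len(quoted), quoted[3])

# shlex.quote produces a shell-safe form when building the command in code
print(shlex.quote(cookie))
```

When launching the script from Python, passing the arguments as a list to subprocess.run avoids shell splitting altogether.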

We select only the h1 tag from the HTML to keep the output small.

API call was successfully completed
["<h1 id=\"title\" class=\"a-size-large a-spacing-none\"> <span id=\"productTitle\" class=\"a-size-large product-title-word-break\"> Ghost of Tsushima - PlayStation 4 </span> </h1>","<h1 class=\"a-size-medium a-spacing-small secHeader\"> Warranty &amp; Support </h1>","<h1 class=\"a-size-medium a-spacing-small secHeader\"> Feedback </h1>"]
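The result above is a JSON array of HTML fragments. As a rough sketch of how to turn it into plain titles, the code below uses only the standard library. It assumes the JSON array line is passed in on its own, without the status message; the names TextExtractor and h1_texts are hypothetical and not part of the aScraper script.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def h1_texts(raw_json):
    """Return the visible text of each HTML fragment in a JSON array."""
    texts = []
    for fragment in json.loads(raw_json):
        parser = TextExtractor()
        parser.feed(fragment)
        parser.close()  # flush any buffered text
        # Collapse the runs of whitespace left over from the markup
        texts.append(" ".join("".join(parser.parts).split()))
    return texts


# Two fragments from the output above, re-encoded as a JSON array
sample_output = json.dumps([
    '<h1 id="title" class="a-size-large a-spacing-none"> '
    '<span id="productTitle" class="a-size-large product-title-word-break"> '
    'Ghost of Tsushima - PlayStation 4 </span> </h1>',
    '<h1 class="a-size-medium a-spacing-small secHeader"> Warranty &amp; Support </h1>',
])

print(h1_texts(sample_output))
```

The parser also decodes entities such as &amp;amp;, so the second title comes back with a plain ampersand.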


You can now make a simpler request, because the session and cookies were passed with the previous request. The API persists this data for 15 minutes.
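A crawler that runs longer than 15 minutes needs to know when to send the cookie again. Here is a minimal client-side sketch, assuming the window starts when the cookie is sent (the text above does not say whether later requests extend it). SessionTracker and build_args are hypothetical helpers; only the command-line flags come from the examples in this post.

```python
import time

SESSION_TTL_SECONDS = 15 * 60  # persistence window stated above


class SessionTracker:
    """Remembers when a cookie was last sent for each session id."""

    def __init__(self, ttl=SESSION_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._cookie_sent_at = {}

    def needs_cookie(self, session_id):
        sent_at = self._cookie_sent_at.get(session_id)
        return sent_at is None or self._clock() - sent_at >= self._ttl

    def mark_cookie_sent(self, session_id):
        self._cookie_sent_at[session_id] = self._clock()


def build_args(url, session_id, api_key, cookie, tracker):
    """Build the ascraper.py argument list, adding --cookie only when needed."""
    args = ["python", "ascraper.py", "--url", url,
            "--session", session_id, "--selector", "h1", "--user", api_key]
    if tracker.needs_cookie(session_id):
        args += ["--cookie", cookie]
        # Assumption: the call succeeds; a stricter version would mark after success
        tracker.mark_cookie_sent(session_id)
    return args
```

The returned list can be handed to subprocess.run as is, which also sidesteps the shell quoting issue for the cookie string.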


python ascraper.py --url https://www.amazon.com/Last-Us-Part-II-PlayStation-4/dp/B07DJRFSDF --session mySessionId --selector h1 --user API_KEY

API call was successfully completed
["<h1 id=\"title\" class=\"a-size-large a-spacing-none\"> <span id=\"productTitle\" class=\"a-size-large product-title-word-break\"> The Last of Us Part II - PlayStation 4 </span> </h1>","<h1 class=\"a-size-medium a-spacing-small secHeader\"> Warranty &amp; Support </h1>","<h1 class=\"a-size-medium a-spacing-small secHeader\"> Feedback </h1>"]

INFO: you can request the full HTML by omitting the selector parameter

WARNING: when using cURL, URL-encode the url string to avoid violating the HTTP protocol
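One way to produce the encoded string is Python's standard urllib.parse module, as sketched below. The query parameter names url and selector are assumptions that mirror the script's command-line flags; check the API documentation for the real names and endpoint.

```python
from urllib.parse import quote, urlencode

target = "https://www.amazon.com/Ghost-Tsushima-PlayStation-4/dp/B08BSKT43L/"

# safe="" also encodes "/" and ":", which must not appear raw inside a query value
encoded = quote(target, safe="")
print(encoded)
# https%3A%2F%2Fwww.amazon.com%2FGhost-Tsushima-PlayStation-4%2Fdp%2FB08BSKT43L%2F

# urlencode applies the same escaping to every value of a query string
# (parameter names here are hypothetical)
query = urlencode({"url": target, "selector": "h1"})
print(query)
```

The encoded value can then be pasted into the cURL command in place of the raw URL.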
