Documentation

Introduction

αScraper is a web scraping API. It exposes a single endpoint that accepts two request types, so you can pick the one that fits your needs.

Using our services

We have one endpoint that supports two request types:

http://api.ascraper.com/crawl
GET request
Returns the result in JSON format
To get the result as plain HTML, set the GET parameter format=html
Supports just three GET parameters: userId=API_KEY, url, and selector. Simple and fast.
http://api.ascraper.com/crawl
POST request
Supports custom headers and cookies, sessions, and JS rendering. Accepts and returns JSON

Example


  curl "http://api.ascraper.com/crawl?userId=API_KEY&url=https://amazon.com"
                

Result

{
   "status":{
      "code":"OK"
   },
   "cookies":[

   ],
   "headers":{
      "Server":"gunicorn/19.9.0",
      "Access-Control-Allow-Origin":"*",
      "Access-Control-Allow-Credentials":"true",
      "Connection":"keep-alive",
      "Content-Length":"33",
      "Date":"Sat, 31 Oct 2020 16:19:39 GMT",
      "Content-Type":"application/json"
   },
   "html_source":"{\n  \"origin\": \"185.233.83.124\"\n}\n"
}
            
If you want only a single HTML element back, add a jQuery-style selector to the query string: &selector=SELECTOR. Don't forget to URL-encode the CSS selector.

    
  curl "http://api.ascraper.com/crawl?userId=API_KEY&selector=title&url=https://google.com"
                

Result

{
   "source":"[\"<title>Google</title>\"]",
   "status":{
      "code":"OK"
   },
   "cookies":[
      {
        ...
      }
   ],
   "headers":{
        ...
   }
}
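Since CSS selectors often contain spaces and characters like ">", they must be URL-encoded before going into the query string. A minimal sketch of building the GET request URL with Python's standard library; the selector "div.product > h1" and the target URL are hypothetical examples.

```python
from urllib.parse import quote, urlencode

API_KEY = "API_KEY"  # your key
params = {
    "userId": API_KEY,
    "url": "https://example.com",
    "selector": "div.product > h1",  # hypothetical selector
}
# quote_via=quote percent-encodes spaces as %20 (rather than "+"),
# and safe="" encodes the ":" and "/" in the target URL as well
query = urlencode(params, quote_via=quote, safe="")
request_url = "http://api.ascraper.com/crawl?" + query
print(request_url)
```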
            

Example

If you need plain HTML, use the format=html parameter


    
  curl "http://api.ascraper.com/crawl?userId=API_KEY&url=https://google.com&format=html"
                

Result

                
<!doctype html>
<html itemscope="" itemtype="http://schema.org/WebPage" lang="ru">
   <head>
      <meta charset="UTF-8">
      <meta content="origin" name="referrer">
      <link href="/searchdomaincheck?format=opensearch" title="Поиск в Google" rel="search" type="application/opensearchdescription+xml">
      <link href="/manifest?pwa=webhp" crossorigin="use-credentials" rel="manifest">
              ...
            

Responses

Status codes are handled as follows:
200, 404 - successful request
500 - unsuccessful request
429 - out of limits

Custom Headers

To make a request with custom headers or custom cookies, send a POST request to the endpoint:

http://api.ascraper.com/crawl

Example


    
  curl -X POST -H "Content-Type: application/json" --data '{"url": "http://httpbin.org/headers","userId": "API_KEY","headers": [{"name" : "name", "value" : "value"}]}' 'http://api.ascraper.com/crawl'
                

Result

{
   "status":{
      "code":"OK"
   },
   "sessionId":"21798b55-153d-4ae8-b785-271c40f761ca",
   "cookies":[

   ],
   "headers":{
      "Server":"gunicorn/19.9.0",
      "Access-Control-Allow-Origin":"*",
      "Access-Control-Allow-Credentials":"true",
      "Connection":"keep-alive",
      "Content-Length":"485",
      "Date":"Sat, 31 Oct 2020 19:45:20 GMT",
      "Content-Type":"application/json"
   },
   "html_source":"{\n  \"headers\": {\n    \"Accept\": \"*/*\", \n    \"Host\": \"httpbin.org\", \n    \"Name\": \"value\", \n    \"User-Agent\": \"Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36 OPR/43.0.2442.1144\", \n    \"X-Amzn-Trace-Id\": \"Root=1-5f9dbed0-597dd69e27590a322779a228\", \n    \"X-B3-Parentspanid\": \"c15cd9bc743da160\", \n    \"X-B3-Sampled\": \"1\", \n    \"X-B3-Spanid\": \"a7c628a49b2b6db0\", \n    \"X-B3-Traceid\": \"5f9dbecf7a33412fae164ec446116fd9\"\n  }\n}\n"
}
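The POST body mirrors the curl example above: each custom header is a {"name": ..., "value": ...} object inside a headers array. A sketch of assembling that body with Python's standard library; the header name/value pair is a hypothetical example.

```python
import json

API_KEY = "API_KEY"
payload = {
    "url": "http://httpbin.org/headers",
    "userId": API_KEY,
    # each custom header is a {"name": ..., "value": ...} object
    "headers": [{"name": "X-Custom", "value": "hello"}],
}
# json.dumps produces the request body to POST with
# Content-Type: application/json to http://api.ascraper.com/crawl
body = json.dumps(payload)
print(body)
```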
            

Custom Cookies

To make a request with custom cookies, send a POST request to the endpoint:

http://api.ascraper.com/cra­wl

Example


    
  curl -X POST -H "Content-Type: application/json" --data '{"url": "http://httpbin.org/cookies","userId": "API_KEY","cookies": [{"name" : "name", "value" : "value"}]}' 'http://api.ascraper.com/crawl'
                

Result

{
   "status":{
      "code":"OK"
   },
   "sessionId":"31d5efdf-069e-4bcc-98de-0e99eea024af",
   "cookies":[
      {
         "domain":".httpbin.org",
         "hostOnly":false,
         "httpOnly":false,
         "name":"name",
         "path":"/",
         "sameSite":"None",
         "secure":false,
         "session":false,
         "storeId":false,
         "value":"value",
         "id":null
      }
   ],
   "headers":{
      "Server":"gunicorn/19.9.0",
      "Access-Control-Allow-Origin":"*",
      "Access-Control-Allow-Credentials":"true",
      "Connection":"keep-alive",
      "Content-Length":"43",
      "Date":"Sat, 31 Oct 2020 19:51:44 GMT",
      "Content-Type":"application/json"
   },
   "html_source":"{\n  \"cookies\": {\n    \"name\": \"value\"\n  }\n}\n"
}
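Note that html_source in the response above is itself a JSON document (httpbin echoes back the cookies it received), serialized as a string, so it needs a second json.loads() to get at the data. A sketch using the sample response shown above.

```python
import json

# Truncated copy of the response above: html_source holds the
# page body as a string, which here is JSON from httpbin
response = {
    "status": {"code": "OK"},
    "html_source": "{\n  \"cookies\": {\n    \"name\": \"value\"\n  }\n}\n",
}
# Second decode: parse the page body itself
echoed = json.loads(response["html_source"])
print(echoed["cookies"])
```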
            

Sessions

By default we rotate the IP with every request. If you need to reuse an IP or cookies, use the session_id parameter (e.g. session_id=123). The value can be any integer; send a new integer to create a new session, and every request with the same session id keeps using the same proxy. In a JSON POST body the field is named sessionId, as in the example below. Sessions expire 15 minutes after the last usage.

Example


    
  curl -X POST -H "Content-Type: application/json" --data '{"url": "http://httpbin.org/cookies","userId": "API_KEY","cookies": [{"name" : "name", "value" : "value"}]}' 'http://api.ascraper.com/crawl'
  curl -X POST -H "Content-Type: application/json" --data '{"url": "http://httpbin.org/cookies","userId": "API_KEY","sessionId": "31d5efdf-069e-4bcc-98de-0e99eea024af"}' 'http://api.ascraper.com/crawl'
                

Result

{
   "status":{
      "code":"OK"
   },
   "sessionId":"31d5efdf-069e-4bcc-98de-0e99eea024af"
}
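The two curl calls above can be sketched as a small Python helper that builds the POST bodies: the first request creates the session, and the follow-up passes the returned sessionId back to reuse the same proxy and cookies. The helper name is hypothetical; the sessionId value is the one from the example above.

```python
import json

API_KEY = "API_KEY"

def build_crawl_payload(url, session_id=None, **extra):
    # Hypothetical helper: builds the JSON body for POST /crawl.
    # Pass the sessionId returned by a previous response to keep
    # the same proxy and cookies for the follow-up request.
    payload = {"url": url, "userId": API_KEY, **extra}
    if session_id is not None:
        payload["sessionId"] = session_id
    return json.dumps(payload)

# First request creates the session (the server returns a sessionId) ...
first = build_crawl_payload("http://httpbin.org/cookies",
                            cookies=[{"name": "name", "value": "value"}])
# ... and the follow-up reuses it:
follow_up = build_crawl_payload("http://httpbin.org/cookies",
                                session_id="31d5efdf-069e-4bcc-98de-0e99eea024af")
print(follow_up)
```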
            

Chrome Cluster

If you need a real browser to fetch the page contents or render JavaScript, use the render=true parameter. By default we disable all CSS files and images.

Example


    
  curl -X POST -H "Content-Type: application/json" --data '{"url": "http://httpbin.org/ip","userId": "API_KEY","render" : true}' 'http://api.ascraper.com/crawl'
                    
  curl "http://api.ascraper.com/crawl?userId=API_KEY&selector=title&url=https://google.com&render=true"
                

Result

{
   "sessionId":"3caf3d06-cce2-4ba2-a7e0-03db4b65f8d4",
   "cookies":[

   ],
   "headers":{

   },
   "html_source":"<pre style=\"word-wrap: break-word; white-space: pre-wrap;\">{\n  \"origin\": \"172.19.0.1, 185.233.80.89\"\n}\n</pre>"
}
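Notice that with render=true the browser wraps a plain-text body in a <pre> tag, as in the result above, so the JSON has to be unwrapped before parsing. A sketch of doing that with the standard library, assuming the response shape shown above.

```python
import json
import re
from html import unescape

# html_source from the rendered result above: the browser has wrapped
# the JSON body in a styled <pre> element
html_source = ('<pre style="word-wrap: break-word; white-space: pre-wrap;">'
               '{\n  "origin": "172.19.0.1, 185.233.80.89"\n}\n</pre>')
# Strip the <pre> wrapper and decode any HTML entities before parsing
match = re.search(r"<pre[^>]*>(.*)</pre>", html_source, re.S)
body = unescape(match.group(1)) if match else html_source
data = json.loads(body)
print(data["origin"])
```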
            

Proxy Mode

You can also send all your requests to a proxy front end. Proxy mode passes every request through our service, so you get all the benefits, such as IP rotation and automatic retries. Just like in basic mode, status codes are handled the same way:
200, 404 - successful request
500 - unsuccessful request
429 - out of limits

You can use the proxy for scraping binary content; we handle it like normal traffic.

You can also pass all service parameters through the proxy:
- render
- session_id
- headers
- cookies

Set the parameters like this:
ascraper;render=true;session_id=session@API_KEY:proxy.ascraper.com:8080

Any headers you set on your proxy requests are automatically forwarded to the site you are scraping.
To pass your requests through the API correctly, your client must be configured not to verify SSL certificates.
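A sketch of assembling the proxy credential string in the template shown above, where service parameters are joined with semicolons before the API key. The session_id value 123 is a hypothetical example; pass the resulting proxies mapping to your HTTP client with certificate verification disabled.

```python
API_KEY = "API_KEY"
# Semicolon-separated service parameters, per the template above
options = "render=true;session_id=123"
proxy = f"http://ascraper;{options}@{API_KEY}:proxy.ascraper.com:8080"
# Use the same proxy URL for both schemes; remember that the client
# must be configured not to verify SSL certificates
proxies = {"http": proxy, "https": proxy}
print(proxy)
```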