Wednesday 25 September 2013

A simple way to turn a website into JSON

Recently, while surfing the web I stumbled upon an simple web scraping service named Web Scrape Master. It is a kind of RESTful web service that extracts data from a specified web site and returns it to you in JSON format.
How it works

Though I don’t know what this service may be useful for, I still like its simplicity: all you need to do is to make an HTTP GET request, passing all necessary parameters in the query string:
http://webscrapemaster.com/api/?url={url}&xpath={xpath}&attr={attr}&callback={callback}

    url  - the URL of the website you want to scrape
    xpath – xpath determining the data you need to extract
    attr - attribute the name you need to get the value of (optional)
    callback - JSON callback function (optional)

For example, for the following request to our testing ground:

http://webscrapemaster.com/api/?url=http://testing-ground.extract-web-data.com/blocks&xpath=//div[@id=case1]/div[1]/span[1]/div

You will get the following response:

[{"text":"<div class='name'>Dell Latitude D610-1.73 Laptop Wireless Computer</div>","attrs":{"class":"name"}}]
Visual Web Scraper

Also, this service offers a special visual tool for building such requests. All you need to do is to enter the URL of the website and click to the element you need to scrape:
Visual Web Scraper
Conclusion

Though I understand that the developer of this service is attempting to create a simple web scraping service, it is still hard to imagine where it can be useful. The task that the service does can be easily accomplished by means of any language.

Probably if you already have software receiving JSON from the web, and you want to feed it with data from some website, then you may find this service useful. The other possible application is to hide your IP when you do web scraping. If you have other ideas, it would be great if you shared them with us.



Source: http://extract-web-data.com/a-simple-way-to-turn-a-website-into-json/

No comments:

Post a Comment