About

ScraperScript is a query language for Web Scraping by Tiago Danin.


ScraperScript



Installation

ScraperScript is available through the npm registry and can be installed with the npm or yarn command-line tools.

# NPM
npm install scraperscript --global
# Or Using Yarn
yarn global add scraperscript

Documentation

Run the scraperscript command with either the name of a script file (scraperscript myfile) or server.

Example file.

@https://helloword.site/list
!! A comment ...
- names: html >> body >> div >> h2 @> {number, text, bold} :array
- hasTitle: html >> head >> title == " my string " :boolean
- title: html >> head >> title :string

This returns the following JSON:

{
	"error": false,
	"errorsMsg": [],
	"names": [
		{
			"number": 0,
			"text": "Tiago"
		},
		{
			"number": 0,
			"text": "James"
		}
	],
	"hasTitle": true,
	"title": "my string"
}

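A caller consuming this output would usually check the error flag before reading the extracted fields. The following is a minimal JavaScript sketch, assuming the result has already been parsed into an object with the shape shown above. The helper name readResult is hypothetical and not part of ScraperScript.

```javascript
// Hypothetical helper (not part of ScraperScript): separates the status
// fields from the extracted data in a result shaped like the example above.
function readResult(result) {
  const { error, errorsMsg, ...data } = result;
  if (error) {
    // Assumption: errorsMsg is an array of messages, as the empty
    // array in the example output suggests.
    throw new Error(`Scraping failed: ${errorsMsg.join("; ")}`);
  }
  return data;
}

// Sample object copied from the example output.
const sample = {
  error: false,
  errorsMsg: [],
  names: [
    { number: 0, text: "Tiago" },
    { number: 0, text: "James" },
  ],
  hasTitle: true,
  title: "my string",
};

const data = readResult(sample);
console.log(data.names.map((entry) => entry.text));
console.log(data.title);
```

Separating the status fields from the data keeps later code from treating error or errorsMsg as scraped keys.
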
Syntax

Place the URL on the first line, prefixed with @: @http://myurl.com

Other lines: - key: query :type

Note: Whitespace is significant. Keep the spaces shown between the dash, the key, the query, and the type.
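
To make the line layout concrete, here is a rough JavaScript sketch that classifies a single script line. It is an illustration of the documented layout only, not ScraperScript's actual parser. The function name parseLine and the returned object shape are hypothetical, and the !! comment prefix is taken from the example file.

```javascript
// Hypothetical line classifier (not ScraperScript's real parser).
// It follows the documented layout: "@url", "!! comment", "- key: query :type".
function parseLine(line) {
  if (line.startsWith("@")) {
    return { kind: "url", url: line.slice(1) };
  }
  if (line.startsWith("!!")) {
    return { kind: "comment", text: line.slice(2).trim() };
  }
  // The single spaces in the pattern are required, which mirrors
  // the rule that whitespace matters.
  const match = /^- ([^\s:]+): (.+) :([A-Za-z]+)$/.exec(line);
  if (match === null) {
    return { kind: "invalid", line };
  }
  const [, key, query, type] = match;
  return { kind: "field", key, query, type };
}

console.log(parseLine("@https://helloword.site/list"));
console.log(parseLine('- hasTitle: html >> head >> title == " my string " :boolean'));
console.log(parseLine("- title:html :string")); // missing space after the colon
```

In this sketch the last call is reported as invalid because the space after the key's colon is missing.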

Key

Name

Rules:

Example: - name:

Type

Return type

Rules:

Types:

Example: :string

Query

String

" my string "

NOTE: "my string" is invalid because it lacks the space after the opening quote and before the closing quote.
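
The padding rule can be sketched as a small check in JavaScript. This assumes the rule is exactly one space after the opening quote and one before the closing quote, which is inferred from the two examples above. The function name readStringLiteral is hypothetical.

```javascript
// Hypothetical check (not part of ScraperScript). Assumption: a valid
// literal has a space right after the opening quote and right before
// the closing quote, as in the examples above.
function readStringLiteral(token) {
  const match = /^" (.*) "$/.exec(token);
  // Return the inner text without the padding, or null when the literal is invalid.
  return match === null ? null : match[1];
}

console.log(readStringLiteral('" my string "'));
console.log(readStringLiteral('"my string"'));
```

Under this assumption the padded literal yields its inner text and the unpadded one is rejected, which is consistent with the example output where " my string " compares equal to the title my string.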

Comment

!! my comment in ScraperScript

Elements

nameOfHtmlElementOne >> nameOfHtmlElementTwo

Map elements [String]

nameOfHtmlElementOne @> nameOfSubHtmlElement

Map elements [Array]

nameOfHtmlElementOne @> [nameOfSubHtmlElement]

Map elements [Object]

nameOfHtmlElementOne @> {nameOfIndex, nameOfData, nameOfSubHtmlElement}

Addition

nameOfHtmlElementOne ++ nameOfHtmlElementTwo

Replace

nameOfHtmlElementOne -- nameOfHtmlElementTwo

Equality and inequality comparison

nameOfHtmlElementOne == nameOfHtmlElementTwo

nameOfHtmlElementOne ~= nameOfHtmlElementTwo

OR

nameOfHtmlElementOne || nameOfHtmlElementTwo
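
The operators listed above can be told apart from operands with a simple tokenizer. The JavaScript sketch below is one possible way to split a query, not ScraperScript's own implementation. The names OPERATORS and tokenizeQuery are hypothetical, and it assumes that string literals, {...} groups, and [...] groups are each a single operand.

```javascript
// Operators taken from the sections above.
const OPERATORS = new Set([">>", "@>", "++", "--", "==", "~=", "||"]);

// Hypothetical tokenizer (not ScraperScript's real one). Assumption:
// padded string literals, {...} groups and [...] groups are single
// operands; everything else is separated by whitespace.
function tokenizeQuery(query) {
  const pattern = /" .*? "|\{[^}]*\}|\[[^\]]*\]|\S+/g;
  const pieces = query.match(pattern) || [];
  return pieces.map((text) => ({
    kind: OPERATORS.has(text) ? "operator" : "operand",
    text,
  }));
}

const tokens = tokenizeQuery('html >> head >> title == " my string "');
console.log(tokens.map((token) => `${token.kind}:${token.text}`));
```

Because operators are recognized only as whole whitespace-separated tokens, this sketch also depends on the spacing rule from the Syntax section.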

Tests

To run the test suite, first install the dependencies, then run the test script:

# NPM
npm test
# Or Using Yarn
yarn test

Dependencies

Dev Dependencies

Contributors

Pull requests and stars are always welcome. For bugs and feature requests, please create an issue. List of all contributors.

License

MIT © Tiago Danin

