ScraperScript is a query language for Web Scraping by Tiago Danin.
This module is available through the npm registry. It can be installed with the npm or yarn command-line tools:
# NPM
npm install scraperscript --global
# Or Using Yarn
yarn global add scraperscript
Run a script file with the command scraperscript myfile, or use scraperscript server.
Example file.
@https://helloword.site/list
!! A comment ...
- names: html >> body >> div >> h2 @> {number, text, bold} :array
- hasTitle: html >> head >> title == " my string " :boolean
- title: html >> head >> title :string
This returns the following JSON:
{
  "error": false,
  "errorsMsg": [],
  "names": [
    {
      "number": 0,
      "text": "Tiago"
    },
    {
      "number": 0,
      "text": "James"
    }
  ],
  "hasTitle": true,
  "title": "my string"
}
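If you consume this output from JavaScript, a minimal sketch is to check the error flag before reading any results. The JSON string below is copied from the example output; how you obtain it (a file, CLI output, a server response) is left open, and the helper name readNames is hypothetical, not part of the scraperscript package.

```javascript
// Output copied from the example above (how it is obtained is not shown here).
const output = `{
  "error": false,
  "errorsMsg": [],
  "names": [
    { "number": 0, "text": "Tiago" },
    { "number": 0, "text": "James" }
  ],
  "hasTitle": true,
  "title": "my string"
}`;

// Hypothetical helper: returns the extracted names, or throws if the run reported errors.
function readNames(json) {
  const result = JSON.parse(json);
  if (result.error) {
    throw new Error(`ScraperScript errors: ${result.errorsMsg.join("; ")}`);
  }
  return result.names.map((entry) => entry.text);
}

console.log(readNames(output)); // [ 'Tiago', 'James' ]
```

Checking error first keeps a failed run from being mistaken for an empty result.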
Place the URL on the first line, prefixed with @: @http://myurl.com
Every other line has the form: - key: query :type
Note: spaces are significant.
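To make the line layout concrete, here is a hypothetical parser that splits a script into its URL and its rule lines. It is a sketch under stated assumptions, not the scraperscript package's implementation: it assumes single spaces exactly as in the example file, keys made of word characters, and comment lines starting with !!. The function name parseScript is made up for illustration.

```javascript
// Hypothetical parser for the line layout described above (NOT the package's own parser).
// Assumes single spaces as in the examples and keys made of word characters.
function parseScript(source) {
  const lines = source.split("\n").map((line) => line.trimEnd());
  const [first, ...rest] = lines;
  if (!first || !first.startsWith("@")) {
    throw new Error("The first line must be @ followed by a URL");
  }
  const url = first.slice(1);
  const rules = [];
  for (const line of rest) {
    if (line === "" || line.startsWith("!!")) continue; // skip blank lines and comments
    // "- key: query :type" -> key, query, type
    const match = /^- (\w+): (.+) :(\w+)$/.exec(line);
    if (!match) throw new Error(`Cannot parse line: ${line}`);
    const [, key, query, type] = match;
    rules.push({ key, query, type });
  }
  return { url, rules };
}

// Usage with the example file from this README.
const example = [
  "@https://helloword.site/list",
  "!! A comment ...",
  "- names: html >> body >> div >> h2 @> {number, text, bold} :array",
  '- hasTitle: html >> head >> title == " my string " :boolean',
  "- title: html >> head >> title :string",
].join("\n");

const script = parseScript(example);
console.log(script.url); // https://helloword.site/list
console.log(script.rules.map((r) => `${r.key}:${r.type}`)); // [ 'names:array', 'hasTitle:boolean', 'title:string' ]
```

Because the type is anchored to the end of the line, a query may itself contain spaces and quoted strings.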
Name
Rule: - key:
Example: - name:
Return type
Rule: :type
Types used in the example file: :array, :boolean, :string
Example: :string
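The declared type describes the JSON value produced for that key. As an illustration, this hypothetical check (the name matchesType is made up) covers only the three types that appear in the example file; the language may support others, and this is not how the package validates results.

```javascript
// Hypothetical check that a result value matches its declared :type.
// Only the types used in the example file are covered.
function matchesType(value, type) {
  switch (type) {
    case "string":
      return typeof value === "string";
    case "boolean":
      return typeof value === "boolean";
    case "array":
      return Array.isArray(value);
    default:
      throw new Error(`Type not covered by this sketch: ${type}`);
  }
}

// Values taken from the example output.
console.log(matchesType("my string", "string")); // true
console.log(matchesType(true, "boolean")); // true
console.log(matchesType([{ number: 0, text: "Tiago" }], "array")); // true
console.log(matchesType("true", "boolean")); // false
```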
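String
A string is written with a space after the opening quote and before the closing quote: " my string "
NOTE: "my string" (without those spaces) is invalid.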
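Judging from the example file, the padding spaces act as delimiters rather than content: the title text "my string" compared with " my string " gives hasTitle: true. A hypothetical reader for string literals under that assumption (readStringLiteral is a made-up name) could look like this:

```javascript
// Hypothetical reader for a string literal, assuming the space after the opening
// quote and the space before the closing quote are delimiters, not part of the value.
function readStringLiteral(literal) {
  if (literal.length < 4 || !literal.startsWith('" ') || !literal.endsWith(' "')) {
    return null; // invalid, e.g. "my string"
  }
  return literal.slice(2, -2);
}

console.log(readStringLiteral('" my string "')); // my string
console.log(readStringLiteral('"my string"')); // null
```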
Comment
!! my comment in ScraperScript
Elements
nameOfHtmlElementOne >> nameOfHtmlElementTwo
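The >> operator walks from one element to the next, as in html >> head >> title. The toy model below runs that idea over a plain object tree instead of real HTML. Two things are assumptions: that >> matches direct children only (the real operator may match any descendant), and the made-up page, shaped to mirror the example output. The function select is hypothetical.

```javascript
// Toy model of ">>": for each name in the chain, keep the children whose tag matches.
// Assumption: only direct children are searched.
function select(root, query) {
  const names = query.split(" >> ");
  if (root.tag !== names[0]) return [];
  let current = [root];
  for (const name of names.slice(1)) {
    current = current.flatMap((node) =>
      (node.children || []).filter((child) => child.tag === name)
    );
  }
  return current;
}

// Made-up page mirroring the example output.
const page = {
  tag: "html",
  children: [
    { tag: "head", children: [{ tag: "title", text: "my string" }] },
    {
      tag: "body",
      children: [
        {
          tag: "div",
          children: [
            { tag: "h2", text: "Tiago" },
            { tag: "h2", text: "James" },
          ],
        },
      ],
    },
  ],
};

console.log(select(page, "html >> head >> title").map((n) => n.text)); // [ 'my string' ]
console.log(select(page, "html >> body >> div >> h2").map((n) => n.text)); // [ 'Tiago', 'James' ]
```

Keeping a list of matches at every step is what lets a single query return several elements, like the two h2 names.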
Map elements [String]
nameOfHtmlElementOne @> nameOfSubHtmlElement
Map elements [Array]
nameOfHtmlElementOne @> [nameOfSubHtmlElement]
Map elements [Object]
nameOfHtmlElementOne @> {nameOfIndex, nameOfData, nameOfSubHtmlElement}
Addition
nameOfHtmlElementOne ++ nameOfHtmlElementTwo
Replace
nameOfHtmlElementOne -- nameOfHtmlElementTwo
Equal or different comparison
nameOfHtmlElementOne == nameOfHtmlElementTwo
nameOfHtmlElementOne ~= nameOfHtmlElementTwo
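In the example file, == compares the page title with a string and yields the boolean hasTitle. A minimal sketch of the two comparison operators on already-extracted text follows, assuming exact text comparison; the function name compare is hypothetical and the real semantics may differ.

```javascript
// Toy evaluator for == (equal) and ~= (different) on extracted text values.
// Assumption: values are compared as exact strings.
function compare(left, operator, right) {
  switch (operator) {
    case "==":
      return left === right;
    case "~=":
      return left !== right;
    default:
      throw new Error(`Operator not covered by this sketch: ${operator}`);
  }
}

console.log(compare("my string", "==", "my string")); // true
console.log(compare("my string", "~=", "my string")); // false
```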
OR
nameOfHtmlElementOne || nameOfHtmlElementTwo
To run the test suite, first install the dependencies, then run the tests:
# NPM
npm test
# Or Using Yarn
yarn test
Pull requests and stars are always welcome. For bugs and feature requests, please create an issue.