Commit

Updated docs
IonicaBizau committed Jan 1, 2020
1 parent 6c8ccdf commit 9928671
Showing 6 changed files with 80 additions and 28 deletions.
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -3,3 +3,7 @@
*~
*.log
node_modules
*.env
.DS_Store
package-lock.json
.bloggify/*
2 changes: 2 additions & 0 deletions DOCUMENTATION.md
Original file line number Diff line number Diff line change
@@ -6,6 +6,7 @@ You can see below the API reference of this module.
A scraping module for humans.

#### Params

- **String|Object** `url`: The page url or request options.
- **Object** `opts`: The options passed to `scrapeHTML` method.
- **Function** `cb`: The callback function.
@@ -21,6 +22,7 @@ A scraping module for humans.
Scrapes the data in the provided element.

#### Params

- **Cheerio** `$`: The input element.
- **Object** `opts`: An object containing the scraping information.
If you want to scrape a list, you have to use the `listItem` selector:
2 changes: 1 addition & 1 deletion LICENSE
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
The MIT License (MIT)

Copyright (c) 2016-20 Ionică Bizău <[email protected]> (https://ionicabizau.net)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
95 changes: 70 additions & 25 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,52 +1,60 @@
<!-- Please do not edit this file. Edit the `blah` field in the `package.json` instead. If in doubt, open an issue. -->


[![scrape-it](https://i.imgur.com/j3Z0rbN.png)](#)

# scrape-it

[![Support me on Patreon][badge_patreon]][patreon] [![Buy me a book][badge_amazon]][amazon] [![PayPal][badge_paypal_donate]][paypal-donations] [![Ask me anything](https://img.shields.io/badge/ask%20me-anything-1abc9c.svg)](https://github.com/IonicaBizau/ama) [![Travis](https://img.shields.io/travis/IonicaBizau/scrape-it.svg)](https://travis-ci.org/IonicaBizau/scrape-it/) [![Version](https://img.shields.io/npm/v/scrape-it.svg)](https://www.npmjs.com/package/scrape-it) [![Downloads](https://img.shields.io/npm/dt/scrape-it.svg)](https://www.npmjs.com/package/scrape-it)

> A Node.js scraper for humans.
Want to save time or not using Node.js? Try our [hosted API](https://scrape-it.saasify.sh).

## :cloud: Installation

```sh
# Using npm
npm install --save scrape-it

# Using yarn
yarn add scrape-it
```


:bulb: **ProTip**: You can install the [cli version of this module](https://github.com/IonicaBizau/scrape-it-cli) by running `npm install --global scrape-it-cli` (or `yarn global add scrape-it-cli`).

## FAQ


Here are some frequent questions and their answers.

### 1. How to parse ajax pages?

`scrape-it` uses a plain request module for making requests, so it cannot execute JavaScript and therefore cannot parse ajax pages directly. In general, you will run into one of these scenarios:


1. **The ajax response is in JSON format.** In this case, you can make the request directly, without needing a scraping library.
2. **The ajax response gives you HTML back.** Instead of calling the main website (e.g. example.com), pass `scrape-it` the ajax url (e.g. `example.com/api/that-endpoint`) and you will be able to parse the response.
3. **The ajax request is so complicated that you don't want to reverse-engineer it.** In this case, use a headless browser (e.g. Google Chrome, Electron, PhantomJS) to load the content and then use the `.scrapeHTML` method from `scrape-it` once you have the HTML loaded on the page.

### 2. Crawling


There is no built-in way to crawl pages with `scrape-it`. For simple scenarios, you can parse the list of URLs from the initial page and then scrape each page using Promises. Alternatively, you can use a dedicated crawler to download the website and then use the `.scrapeHTML` method to scrape the local files.
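
The Promise-based approach can be sketched as follows. `scrapePage` stands in for a function wrapping a scrape-it call; here it is stubbed with static data so only the control flow is shown, and the paths and titles are made up.

```javascript
// Stubbed page data standing in for real scrape results.
const pages = {
    "/": { title: "Home", links: ["/a", "/b"] },
    "/a": { title: "A", links: [] },
    "/b": { title: "B", links: [] }
}

// Stand-in for something like: (url) => scrapeIt(url, opts).then(({ data }) => data)
const scrapePage = (url) => Promise.resolve(pages[url])

// Scrape the start page, then scrape every linked page in parallel.
const crawl = async (startUrl) => {
    const start = await scrapePage(startUrl)
    const children = await Promise.all(start.links.map(scrapePage))
    return [start, ...children].map((page) => page.title)
}

crawl("/").then((titles) => console.log(titles)) // [ 'Home', 'A', 'B' ]
```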

### 3. Local files

Use the `.scrapeHTML` method to parse the HTML read from local files using `fs.readFile`.


## :clipboard: Example



```js
"use strict"

const scrapeIt = require("scrape-it")

// Promise interface
@@ -144,24 +152,32 @@ scrapeIt("https://ionicabizau.net", {
// title: 'Ionică Bizău',
// desc: 'Web Developer, Linux geek and Musician',
// avatar: '/images/logo.png' }

```





## :question: Get Help

There are a few ways to get help:

1. Please [post questions on Stack Overflow](https://stackoverflow.com/questions/ask). You can open issues with questions, as long as you add a link to your Stack Overflow question.
2. For bug reports and feature requests, open issues. :bug:

3. For direct and quick help, you can [use Codementor](https://www.codementor.io/johnnyb). :rocket:




## :memo: Documentation


### `scrapeIt(url, opts, cb)`
A scraping module for humans.

#### Params

- **String|Object** `url`: The page url or request options.
- **Object** `opts`: The options passed to `scrapeHTML` method.
- **Function** `cb`: The callback function.
@@ -177,6 +193,7 @@ A scraping module for humans.
Scrapes the data in the provided element.

#### Params

- **Cheerio** `$`: The input element.
- **Object** `opts`: An object containing the scraping information.
If you want to scrape a list, you have to use the `listItem` selector:
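
A sketch of such a `listItem` configuration: every element matching `.article` becomes one entry in `articles`, and the `data` selectors are resolved relative to that element. All selectors and the URL in the usage comment are hypothetical.

```javascript
// Options object for a hypothetical blog listing page.
const articlesOptions = {
    articles: {
        // One result entry per matched element.
        listItem: ".article",
        // Fields scraped relative to each ".article" element.
        data: {
            title: "a.title",
            url: {
                selector: "a.title",
                attr: "href"
            }
        }
    }
}

// Usage sketch: scrapeIt("https://example.com/blog", articlesOptions)
console.log(articlesOptions.articles.listItem) // .article
```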
@@ -249,18 +266,46 @@ Scrapes the data in the provided element.



## :yum: How to contribute
Have an idea? Found a bug? See [how to contribute][contributing].


## :sparkling_heart: Support my projects

I open-source almost everything I can, and I try to reply to everyone needing help using these projects. Obviously, this takes time. You can integrate and use these projects in your applications *for free*! You can even change the source code and redistribute it (even resell it).

However, if you get some profit from this or just want to encourage me to continue creating stuff, there are a few ways you can do it:


- Starring and sharing the projects you like :rocket:
- [![Buy me a book][badge_amazon]][amazon]—I love books! I will remember you after years if you buy me one. :grin: :book:
- [![PayPal][badge_paypal]][paypal-donations]—You can make one-time donations via PayPal. I'll probably buy a ~~coffee~~ tea. :tea:
- [![Support me on Patreon][badge_patreon]][patreon]—Set up a recurring monthly donation and you will get interesting news about what I'm doing (things that I don't share with everyone).
- **Bitcoin**—You can send me bitcoins at this address (or by scanning the code below): `1P9BRsmazNQcuyTxEqveUsnf5CERdq35V6`

![](https://i.imgur.com/z6OQI95.png)


Thanks! :heart:



## :scroll: License

[MIT][license] © [Ionică Bizău][website]


[badge_patreon]: https://ionicabizau.github.io/badges/patreon.svg
[badge_amazon]: https://ionicabizau.github.io/badges/amazon.svg
[badge_paypal]: https://ionicabizau.github.io/badges/paypal.svg
[badge_paypal_donate]: https://ionicabizau.github.io/badges/paypal_donate.svg

[patreon]: https://www.patreon.com/ionicabizau
[amazon]: http://amzn.eu/hRo9sIZ
[paypal-donations]: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=RVXDDLKKLQRJW

[license]: http://showalicense.com/?fullname=Ionic%C4%83%20Biz%C4%83u%20%3Cbizauionica%40gmail.com%3E%20(https%3A%2F%2Fionicabizau.net)&year=2016#license-mit
[website]: https://ionicabizau.net
[contributing]: /CONTRIBUTING.md
[docs]: /DOCUMENTATION.md
2 changes: 1 addition & 1 deletion package-lock.json

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion package.json
Original file line number Diff line number Diff line change
@@ -33,6 +33,7 @@
"blah": {
"h_img": "https://i.imgur.com/j3Z0rbN.png",
"cli": "scrape-it-cli",
"description": "Want to save time or not using Node.js? Try our [hosted API](https://scrape-it.saasify.sh).",
"installation": [
{
"h2": "FAQ"
@@ -97,4 +98,4 @@
"bloggify.json",
"bloggify/"
]
}
}
