GYK scraper

This code scrapes data from gyakorikerdesek.hu. The data is not very complex, but it is complex enough that loading it into an SQLite database makes sense. The loaded data can then be used for analytics.
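
To illustrate why a relational store suits this kind of data, here is a minimal sketch of a question/answer layout and one analytics query. The table names, column names and placeholder rows are hypothetical and chosen only for illustration; they are not the schema this project creates.

```python
import sqlite3

# Hypothetical two-table layout: one row per question, one row per answer.
# These names are assumptions for illustration, not the project's real schema.
SCHEMA = """
CREATE TABLE question (
    id INTEGER PRIMARY KEY,
    category TEXT NOT NULL,
    title TEXT NOT NULL
);
CREATE TABLE answer (
    id INTEGER PRIMARY KEY,
    question_id INTEGER NOT NULL REFERENCES question(id),
    body TEXT NOT NULL
);
"""


def answers_per_question(conn):
    """Return (question id, number of answers) pairs ordered by question id."""
    query = """
        SELECT q.id, COUNT(a.id)
        FROM question AS q
        LEFT JOIN answer AS a ON a.question_id = q.id
        GROUP BY q.id
        ORDER BY q.id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.executescript(SCHEMA)

# Placeholder rows, not scraped data.
conn.executemany(
    "INSERT INTO question (id, category, title) VALUES (?, ?, ?)",
    [(1, "tudomanyok", "placeholder question A"),
     (2, "tudomanyok", "placeholder question B")],
)
conn.executemany(
    "INSERT INTO answer (question_id, body) VALUES (?, ?)",
    [(1, "placeholder answer"), (1, "placeholder answer"), (2, "placeholder answer")],
)
conn.commit()

counts = answers_per_question(conn)
```

Once the rows are in SQLite, questions such as "how many answers does each question get" become a single `JOIN` with `GROUP BY`, which is harder to express over flat files.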

The code is finally getting into shape. The user can fetch all questions of a category, fetch a range of pages within a category by defining a start and end page, or fetch a single question from a provided URL.

Usage

python gyik_scraper.py --database <str> \
            --category <str> \
            --subCategory <str> \
            --startPage <int> \
            --endPage <int> \
            --directQuestion <str> \
            --logFile <str>

where

database: mandatory. SQLite database file. If it does not exist, the script creates it.
category: mandatory. Main GyIK category (e.g. tudomanyok).
subCategory: optional. Subcategory within the category, used to narrow down the scope.
startPage: optional. First page of questions to load. Default: 1.
endPage: optional. Last page of questions to load. Default: the last page of questions.
directQuestion: optional. URL of a single question. If present, only this question is downloaded. Mostly for testing purposes.
logFile: optional. File name for the logs. Default: scraper.log

The start page has to be lower than the last page. To retrieve all questions of a category, omit both page parameters.
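
The options and the page-range rule above could be expressed with `argparse` roughly as follows. This is a sketch under stated assumptions, not the parser in `gyik_scraper.py`: the function names `build_parser` and `parse_args` are hypothetical, and allowing the start page to equal the end page (a single page) is an assumption, since the text above only says the start page has to be lower.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command line options."""
    parser = argparse.ArgumentParser(prog="gyik_scraper.py")
    parser.add_argument("--database", required=True, help="SQLite database file")
    parser.add_argument("--category", required=True, help="main category, e.g. tudomanyok")
    parser.add_argument("--subCategory", help="subcategory within the category")
    parser.add_argument("--startPage", type=int, default=1, help="first page to load")
    # None stands for "up to the last available page" in this sketch.
    parser.add_argument("--endPage", type=int, default=None, help="last page to load")
    parser.add_argument("--directQuestion", help="URL of a single question")
    parser.add_argument("--logFile", default="scraper.log", help="log file name")
    return parser


def parse_args(argv=None):
    """Parse the arguments and enforce the page-range rule."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.startPage < 1:
        parser.error("--startPage must be at least 1")
    # Assumption: equal values are accepted and mean "load exactly one page".
    if args.endPage is not None and args.startPage > args.endPage:
        parser.error("--startPage must not be greater than --endPage")
    return args


# Both page options omitted: the whole category would be retrieved.
whole_category = parse_args(["--database", "gyik.db", "--category", "tudomanyok"])

# Explicit page range.
page_range = parse_args(
    ["--database", "gyik.db", "--category", "tudomanyok",
     "--startPage", "3", "--endPage", "7"]
)
```

`parser.error` prints a usage message and exits with status 2, so an invalid range stops the run before any request is made.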

DSuveges/gyik_scraper

Folders and files

Latest commit

History

Repository files navigation

GYK scraper

Usage

where

SQLite schema

About

Resources

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 3

Languages

Packages