Scraper
Contained in the scraper folder, mainly in scraper.py. The scraper takes a list of package names (either from a .csv file passed to the constructor, or as a list passed when scraping) and scrapes all of the app metadata for each package, which includes all of the fields in app_object.py. It inserts this data into the MongoDB database described in constants.py, under APP_METADATA_DB. To run the scraper, first create a Scraper object, optionally passing in a filename containing the package names:
s = Scraper(input_file='package_names.csv')
Then call scrape_metadata_for_apps with the options you want; to use the defaults, run:
s.scrape_metadata_for_apps()
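When an input file is given, the scraper reads package names from it. The layout of package_names.csv is not documented here, so the following is only a minimal sketch, assuming one package name per row in the first column. load_package_names is a hypothetical helper, not a function from scraper.py.

```python
import csv
import io
from typing import List, TextIO


def load_package_names(csv_file: TextIO) -> List[str]:
    """Read package names from a CSV stream (hypothetical helper).

    Assumes one package name per row, in the first column. Blank rows are
    skipped, and repeated names are kept only once, in first-seen order.
    """
    names: List[str] = []
    seen = set()
    for row in csv.reader(csv_file):
        if not row or not row[0].strip():
            continue  # csv.reader yields [] for blank lines
        name = row[0].strip()
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


if __name__ == "__main__":
    # In-memory stand-in for package_names.csv
    sample = io.StringIO("com.example.app\n\ncom.other.app\ncom.example.app\n")
    print(load_package_names(sample))
```

With a real file, open it with newline="" as the csv module recommends, e.g. `with open("package_names.csv", newline="") as f: names = load_package_names(f)`.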
The broad purpose of the scraper is to keep each app's metadata up to date when a new version is released; the updater also uses it to detect when a new version has been released.
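One way an updater could use scraped metadata to detect a release is to compare the stored version with the freshly scraped one. This is a sketch, not the project's actual check: has_new_version is hypothetical, and "version" is an assumed field name standing in for whatever app_object.py uses. Any change counts as a new release, because version strings do not always sort cleanly (as strings, "1.10" sorts before "1.9").

```python
from typing import Any, Mapping, Optional


def has_new_version(stored: Optional[Mapping[str, Any]],
                    scraped: Mapping[str, Any],
                    field: str = "version") -> bool:
    """Return True if the scraped metadata reports a different version.

    Hypothetical helper; `field` is an assumed name for the version field.
    """
    if stored is None:
        return True  # app not in the database yet, so treat it as new
    # Inequality rather than ordering: version strings may not compare reliably
    return stored.get(field) != scraped.get(field)


if __name__ == "__main__":
    print(has_new_version({"version": "1.9"}, {"version": "1.10"}))
```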
To scrape the packages listed in a file package_names.csv from the command line, run:
python main.py s package_names.csv
This scrapes all of the package names in the CSV file. Note that the scraper currently does not insert an app into the database if its package name is already there, so that no app is counted twice.
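The skip-if-present behaviour can be sketched as a check before each insert. This is a minimal illustration under assumptions: insert_if_new is hypothetical, "package_name" is an assumed field name, and InMemoryCollection is a tiny stand-in so the sketch runs without MongoDB. A real pymongo Collection provides the same find_one and insert_one calls used here.

```python
from typing import Any, Dict, List, Optional


class InMemoryCollection:
    """Minimal stand-in for a pymongo Collection (exact-match filters only)."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []

    def find_one(self, query: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        # Return the first document whose fields match every key in the query
        for doc in self.docs:
            if all(doc.get(k) == v for k, v in query.items()):
                return doc
        return None

    def insert_one(self, document: Dict[str, Any]) -> None:
        self.docs.append(dict(document))


def insert_if_new(collection: Any, metadata: Dict[str, Any]) -> bool:
    """Insert app metadata unless its package name is already stored.

    Hypothetical helper; returns True if the document was inserted.
    """
    if collection.find_one({"package_name": metadata["package_name"]}) is not None:
        return False  # already present: skip so the app is not double counted
    collection.insert_one(metadata)
    return True


if __name__ == "__main__":
    apps = InMemoryCollection()
    print(insert_if_new(apps, {"package_name": "com.example.app", "version": "1.0"}))
    print(insert_if_new(apps, {"package_name": "com.example.app", "version": "2.0"}))
```

A check followed by an insert is not atomic, so two concurrent scrapers could both insert the same app. With pymongo, a unique index (`collection.create_index("package_name", unique=True)`) would enforce the rule in the database itself.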