Web scraper that takes a tax code and returns the PEC address of the company

riccardopaltrinieri/iniscrapec

iniscrapec

iniscrapec is a simple scraper that takes a company's tax code and returns its PEC (certified email) address.

Tech

iniscrapec uses a number of open source projects to work properly:

  • pip==20.2.2
  • beautifulsoup4~=4.9.1
  • mechanize~=0.4.5
  • pymongo~=3.11.0
  • dnspython~=2.0.0
  • python_dotenv~=0.14.0
  • setuptools~=50.3.0

iniscrapec itself is open source, with a public repository on GitHub.
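
The pymongo and dnspython entries in the list suggest that the database is reached through a `mongodb+srv://` connection string, since pymongo needs dnspython to resolve that scheme. That is an assumption, not something this README states. Under that assumption, a minimal sketch of building the connection string from the database user and password could look like this. The function name `build_mongo_uri` and the host are hypothetical; credentials are percent-encoded so that characters such as `@` or `/` do not break the URI.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host):
    """Build a mongodb+srv:// URI with percent-encoded credentials.

    `host` is a placeholder for the real cluster address, which this
    README does not specify.
    """
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"


# Hypothetical values, for illustration only.
uri = build_mongo_uri("app", "p@ss/word", "cluster0.example.mongodb.net")
print(uri)
```

The resulting string is the kind of value that can be passed to `pymongo.MongoClient`.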

It also uses a third-party service, 2captcha, to solve the reCAPTCHA "I am not a robot" challenge.
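
To make the captcha step more concrete, here is a rough sketch of the two requests involved. It assumes the 2captcha HTTP API with its `in.php` (submit) and `res.php` (poll) endpoints; whether iniscrapec calls the service exactly this way is not confirmed by this README. The function names are hypothetical, and the arguments correspond to the CAP_KEY, DATA_SITEKEY and URL settings described under "After installation". The sketch only builds the request URLs and performs no network calls.

```python
from urllib.parse import urlencode

# Endpoints of the 2captcha HTTP API as assumed for this sketch;
# check the service's own documentation before relying on them.
SUBMIT_ENDPOINT = "https://2captcha.com/in.php"
RESULT_ENDPOINT = "https://2captcha.com/res.php"


def build_submit_url(api_key, site_key, page_url):
    """URL that asks the service to solve a reCAPTCHA found on page_url."""
    query = urlencode({
        "key": api_key,             # CAP_KEY
        "method": "userrecaptcha",
        "googlekey": site_key,      # DATA_SITEKEY
        "pageurl": page_url,        # URL
        "json": 1,
    })
    return f"{SUBMIT_ENDPOINT}?{query}"


def build_result_url(api_key, task_id):
    """URL that polls for the token of a previously submitted captcha."""
    query = urlencode({"key": api_key, "action": "get", "id": task_id, "json": 1})
    return f"{RESULT_ENDPOINT}?{query}"
```

In this flow the submit request returns a task id, and the result URL is polled until the service returns a token, which is then sent along with the search form.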

Installation

iniscrapec requires Python 3.7 to run.

How to get it from git

$ git clone https://github.com/riccardopaltrinieri/iniscrapec.git

How to get it from pip

$ pip install iniscrapec

After installation

Fill in the environment variables in the .env file:

CAP_KEY = "" # API key provided by 2captcha.com
DB_USER = "" # MongoDB user
DB_PWD = "" # MongoDB password
DATA_SITEKEY = "" # reCAPTCHA site key, found as described in step 2 of the link below
                  # currently 6Lf-0UAUAAAAAHdt6Gc54MkKXzoyV1iMzX7L55V9, but it could change
URL = "https://www.inipec.gov.it/cerca-pec/-/pecs/companies" # government website where the PEC is searched
TAX_EXAMPLE = "" # tax code used for testing and debugging
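
Since a blank value in the .env file is easy to miss, a small check before starting the scraper can help. The sketch below is not part of iniscrapec; the helper name `missing_settings` is hypothetical. It assumes the values have already been loaded into the process environment, for example with `load_dotenv()` from python_dotenv, and it uses only the standard library.

```python
import os

# Names taken from the .env listing above. TAX_EXAMPLE is left out because
# it is only used for testing and debugging.
REQUIRED_SETTINGS = ("CAP_KEY", "DB_USER", "DB_PWD", "DATA_SITEKEY", "URL")


def missing_settings(env=None):
    """Return the required setting names that are absent or blank in env.

    env defaults to os.environ; any mapping of names to strings works.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

Calling `missing_settings()` at startup and stopping with a clear message when the list is not empty avoids failures later, in the middle of a scraping run.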

link on how to use 2captcha

How to run it with a simple tkinter GUI

(if installed with pip)
$ python3 -m iniscrapec

or

$ cd path\of\repo\iniscrapec
$ python3 __main__.py

You can also use only the scraper code with

$ cd path\iniscrapec\modules
$ python3 scraper.py

License

MIT
