Data Collection and visualization for jobs offered under the IT section on Wuzzuf for educational and statistical purposes.
DISCLAIMER: Wuzzuf doesn't offer an open API to obtain the required data for the visualization and as such a web scraper with a delay between requests was used to obtain the data
You can view the whole visualization on Tableau public
If there's a mistake in the filtering in the Visualization or any other feedback please contact me.
You can download previously collected data from this drive. It should be updated every month (or a couple of month) so check back from time to time.
Side Note: if you are going to combine multiple zip files keep in mind that there might be an overlap because a job offer might have lasted for longer than a month.
Don't forget to install the required modules. (unless ofcourse you are going to use docker)
pip install -r requirements.txt
To obtain the data. Run the python module as follows:
Note that because of the delay added between requests the script will take a long time to gather the data. Please be responsible and don't remove the delay.
You can also download some older data in the Download Data Section.
py -m Wuzzuf_DataCollection
There are also command args available which you can view with:
$ py -m Wuzzuf_DataCollection --help
usage: __main__.py [-h] [-l] [-i START_INDEX] [-e END_INDEX] [-c] [-f] [-a]
Gets the list of Job offers on wuzzuf.com for it, gets the details of each offer then generates a CSV file with all the jobs and then archives the output to a zip file Warning: the output file is
overwritten with each run!!
optional arguments:
-h, --help show this help message and exit
-l, --use-existing-Links-file
Use the existing links JSON file (default: False)
-i START_INDEX, --start-index START_INDEX
Start index in links JSON to start getting job info (Inclusive, default: 0)
-e END_INDEX, --end-index END_INDEX
Start index in links JSON to start getting job info (Exclusive)
-c, --skip-create-csv
skip creating CSV files combining data from Job JSONs (default: False)
-f, --skip-get-jobs-info
skip creating JSON files for each job (or jobs within the start and end index if specified) in links JSON file (default: False)
-a, --skip-archive Skip creating an archive for the output and deleting the current output (default: False)
https://github.com/TheDigitalPhoenixX/Wuzzuf-IT-Jobs-Visualization
A docker file and a docker compose are present in the repo. So you can easly start the script using:
docker-compose up
If you are going to deploy the script to run periodicly. (Another reminder that you can download the data in the Download Data Section) Use the following line to add it to the cron tab. Change the parameters to match your machine.
(crontab -l 2>/dev/null; echo "0 0 1 * * cd /home/ec2-user/Wuzzuf-IT-Jobs-Visualization && docker-compose up") | crontab -
- Tableau Public - Data Visualization tool
- Visual Studio Code - Code Editor
- Docker - Containerization
We use SemVer for versioning. For the versions available, see the tags on this repository.
- Mohamed Said Sallam - Main Dev - TheDigitalPhoenixX
- Sameh Amnoun - Main Dev - SamehAmnoun
See also the list of contributors who participated in this project and their work in CONTRIBUTORS.md.
This project is licensed under the MIT License - see the LICENSE file for details