HLTV Scraper

Working Update

HLTV Scraper

This is a multi-threaded Python scraper designed to pull data from HLTV.org and tabulate it into a series of CSV files. It is written in pure Python, so it should run on any system that can run Python 3. It is not compatible with Python 2, so you may need to install the latest Python release from here.

Installation

Since this is written in pure Python, there are no dependencies to install. Simply clone the repository or download the zip file, then cd to the directory and run python3 start.py. There is demonstration here.

Getting New Matches

This works by scraping several pieces of data from the HLTV webpages. First, it will paginate through the match results page and determine which Match IDs are not yet in the database. If there are new IDs to tabulate, it will append them to the matchIDs.csv file. These new matches are stored in an array called matchesToCheck.

Matching Matches to Events

Once these have been added, it compares matchIDs.csv to joinMatchEvent.csv file. getMatchEvents.py will parse matchesToCheck to find their respective Event IDs and append them to the joinMatchEvent.csv file.

Getting New Events

From there, getEventNames.py compares eventIDs.csv to joinMatchEvent.csv and scrapes the respective event results page to determine various data about the event. The data is then appended to eventIDs.csv.

Getting Match Results

Once the new events have been accounted for, the script takes matchesToCheck and sends them to getMatchInfo.py to scrape the necessary information.

Handling Multiple Maps

Since this returns multidimensional arrays for matches with more than one map, the script calls fixArray() thrice to remove any extra dimensions. The method turns an array like this:

[[1, 2, 3], [3, 4, 5], [['a', 'b', 'c'], ['c', 'd', 'e']], [5, 6, 7]]

To an array like this:

[[1, 2, 3], [3, 4, 5], [5, 6, 7], ['a', 'b', 'c'], ['c', 'd', 'e']]

After that, it tabulates the new information to matchResults.csv.

Getting Match Lineups

Next, the script parses the same new matches stored in matchesToCheck and find the respective team lineups and tabulates the new information to matchLineups.csv.

Getting player stats

Each match has player stats for each map. The script looks for these statistics using the matches stored in matchesToCheck and appends the new data to playerStats.csv.

Updating Players and Teams

Each player and team on HLTV has a unique identification number that increases as new players are added to the database. To find new players and teams, we get the maximum identifier value form the respective .csv file and iterate over it using getIterableItems. From there the relevant pages are scraped and tabulated to players.csv and teams.csv.

Starting Over

If you made an entry of a more recent event and want to go beyond that, remove clear the csv : matchIDs.csv to restart.(do not remove first row, ID and Title)

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
__pycache__		__pycache__
csv		csv
getMatchIDs.py		getMatchIDs.py
helper.py		helper.py
htmls.py		htmls.py
readme.md		readme.md
requirements.txt		requirements.txt
scraper.py		scraper.py
start.py		start.py
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Working Update

HLTV Scraper

Installation

Getting New Matches

Matching Matches to Events

Getting New Events

Getting Match Results

Handling Multiple Maps

Getting Match Lineups

Getting player stats

Updating Players and Teams

Starting Over

About

Releases

Packages

Languages

pulkitgupta2k/HLTV-Scraper

Folders and files

Latest commit

History

Repository files navigation

Working Update

HLTV Scraper

Installation

Getting New Matches

Matching Matches to Events

Getting New Events

Getting Match Results

Handling Multiple Maps

Getting Match Lineups

Getting player stats

Updating Players and Teams

Starting Over

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages