This project creates and stores a list of Community Amateur Sports Clubs (CASCs), a type of non-profit organisation registered with HMRC in the UK.
It takes the list of clubs published by HMRC and turns it into consistently formatted CSV or JSON data. It also adds a unique identifier for each club, based on the club's name and postcode.
The unique identifier allows CASCs to be listed in the Org ID scheme for organisation identifiers.
Because HMRC do not publish identifiers for these organisations, we have to create our own. This identifier is not ideal: it is based on the name and postcode of the club, so it will change whenever either of those changes.
This issue contains a discussion of the reasons for creating this repository.
The identifier is created in cascs/fetch_cascs.py. The function that creates it does the following:
- Concatenate the name and postcode of the record (name first). If either is null, replace it with the string "None".
- Encode the concatenated string as UTF-8.
- Take the MD5 hash of the encoded string.
- Get the hexdigest of the hash.
- Use the first 8 characters of the hexdigest as the ID for the record.
- Add "GB-CASC-" as a prefix to create an OrgID-compliant identifier.
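The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in cascs/fetch_cascs.py: the function name `casc_id` is hypothetical, and it assumes the name and postcode are concatenated with no separator and without any whitespace or case normalisation.

```python
import hashlib
from typing import Optional


def casc_id(name: Optional[str], postcode: Optional[str]) -> str:
    """Hypothetical sketch of the CASC identifier steps described above."""
    # Null values are replaced with the literal string "None"
    name_part = name if name is not None else "None"
    postcode_part = postcode if postcode is not None else "None"
    # Name first, then postcode; no separator or normalisation (assumption)
    combined = (name_part + postcode_part).encode("utf-8")
    # MD5 hash -> hexdigest -> first 8 characters
    short_hash = hashlib.md5(combined).hexdigest()[:8]
    # Prefix to make an OrgID-style identifier
    return "GB-CASC-" + short_hash
```

For example, `casc_id("Anytown Cricket Club", "AB1 2CD")` returns a 16-character string starting with "GB-CASC-". Because the hash covers the exact name and postcode strings, any change to either one produces a different ID, which is why the identifier is not stable over time.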
In this package this is implemented using Python's hashlib library.
To create a CSV file with the contents of the HMRC list of registered CASCs, run the following command:
python cascs /path/to/file.csv
To create a JSON file with the same data run:
python cascs /path/to/file.json
To create both files run:
python cascs /path/to/file.csv /path/to/file.json
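The commands above suggest that the output format is chosen from each path's file extension. The sketch below shows one way to do that with the standard library; the function name `write_records` and the idea that records are a list of flat dictionaries are assumptions for illustration, not the package's actual implementation.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def write_records(records: List[Dict[str, str]], paths: List[str]) -> None:
    """Hypothetical sketch: write records to each path, picking the format by extension."""
    for path in paths:
        suffix = Path(path).suffix.lower()
        if suffix == ".csv":
            # Use the keys of the first record as the CSV header
            fieldnames = list(records[0].keys()) if records else []
            with open(path, "w", newline="", encoding="utf-8") as f:
                writer = csv.DictWriter(f, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(records)
        elif suffix == ".json":
            with open(path, "w", encoding="utf-8") as f:
                json.dump(records, f, indent=2)
        else:
            raise ValueError(f"Unsupported output format: {path}")
```

Passing both a .csv and a .json path writes the same records to both files, mirroring the third command.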
To check for name matches (where a CASC has changed address but the name is the same), run the following command:
python cascs --name-match name_match.csv cascs.csv cascs.csv cascs.json
This will create a file called "name_match.csv" containing IDs that share the same name. You can verify these and then add any matches to cascs_id_lookup.csv, where they will be incorporated into the data rather than new IDs being created.
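The name-match check can be thought of as grouping records by name and reporting names that map to more than one ID. The sketch below illustrates that idea only; the function name `find_name_matches`, the "name" and "id" keys, and the case-insensitive, whitespace-stripped comparison are all assumptions rather than the package's actual behaviour.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_name_matches(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Hypothetical sketch: map each shared name to the distinct IDs that use it."""
    ids_by_name: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        # Compare names case-insensitively, ignoring surrounding whitespace (assumption)
        key = record["name"].strip().lower()
        ids_by_name[key].add(record["id"])
    # Keep only names that appear under more than one ID
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}
```

A club that moved to a new postcode would appear here with both its old and new IDs, which is the pair you would verify before adding it to cascs_id_lookup.csv.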
The full process to update the CSV and JSON files in this repository should be something like:
python cascs cascs.csv cascs.csv cascs.json
git add cascs.csv cascs.json
git commit -m 'Add new cascs'
git push origin master
The file casc_company_house.csv contains a list of CASCs that also appear to be registered with Companies House, based on matching the name. It is created manually by comparing the name of each CASC with names on Companies House, replacing 'Ltd' with 'Limited' where appropriate and ignoring case.
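Although that file is built by hand, the matching rule it describes can be expressed as a small normalisation function. This is a hypothetical sketch: the function names are illustrative, and collapsing repeated whitespace and treating "Ltd." like "Ltd" are extra assumptions beyond the rule stated above.

```python
import re


def normalise_company_name(name: str) -> str:
    """Hypothetical sketch of the manual matching rule for Companies House names."""
    # Lower-case so the comparison ignores case
    name = name.strip().lower()
    # Collapse runs of whitespace into a single space (assumption)
    name = re.sub(r"\s+", " ", name)
    # Replace a standalone "ltd" (optionally followed by ".") with "limited"
    name = re.sub(r"\bltd\b\.?", "limited", name)
    return name


def names_match(casc_name: str, company_name: str) -> bool:
    """Return True if two names are equal after normalisation."""
    return normalise_company_name(casc_name) == normalise_company_name(company_name)
```

Any match found this way would still need the same manual check before being added to the file.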