GitHub - Kibrael/hmda-test-files: This repository contains code used to generate synthetic LAR files. These files will be used to support the development of the 2018 HMDA data submission platform.

Purpose and Scope:

This repository contains code used to generate synthetic Loan/Application Register (LAR) and Transmittal Sheet (TS) files. It supports file creation for multiple HMDA collection years, starting with 2018; the instructions below cover 2018 through 2021.

Two types of files can be created: clean files and test files. Clean files pass all edit checks in the Filing Instructions Guide (FIG) for the relevant year, while each test file fails the edit named in its filename. Test files may also fail some additional edits; this is known behavior.

Structure

Each year directory in the repository root contains its own codebase for creating test files, and each year corresponds to a HMDA collection year. Test files are year-specific because the HMDA FIG changes between years.

2018/python and 2019/python contain the notebooks and Python scripts used to generate both clean and test files.
2018/schemas and 2019/schemas contain schemas for the LAR and TS in JSON format, taken from the 2018 HMDA FIG and the 2019 HMDA FIG, respectively.
2018/dependencies and 2019/dependencies contain supplemental data files used in the generation of clean and test files:
- Relevant FFIEC Census data (see this repo for more information)
- A file containing a list of US ZIP codes

Dependencies

Python 3.5 or greater
Jupyter Notebooks: pip3 install jupyter
Pandas: pip3 install pandas
Other required Python libraries can be installed with pip3 install -r requirements.txt
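Before running any of the scripts, a quick check can confirm the interpreter version and that required packages are importable. This is a minimal standard-library sketch; the function names are hypothetical, and the package list is an assumption (only pandas is checked here) that you would extend with entries from requirements.txt. It avoids f-strings so it also runs on Python 3.5.

```python
import importlib.util
import sys

# Assumed minimal package list; extend it with entries from requirements.txt.
REQUIRED_PACKAGES = ["pandas"]
MIN_PYTHON = (3, 5)


def python_version_ok(version_info=sys.version_info):
    """Return True if the interpreter meets the documented minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names of packages that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def environment_problems(packages=REQUIRED_PACKAGES):
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if not python_version_ok():
        problems.append("Python %d.%d or greater is required" % MIN_PYTHON)
    for name in missing_packages(packages):
        problems.append("missing package: " + name)
    return problems
```

In a ready environment, `print(environment_problems())` prints an empty list.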

Generating Clean Files

Clean files serve as the base for generating files that fail edits. Running the scripts below creates the edits_files directory and a data file that passes the HMDA edit checks. The number of rows in the file is set in the YAML clean file configuration for each directory. Other variables, such as data ranges, can also be set in the configuration files.

Configuration values for clean files can be changed using the:

Additional configuration options are available in the configuration folders by year:

For 2019, 2020, and 2021:

Navigate to the <year>/python directory
Run python3 generate_clean_files.py
The clean test file will be created with the following path: {year}/edits_files/{bank name}/clean_files/{Bank Name}_clean_{row count}.txt.

For 2018:

Navigate to the 2018/python directory
Run python3 generate_2018_clean_files.py
The clean test file will be created in a new edits_files directory under 2018/edits_files/clean_files/{Bank Name}/ with the filename clean_file_{Number of Rows}_{Bank Name}.txt.
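The two output layouts above can be captured in a small helper, for example to locate the clean file that later steps depend on. This is a hypothetical sketch: `clean_file_path` is not part of the repository, it only mirrors the path templates described in this section, and it takes the repository root as a parameter.

```python
from pathlib import Path


def clean_file_path(repo_root, year, bank_name, row_count):
    """Build the expected clean file path from the README's path templates.

    Hypothetical helper; check the result against your own configuration.
    """
    year_dir = Path(repo_root) / str(year)
    if int(year) == 2018:
        # 2018: 2018/edits_files/clean_files/{Bank Name}/clean_file_{rows}_{Bank Name}.txt
        return (year_dir / "edits_files" / "clean_files" / bank_name
                / "clean_file_{}_{}.txt".format(row_count, bank_name))
    # 2019-2021: {year}/edits_files/{bank name}/clean_files/{Bank Name}_clean_{row count}.txt
    return (year_dir / "edits_files" / bank_name / "clean_files"
            / "{}_clean_{}.txt".format(bank_name, row_count))
```

For example, `clean_file_path(".", 2019, "Bank1", 1000)` points at `2019/edits_files/Bank1/clean_files/Bank1_clean_1000.txt` under the repository root.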

Generating Test Files

Generating edit test files requires a clean data file to be present. The steps above describe how to create clean data files.

Test files will be created from a clean file of the length specified by the file_length value in the clean file configuration.

Test files will be written to sub directories based on the type of edit they fail: edits_files/{bank name}/test_files/{edit type}/{bank name}_{edit name}_{row count}.txt

Existing test files of the same length will be overwritten. These filepaths can be changed in the test filepaths configuration.
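Because test generation reads an existing clean file, a pre-flight check can fail early when the clean file of the configured length is missing. The sketch below is hypothetical: the function names are illustrative, `file_length` stands for the value read from the clean file configuration, and the paths follow the 2019-2021 templates given in this README, relative to a year directory.

```python
from pathlib import Path


def edit_test_file_path(year_dir, bank_name, edit_type, edit_name, row_count):
    """edits_files/{bank name}/test_files/{edit type}/{bank name}_{edit name}_{row count}.txt"""
    return (Path(year_dir) / "edits_files" / bank_name / "test_files" / edit_type
            / "{}_{}_{}.txt".format(bank_name, edit_name, row_count))


def require_clean_file(year_dir, bank_name, file_length):
    """Return the clean base file path, or raise FileNotFoundError if it is absent."""
    clean = (Path(year_dir) / "edits_files" / bank_name / "clean_files"
             / "{}_clean_{}.txt".format(bank_name, file_length))
    if not clean.is_file():
        raise FileNotFoundError(
            "Clean file {} not found; generate clean files first.".format(clean))
    return clean
```

Calling `require_clean_file` before the error-file script makes a missing base file surface as a clear error rather than a failure partway through generation.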

To create test files for 2019, 2020, and 2021:

Navigate to the <year>/python directory.
Run python3 generate_error_files.py

To create test files for 2018:

Navigate to the 2018/python directory.
Run python3 generate_2018_error_files.py
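The per-year steps for clean and test files differ only in the script name, so they can be wrapped in a small driver. This is a sketch, not part of the repository: the mapping is copied from the script names listed in this README, and the driver runs the chosen script from the year's python directory with the current interpreter. Years other than 2018 fall back to the 2019-2021 script names, which the README documents only for 2019 through 2021.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed in this README; 2018 uses its own scripts.
SCRIPTS = {
    "clean": {2018: "generate_2018_clean_files.py", "default": "generate_clean_files.py"},
    "error": {2018: "generate_2018_error_files.py", "default": "generate_error_files.py"},
}


def script_for(task, year):
    """Return the script name for a task ("clean" or "error") and a collection year."""
    names = SCRIPTS[task]
    return names.get(int(year), names["default"])


def run_generation(repo_root, task, year):
    """Run the generator from {year}/python; raises CalledProcessError on failure."""
    workdir = Path(repo_root) / str(year) / "python"
    subprocess.run([sys.executable, script_for(task, year)], cwd=str(workdir), check=True)
```

Calling `run_generation(".", "clean", 2019)` and then `run_generation(".", "error", 2019)` reproduces the two documented steps in order.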

Test files for syntax, validity, and quality edits will be created in the following directories:
- Syntax: {year}/edits_files/test_files/{Bank Name}/syntax
- Validity: {year}/edits_files/test_files/{Bank Name}/validity
- Quality: {year}/edits_files/test_files/{Bank Name}/quality
- Quality (Adjusted to pass syntax and validity): {year}/edits_files/test_files/{Bank Name}/quality_pass_s_v
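To confirm which test files were produced, the four directories listed above can be scanned. This is a minimal sketch that assumes the layout `{year}/edits_files/test_files/{Bank Name}/{edit type}` exactly as listed; the function name is hypothetical.

```python
from pathlib import Path

# Directory names taken from the list above.
EDIT_TYPES = ("syntax", "validity", "quality", "quality_pass_s_v")


def inventory_test_files(repo_root, year, bank_name):
    """Map each edit type to the sorted .txt file names found in its directory."""
    base = Path(repo_root) / str(year) / "edits_files" / "test_files" / bank_name
    inventory = {}
    for edit_type in EDIT_TYPES:
        folder = base / edit_type
        # A missing folder yields an empty list rather than an error.
        if folder.is_dir():
            inventory[edit_type] = sorted(p.name for p in folder.glob("*.txt"))
        else:
            inventory[edit_type] = []
    return inventory
```

An empty list for an edit type usually means that category was not generated for the given bank and year.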

Generating Large Files

Because of the code design and the LAR edit rules, generating large synthetic data files directly was prohibitively time-consuming. The large file generation script takes a different approach: it uses a clean file as a base and copies rows until the desired file size is reached.
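The row-copying approach can be illustrated with a short sketch. It assumes the clean file is a pipe-delimited submission file whose first line is the TS record, followed by LAR records, and it cycles through the LAR rows until `target_rows` is reached. `expand_rows` and `write_large_file` are hypothetical names. Unlike a real submission, this sketch leaves identifiers such as the ULI untouched, so copied rows are exact duplicates; the repository's script may adjust them.

```python
from itertools import cycle, islice


def expand_rows(lines, target_rows):
    """Yield the TS line, then target_rows LAR lines copied cyclically from the base."""
    if not lines:
        raise ValueError("base file is empty")
    ts_line, lar_lines = lines[0], lines[1:]
    if not lar_lines:
        raise ValueError("base file has no LAR rows to copy")
    yield ts_line
    yield from islice(cycle(lar_lines), target_rows)


def write_large_file(base_path, out_path, target_rows):
    """Read a clean base file and stream an expanded copy to out_path."""
    with open(base_path) as src:
        lines = src.read().splitlines()
    with open(out_path, "w") as dst:
        for line in expand_rows(lines, target_rows):
            dst.write(line + "\n")
```

For instance, `list(expand_rows(["1|TS", "2|A", "2|B"], 5))` gives the TS line followed by `2|A`, `2|B`, `2|A`, `2|B`, `2|A`. Using a generator keeps memory flat while writing, since only the base file is held in memory.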

To generate large files for 2019, 2020, and 2021:

Navigate to the <year>/python directory
Run python3 generate_large_files.py

To set the large file size for 2019, edit the large_file_write_length value in the clean configuration. To set the base file used to create large files, edit the large_file_base_length value in the clean configuration.
To set the large file size for 2020 and 2021, edit the large_file_write_length value in the 2020 large configuration or the 2021 large configuration. To set the base file used to create large files, edit the large_file_base_length value in the same configuration.
- For 2020 and 2021, the large_file_base_length value in large_file_config.yaml should match the file_length value in bank1_config.yaml, because the generated clean file is the base for the large file and its filename includes the record count.
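The constraint on large_file_base_length and file_length can be checked before running the script. The sketch below works on already-loaded configuration dictionaries (for example from `yaml.safe_load`, with PyYAML installed) and assumes both keys sit at the top level of their files, which may not match the actual YAML structure. The function name is hypothetical.

```python
def check_large_file_config(large_cfg, bank_cfg):
    """Return the base length, or raise ValueError if it differs from the clean file length."""
    # Assumes top-level keys in large_file_config.yaml and bank1_config.yaml.
    base_length = large_cfg["large_file_base_length"]
    file_length = bank_cfg["file_length"]
    if base_length != file_length:
        raise ValueError(
            "large_file_base_length ({}) must equal file_length ({}) so the clean "
            "base file can be found by its record count".format(base_length, file_length))
    return base_length
```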

Note: the 2018 process differs from the 2019 process. To generate large files for 2018:

Navigate to the 2018/python directory.
Adjust the 2018 Large File Script Configuration to specify the bank name, LEI, tax ID, row count, output filepath, and output filename.
Run python3 large_test_files_script.py to produce the large file.

Generating Edit Reports

Edit reports summarize the syntax, validity, or quality edits passed or failed in a test submission file. Each edit report contains the following fields:

edit name
status (pass/fail)
number of rows failed
ULIs/NULIs of rows that failed (as a list).
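The edit report fields map naturally onto a small record type. This is an illustrative sketch only: `EditResult` and `build_report` are hypothetical names, and the input is assumed to be a mapping from edit name to the ULIs/NULIs of the rows that failed that edit. A namedtuple is used rather than a dataclass so the sketch runs on Python 3.5.

```python
from collections import namedtuple

# One line of an edit report: edit name, pass/fail status, failure count, failing ULIs/NULIs.
EditResult = namedtuple("EditResult", ["edit_name", "status", "rows_failed", "failed_ids"])


def build_report(failures_by_edit):
    """Turn {edit_name: [failing ULI/NULI, ...]} into a list of EditResult rows."""
    report = []
    for edit_name in sorted(failures_by_edit):
        ids = list(failures_by_edit[edit_name])
        status = "fail" if ids else "pass"
        report.append(EditResult(edit_name, status, len(ids), ids))
    return report
```

An edit with an empty list of failing identifiers is reported as a pass with zero rows failed.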

Edit reports can be generated for any synthetic submission file. Configuration options include (with default values):

To generate edit reports for 2019, 2020, and 2021:

Navigate to the <year>/python directory.
Adjust the Edit Report Configuration to specify output.
- 2021
- 2020
- 2019
Run python3 generate_edit_report.py to produce the edit report in the directory according to the configuration file.

To generate edit reports for 2018:

Navigate to the 2018/python directory.
Adjust the 2018 Edit Report Configuration to specify output.
Run python3 generate_edit_report.py to produce the edit report in the directory according to the configuration file.

Data Generation Notes:

The default values for Bank0 are listed below.

Name: Bank0
LEI: B90YWS6AFX2LGWOXJ1LD
Tax ID: 01-0123456

The default values for Bank1 are listed below.

Name: Bank1
LEI: BANK1LEIFORTEST12345
Tax ID: 02-1234567

Other test bank LEIs:

BANK3LEIFORTEST12345
BANK4LEIFORTEST12345
999999LE3ZOZXUS7W648
28133080042813308004
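When adding another test bank, its identifiers can be compared against the defaults above. This sketch checks only the surface format these values share: a 20-character uppercase alphanumeric LEI, and a tax ID of two digits, a hyphen, and seven digits. It does not compute the ISO 17442 LEI check digits, and the HMDA edits may apply further rules. `format_problems` is a hypothetical name.

```python
import re

LEI_PATTERN = re.compile(r"[A-Z0-9]{20}")
TAX_ID_PATTERN = re.compile(r"[0-9]{2}-[0-9]{7}")

# Default test banks and other test LEIs from this README.
TEST_BANKS = {
    "Bank0": {"lei": "B90YWS6AFX2LGWOXJ1LD", "tax_id": "01-0123456"},
    "Bank1": {"lei": "BANK1LEIFORTEST12345", "tax_id": "02-1234567"},
}
OTHER_TEST_LEIS = [
    "BANK3LEIFORTEST12345",
    "BANK4LEIFORTEST12345",
    "999999LE3ZOZXUS7W648",
    "28133080042813308004",
]


def format_problems(lei, tax_id):
    """Return a list of format problems; an empty list means both values look well-formed."""
    problems = []
    if LEI_PATTERN.fullmatch(lei) is None:
        problems.append("LEI must be 20 uppercase letters or digits")
    if TAX_ID_PATTERN.fullmatch(tax_id) is None:
        problems.append("tax ID must look like 99-9999999")
    return problems
```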
