The Home Mortgage Disclosure Act (HMDA) Platform is a regulatory technology application for financial institutions to submit mortgage information as described in the Filing Instruction Guide (FIG). The HMDA Platform parses data submitted by mortgage lending institutions and validates it against the edits (Syntactical, Validity, Quality, and Macro, per the instructions in the FIG) before the data is submitted. The HMDA Platform supports quarterly and yearly filing periods. For detailed information on the Home Mortgage Disclosure Act (HMDA), check out the About HMDA page on the CFPB website.
Please watch this short video to see how the HMDA Platform transforms the data upload, validation, and submission process.
Project | Repo Link | Description |
---|---|---|
Frontend | https://github.com/cfpb/hmda-frontend | ReactJS Front-end repository powering the HMDA Platform |
HMDA-Help | https://github.com/cfpb/hmda-help | ReactJS Front-end repository powering HMDA Help - used to resolve and troubleshoot issues in filing |
LARFT | https://github.com/cfpb/hmda-platform-larft | Repo for the Public Facing LAR formatting tool |
HMDA Test Files | https://github.com/cfpb/hmda-test-files | Repo for automatically generating various different test files for HMDA Data |
HMDA Census | https://github.com/cfpb/hmda-census | ETL for geographic and Census data used by the HMDA Platform |
HMDA Data Science | https://github.com/cfpb/HMDA_Data_Science_Kit | Repo for HMDA Data science work as well as Spark codebase for Public Facing A&D Reports |
- TS and LAR File Specs
- End-to-End filing GIF
- Technical Overview
- HMDA Platform Technical Architecture
- HMDA Data Browser Technical Architecture
- Running with sbt
- One-line Cloud Deployment to Dev/Prod
- Docker Hub
- One-line Local Development Environment (No Auth)
- Automated Testing
- Postman Collection
- API Documentation
- Sprint Cadence
- Code Formatting
- Development Process
- Contributing
- Issues
- Open source licensing info
- Credits and references
The data is submitted as a flat, pipe-delimited (`|`) TXT file. The file is split into two parts: the Transmittal Sheet (TS) -- the first line of the file -- and the Loan Application Register (LAR) -- all remaining lines of the file. Below are the links to the file specifications for data collected from 2018 to the present.
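As a rough illustration, splitting such a file can be sketched in a few lines of Python. The sample records below are hypothetical and heavily truncated (real TS and LAR records carry many more fields); per the FIG, the TS line carries record identifier 1 and each LAR line carries record identifier 2:

```python
def split_submission(text: str):
    """Split a pipe-delimited HMDA submission into its TS and LAR parts.

    Per the FIG layout, the first line is the Transmittal Sheet (TS) and
    every remaining line is a Loan Application Register (LAR) record.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ts = lines[0].split("|")
    lar = [ln.split("|") for ln in lines[1:]]
    return ts, lar

# Hypothetical, heavily truncated sample records for illustration only
sample = (
    "1|Acme Bank|2020|4|...\n"
    "2|EXAMPLELEI0000000001|...\n"
    "2|EXAMPLELEI0000000001|...\n"
)
ts, lar = split_submission(sample)
```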
The hmda-frontend uses Cypress to test the end-to-end filing process from the end user's perspective. The GIF below shows the automated filing process via Cypress - no human intervention.
This repository contains the code for the entire public facing HMDA Platform backend. The platform is designed to accommodate the needs of the HMDA filing process for financial institutions, as well as the management, publication, aggregation, reporting, analysis, visualization, and download of the HMDA data set.
The HMDA Platform follows a loosely coupled, event-driven microservices architecture with API-first (API Documentation) design principles. The entire platform is built on open source frameworks and remains cloud vendor agnostic.
The code base contained in this repository includes the following microservices that work together in support of the HMDA Platform.
- HMDA Platform: The entire backend API for the public facing filing platform. It processes uploaded TXT files and validates them in a non-blocking, streaming I/O fashion. The APIs are built to process text files of various sizes, from small (a few lines) to large (1.5M+ lines), simultaneously without impeding the scalability or availability of the platform. The platform contains code for customizable data edits, a Domain Specific Language (DSL) for coding the data edits, and for submitting events to Kafka topics.
- Check Digit: The entire backend API for the public facing check digit tool. The Check Digit tool is used to (1) generate a two-character check digit based on a Legal Entity Identifier (LEI) and (2) validate that the check digit is calculated correctly for any complete Universal Loan Identifier (ULI). These APIs are built to process multi-row CSV files as well as one-off requests.
- Institutions API: Read-only API for fetching details about an LEI. This microservice also listens to events on the `institutions-api` Kafka topic to create, update, and delete institution data in PostgreSQL.
- Data Publisher: This microservice runs on a schedule to make internal/external data available for research purposes via object stores such as S3. The job schedule is configurable via a K8s ConfigMap.
- Ratespread: Public facing API for the ratespread calculator. The calculator provides rate spreads for HMDA-reportable loans with a final action date on or after January 1st, 2018. The API supports streaming CSV uploads as well as one-off calculations.
- Modified LAR: Event-driven service for modified LAR reports. Each time a filer successfully submits data, the modified-lar microservice generates a modified LAR report and puts it in the public object store (e.g. S3). Any re-submission automatically regenerates the modified LAR report.
- IRS Publisher: Event-driven service for IRS disclosure reports. Each time a filer successfully submits data, the irs-publisher microservice generates the IRS report.
- HMDA Reporting: Real-time, public facing API for information (LEI, institution name, and year) on LEIs that have successfully submitted their data.
- HMDA Analytics: Event-driven service to insert, update, and delete information in PostgreSQL each time there is a successful submission. The inserted data is joined with Census data to provide information for MSA/MDs, and race, sex, and ethnicity categorizations are added to the data.
- HMDA Dashboard: Authenticated APIs for viewing real-time analytics on the filings happening on the platform. The dashboard includes summary statistics and data trends, and supports data visualizations via the frontend.
- Rate Limit: Rate-limiter service working in sync with Ambassador to limit how many times the API can be called within a given time period. If the rate limit is reached, a 503 error code is returned.
- HMDA Data Browser: Public facing API for the HMDA Data Browser. This API makes the entire dataset available for summary statistics, deep analysis, and geographic map layouts.
- Email Service: Event-driven service that sends an automated email to the filer on each successful submission.
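For instance, the ULI check digit handled by the Check Digit service follows the ISO/IEC 7064 MOD 97-10 scheme described in the FIG. Below is a minimal Python sketch of that scheme -- an illustration only, not the platform's actual (Scala) implementation; the sample LEI/loan-ID string is hypothetical:

```python
def _to_numeric(s: str) -> int:
    # Per MOD 97-10, letters map to two-digit numbers: A=10, B=11, ..., Z=35.
    return int("".join(str(int(c, 36)) for c in s.upper()))

def check_digit(lei_plus_loan_id: str) -> str:
    # Append "00", take mod 97, subtract from 98, zero-pad to two digits.
    return f"{98 - _to_numeric(lei_plus_loan_id + '00') % 97:02d}"

def validate_uli(uli: str) -> bool:
    # A well-formed ULI (including its check digit) satisfies mod 97 == 1.
    return _to_numeric(uli) % 97 == 1

# Hypothetical LEI + loan ID, for illustration only
base = "EXAMPLELEI0000000001LOANID001"
uli = base + check_digit(base)
```

Generating a check digit and then validating the resulting ULI round-trips by construction, which is the invariant the Check Digit APIs expose.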
The image below shows the cloud vendor agnostic technical architecture for the HMDA Platform.
Please view the README for HMDA Data Browser
Before running the HMDA Platform, make sure to have the following installed:
- Homebrew - https://brew.sh/
- Docker - `brew install docker`
- Docker Desktop - https://docs.docker.com/desktop/install/mac-install/
- Java (version 13.0.2) for MacOS
- Scala (version 2.12, for compatibility) - `brew install [email protected]`
- sdk - https://sdkman.io/install

Next, use sdk to install sbt instead of brew (the brew-installed sbt won't work). Note: before installing, check which sbt version is used in project/build.properties and install that version or higher:

sdk install sbt
Clone the repo and go into the repo directory:
git clone https://github.com/cfpb/hmda-platform.git
cd hmda-platform
The HMDA Platform can run locally using `sbt` with embedded Cassandra and embedded Kafka. To get started:
cd hmda-platform
export CASSANDRA_CLUSTER_HOSTS=localhost
export APP_PORT=2551
sbt
[...]
sbt:hmda-root> project hmda-platform
sbt:hmda-platform> reStart
The other API subprojects can be started the same way:

hmda-admin-api
hmda-filing-api
hmda-public-api
The Docker image is built via the Docker plugin, utilizing sbt-native-packager:
sbt -batch clean hmda-platform/docker:publishLocal
The image can be built without running tests using:
sbt "project hmda-platform" dockerPublishLocalSkipTests
The platform and all of the related microservices explained above are deployed on Kubernetes using Helm. Each deployment is a single Helm command. Below is an example deployment of the hmda-platform chart:
helm upgrade --install --force \
--namespace=default \
--values=kubernetes/hmda-platform/values.yaml \
--set image.repository=hmda/hmda-platform \
--set image.tag=<tag name> \
--set image.pullPolicy=Always \
hmda-platform \
kubernetes/hmda-platform
All of the containers built by the HMDA Platform are released publicly via Docker Hub: https://hub.docker.com/u/hmda
The platform and its dependency services (Kafka, Cassandra, and PostgreSQL) can run locally using Docker Compose.
# Bring up hmda-platform, hmda-analytics, institutions-api
docker-compose up
The entire filing platform can be spun up with a one-line command. No authentication is needed when using this locally running instance of the platform.
# Bring up the hmda-platform
docker-compose up hmda-platform
Additionally, several environment variables can be configured or changed. The platform uses sensible defaults for each; however, they can be overridden if required:
CASSANDRA_CLUSTER_HOSTS
CASSANDRA_CLUSTER_DC
CASSANDRA_CLUSTER_USERNAME
CASSANDRA_CLUSTER_PASSWORD
CASSANDRA_JOURNAL_KEYSPACE
CASSANDRA_SNAPSHOT_KEYSPACE
KAFKA_CLUSTER_HOSTS
APP_PORT
HMDA_HTTP_PORT
HMDA_HTTP_ADMIN_PORT
HMDA_HTTP_PUBLIC_PORT
MANAGEMENT_PORT
HMDA_CASSANDRA_LOCAL_PORT
HMDA_LOCAL_KAFKA_PORT
HMDA_LOCAL_ZK_PORT
WS_PORT
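For illustration, the sketch below shows how a service might resolve two of these settings with fallback defaults. The default values mirror the local-development values used earlier in this README; the helper functions themselves are hypothetical, not the platform's actual (Scala/Typesafe Config) code:

```python
import os

def cassandra_hosts(env=os.environ) -> str:
    # Fall back to the localhost value used in the sbt instructions above.
    return env.get("CASSANDRA_CLUSTER_HOSTS", "localhost")

def app_port(env=os.environ) -> int:
    # Fall back to the default port used in the sbt instructions above.
    return int(env.get("APP_PORT", "2551"))
```

Overriding a setting is then just a matter of exporting the variable (e.g. `export APP_PORT=2552`) before starting the service.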
The HMDA Platform takes a rigorous automated testing approach. In addition to Travis and CodeCov, we've prepared a suite of Newman test scripts that perform end-to-end testing of the APIs on a recurring basis. The Newman testing process is containerized and runs as a Kubernetes CronJob to act as a monitoring and alerting system. The platform and microservices are also load-tested using Locust.
In addition to using Newman for our internal testing, we've created an HMDA Postman collection that makes it easier for users to perform an end-to-end filing of HMDA data, including uploading, parsing, flagging edits, resolving edits, and submitting the data once S/V edits are resolved.
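The sequence of calls such an end-to-end filing makes can be modeled roughly as follows. The endpoint paths here are hypothetical placeholders, not the platform's documented routes -- consult the API documentation for the real ones:

```python
# Hypothetical endpoint paths modeling the end-to-end filing flow;
# the real routes are defined in the HMDA Platform API documentation.
def filing_flow(lei: str, period: str, seq: int):
    base = f"/institutions/{lei}/filings/{period}"
    return [
        ("POST", f"{base}/submissions"),              # start a submission
        ("POST", f"{base}/submissions/{seq}"),        # upload the TS/LAR file
        ("GET",  f"{base}/submissions/{seq}/edits"),  # review flagged edits
        ("POST", f"{base}/submissions/{seq}/sign"),   # sign once S/V edits clear
    ]

steps = filing_flow("EXAMPLELEI0000000001", "2020", 1)
```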
The HMDA Platform Public API Documentation is hosted in the HMDA Platform API Docs repo and deployed to GitHub Pages using the `gh-pages` branch.
Our team works in two-week sprints, managed as Project Boards. Backlog grooming happens every two weeks as part of Sprint Planning and Sprint Retrospectives.
Our team uses Scalafmt to format our codebase.
Below are the steps the development team follows to fix issues, develop new features, etc.
- Create a fork of this repository
- Work in a branch of the fork
- Create a PR to merge into master
- The PR is automatically built, tested, and linted using: Travis, Snyk, and CodeCov
- Manual review is performed in addition to verifying that the automated scans above pass
- The PR is deployed to development servers to be checked using Newman
- The PR is merged only by a member of the dev team other than the author
CFPB is developing the HMDA Platform in the open to maximize transparency and encourage third-party contributions. If you want to contribute, please read and abide by the terms of the License for this project. Pull requests are always welcome.
We use GitHub issues in this repository to track features, bugs, and enhancements to the software.
Related projects
- https://github.com/cfpb/hmda-combined-documentation - ReactJS front-end with DocSearch integration powering the HMDA Platform documentation
- https://github.com/cfpb/hmda-platform-larft - Repo for the Public Facing LAR formatting tool
- https://github.com/cfpb/hmda-test-files - Repo for automatically generating various different test files for HMDA Data
- https://github.com/cfpb/hmda-census - ETL for geographic and Census data used by the HMDA Platform
- https://github.com/cfpb/HMDA_Data_Science_Kit - Repo for HMDA Data science work as well as Spark codebase for Public Facing A&D Reports