GitHub - jwu-sym/govtrade

Project description: My project addresses the scattered, unorganized state of public information about government officials’ financial decisions. It is for anyone interested in government officials’ financial disclosures, including their stock and options transaction records. My project is unique because it identifies the ways in which political activities correlate with those transaction records. This improves transparency, helps surface possible insider trading, and identifies potential conflicts of interest. It also allows retail investors to monitor officials’ trades and follow them before any restrictions are introduced.

Application Stack Architecture:

Overview

Architecture

Architecture Description

Data collectors use urllib to fetch two main public sites.
Data pre-processor converts downloaded CSV data into structured raw data and inserts or updates the corresponding tables in the database.
Job scheduler has two jobs.

a. Periodically invoke the data collectors and pre-processor.

b. Invoke the batch data processor to transform structured raw data into application data structures and save or update them in the database.
Batch data processor transforms and analyzes the data, then stores the results for the API server to use.
API server provides endpoints for the application server:

a. /dashboard
b. /search (by name/filing date/stock symbol)
Integration server: This component facilitates the integration and deployment of the application code from the source repository to the staging environment.
Frontend server uses a template language to serve HTML pages built from API server endpoints.
Web server acts as a reverse proxy for the application server.
Integration server does:

a. Health monitoring of endpoints on the API server, frontend, and web server.
b. Continuous integration testing on each code change and build.
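The endpoint health monitoring described above could look roughly like the following. This is a minimal sketch, not the project's actual monitoring code: the function names `check_endpoint` and `check_all` are hypothetical, and the `opener` parameter exists only so the check can be exercised without network access.

```python
from urllib.error import URLError
from urllib.request import urlopen


def check_endpoint(url, timeout=5.0, opener=urlopen):
    """Return True if the URL answers with a 2xx status, False otherwise."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (URLError, OSError, ValueError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx replies,
        # URLError/OSError for network failures, ValueError for malformed URLs.
        return False


def check_all(urls, opener=urlopen):
    """Map each URL to its health status."""
    return {url: check_endpoint(url, opener=opener) for url in urls}
```

Calling `check_all` with the /dashboard and /search URLs of a deployment would return a dictionary of booleans that a scheduled job could log or alert on.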

The initial architecture diagram (Week 1) is almost identical. The only design changes are:

A relational database (PostgreSQL) was chosen instead of a document database. Initially I thought the application would need to store a raw PDF file per record, which is a large amount of data for a relational database to handle. During development, however, I was able to parse the trade text out of the PDF files, which significantly reduced the amount of data to store. Search is also more robust with a relational database's built-in SQL queries.
Performance metrics services were added, using the Heroku-managed ones.
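To illustrate why SQL makes the search by name, filing date, or stock symbol straightforward, here is a small sketch. It makes several assumptions: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so it runs anywhere, the `trades` table and its columns are hypothetical, and the sample rows are made-up placeholders rather than real disclosures.

```python
import sqlite3


def search_trades(conn, name=None, filing_date=None, symbol=None):
    """Build a parameterized WHERE clause from whichever filters are given."""
    clauses, params = [], []
    if name:
        clauses.append("LOWER(name) LIKE ?")
        params.append(f"%{name.lower()}%")
    if filing_date:
        clauses.append("filing_date = ?")
        params.append(filing_date)
    if symbol:
        clauses.append("UPPER(symbol) = ?")
        params.append(symbol.upper())
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (
        "SELECT name, filing_date, symbol FROM trades"
        + where
        + " ORDER BY filing_date DESC, name"
    )
    return conn.execute(sql, params).fetchall()


# Hypothetical schema and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (name TEXT, filing_date TEXT, symbol TEXT)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("Jane Doe", "2024-03-01", "AAPL"),
        ("John Roe", "2024-03-05", "MSFT"),
        ("Jane Doe", "2024-04-10", "MSFT"),
    ],
)
```

With a PostgreSQL driver such as psycopg2 the same approach applies, but the placeholder is `%s` instead of `?`.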

Continuous Delivery

The application integrates with GitHub, which makes it easy to deploy to my app stack running on Heroku. When GitHub integration is configured for my app, Heroku can automatically build each push and release it if the build is successful. Continuous Delivery is implemented with a Heroku pipeline, which automatically runs functional and unit tests for every code push to GitHub and for any merge to master from the dev branch, which is used as staging. Staging is promoted to the production servers after the tests pass. A few illustrative tests were written with the standard pytest library and run continuously on each code change in GitHub.
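For readers unfamiliar with pytest, a test in this style is just a function whose name starts with `test_` and which uses plain `assert` statements. The sketch below is illustrative only: `normalize_symbol` is a hypothetical helper, not a function taken from this repository.

```python
def normalize_symbol(raw):
    """Hypothetical helper: trim whitespace and upper-case a ticker symbol."""
    return raw.strip().upper()


def test_normalize_symbol_strips_and_uppercases():
    assert normalize_symbol("  aapl ") == "AAPL"


def test_normalize_symbol_keeps_clean_input():
    assert normalize_symbol("MSFT") == "MSFT"
```

Saved in a file named like `test_*.py`, these functions are discovered and run by the `pytest` command, which is what a pipeline test step would typically invoke.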

Staging environment: This is a pre-production environment where the application is deployed and tested before being released to the production environment.

Monitoring and Performance Metrics

The monitoring service tracks the system's performance, health, and potential issues, providing visibility and alerting mechanisms. Heroku provides server performance metrics and alert services, including monitoring of application response time, memory, and throughput. The alert service sends notifications on system events in production, such as unresponsive endpoints, resource exhaustion, or throughput above a set threshold.

Code Structure:

Web Application (app.py/service.py)

A standard Python Flask web app that routes HTTP requests and responds with web pages templated from database data.
An APScheduler BackgroundScheduler is started when app.py starts; it runs the fetcher.py 'main' method periodically. The timestamp of the last data collection is displayed at the bottom of the site page.
service.py provides database records for the endpoints by retrieving them from the database. It also performs certain data conversions on unstructured raw trade data.
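The idea of running a collection job in the background and remembering when it last ran can be sketched with the standard library alone. Note that this is a swapped-in stand-in, not APScheduler: the project uses APScheduler's BackgroundScheduler, while the hypothetical `PeriodicJob` class below uses a plain thread so the example has no third-party dependencies.

```python
import threading
from datetime import datetime, timezone


class PeriodicJob:
    """Run func in a background thread every interval_seconds until stopped."""

    def __init__(self, func, interval_seconds):
        self._func = func
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self.last_run = None  # timestamp a page footer could display

    def _loop(self):
        # Run once immediately, then again after each interval.
        while not self._stop.is_set():
            self._func()
            self.last_run = datetime.now(timezone.utc)
            # wait() returns early if stop() is called during the pause.
            self._stop.wait(self._interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

A web app would create one such job at startup, pass it the collector's entry point, and render `last_run` in the page template.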

Data Collection

fetcher.py uses the Python requests library to fetch two main public sources and store the results:
- https://disclosures-clerk.house.gov/public_disc/financial-pdfs/{year}FD.zip lists all Congress members' trades disclosed in the given year.
- https://disclosures-clerk.house.gov/public_disc/ptr-pdfs/2024/{docId} is the individual trade disclosure document for a record.
- Attaches each parsed trade document to its record.
- Saves records to the PostgreSQL database (hosted on Heroku).
- Has a function to collect records for multiple years.
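The two URL patterns above, and the step of reading the yearly archive, can be sketched as follows. This is not the code in fetcher.py. The function names are hypothetical, the year parameter in `ptr_url` generalizes the 2024 shown above, and the archive layout (a tab-delimited `.txt` member with a header row) is an assumption. The download itself is left out, so the sketch works on archive bytes that have already been fetched.

```python
import csv
import io
import zipfile

BASE = "https://disclosures-clerk.house.gov/public_disc"


def index_url(year):
    """URL of the yearly disclosure archive."""
    return f"{BASE}/financial-pdfs/{year}FD.zip"


def ptr_url(year, doc_id):
    """URL of one individual trade disclosure document."""
    return f"{BASE}/ptr-pdfs/{year}/{doc_id}"


def read_index(zip_bytes, delimiter="\t"):
    """Parse the first .txt member of the archive into a list of dict rows.

    Assumes the member is delimited text whose first line is a header.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        member = next(n for n in archive.namelist() if n.lower().endswith(".txt"))
        text = archive.read(member).decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
```

Collecting several years then amounts to looping over `index_url(year)` for each year and passing the downloaded bytes to `read_index`.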
processor.py helps fetcher.py convert raw data into structured records and extracts the trades from each record's trade PDF document.
db.py inserts, updates, and removes government trade records in a PostgreSQL database hosted on Heroku.
DB connection parameters are in the .env file. Sample db records screenshot here.
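The insert, update, and remove operations can be illustrated with an upsert keyed on the document id. This is a sketch under assumptions, not the contents of db.py: it uses the standard-library `sqlite3` module in place of PostgreSQL so it runs without a server, and the `disclosures` table, its columns, and the function names are hypothetical.

```python
import sqlite3


def upsert_record(conn, doc_id, name, filing_date):
    """Insert a record, or update it if the doc_id already exists."""
    conn.execute(
        """
        INSERT INTO disclosures (doc_id, name, filing_date)
        VALUES (?, ?, ?)
        ON CONFLICT (doc_id) DO UPDATE SET
            name = excluded.name,
            filing_date = excluded.filing_date
        """,
        (doc_id, name, filing_date),
    )
    conn.commit()


def delete_record(conn, doc_id):
    """Remove a record by its doc_id."""
    conn.execute("DELETE FROM disclosures WHERE doc_id = ?", (doc_id,))
    conn.commit()


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE disclosures (doc_id TEXT PRIMARY KEY, name TEXT, filing_date TEXT)"
)
```

PostgreSQL accepts the same `ON CONFLICT ... DO UPDATE` form, so re-running a collection job does not create duplicate rows for a document that was already stored.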

Frontend

The HTML page lets users view, search, and select government trading records. The application server runs a Python Flask stack. Sort and search functionality uses sortable.js, and the 'last run' line at the bottom displays the data collection time.

Public url of my project: https://govtrade-a46bca12cc9b.herokuapp.com/

Run project code locally, under project root directory:

python3 -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt
export FLASK_APP=src/app.py
flask run --port 1234 --debug

