BirdiDQ - About 🔍

BirdiDQ is an intuitive, user-friendly data quality application that lets you run data quality checks on top of the open-source Python library Great Expectations (GE) using natural language queries. Type in your request, and BirdiDQ will generate the appropriate GE method, run the quality check, and return the results along with the data docs you need. Demo Video

Note

BirdiDQ is an open-source project under active development. Contributions are welcome!

Features

🔍 Data Exploration Made Easy: Quickly and interactively explore your data using a range of features like filters, comparisons, and more. Uncover hidden insights and make informed decisions with confidence.
🎯 Natural Language Processing: Speak BirdiDQ's language! No technical expertise required. Simply type in your queries, and BirdiDQ converts them into Great Expectations methods (using a fine-tuned Large Language Model), saving you time and effort.
⚡ Instant Results: Run data quality checks on your selected data sources and get immediate feedback on data inconsistencies, so you can verify that your data is reliable and trustworthy.
📧 Automated Email Alerts: Reach out to the data owner directly through the app by emailing them the detailed data quality report generated by Great Expectations.
Gen AI Models: Uses LLMs fine-tuned on custom expectations data.

Tech Stack

BirdiDQ is an LLM-powered app built using:

Streamlit
Great Expectations
Fine-tuned LLMs:
- Falcon-7B (a 7B-parameter causal decoder-only model): fine-tuned on custom data with the QLoRA approach.
- OpenAI GPT-3: also fine-tuned on the same data.

Installation instructions

To run BirdiDQ, you need to perform the following steps:

Clone the repository locally:

 git clone https://github.com/BirdiD/BirdiDQ.git

(Recommended) Create a virtual environment and activate it:
```
 python3 -m venv bir_env
 source bir_env/bin/activate
```
Install the required dependencies:
```
 pip install -r requirements.txt
```

Run the app:

 streamlit run great_expectations/app.py

Note: BirdiDQ can use OpenAI's ChatGPT or the Falcon LLM to convert natural language descriptions into expectations. If you plan to use Falcon, consider using PyTorch with GPU support for better performance. To install PyTorch with CUDA support, follow the instructions for your operating system on the PyTorch website.
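
Before loading Falcon, it can help to check whether a CUDA-enabled PyTorch build is actually available. The snippet below is a minimal sketch and not part of BirdiDQ itself: `pick_device` is a hypothetical helper name, and it falls back to the CPU when PyTorch is not installed or no GPU is visible.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    # find_spec avoids an ImportError when PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # torch.cuda.is_available() is True only for a CUDA build with a usable GPU.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Falcon would run on: {pick_device()}")
```

If this prints `cpu` on a machine that has an NVIDIA GPU, the installed PyTorch build is probably CPU-only and should be reinstalled with CUDA support.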

System requirements for local Falcon LLM usage

Falcon 7B is an open-source large language model (LLM) that BirdiDQ can use to convert natural language descriptions into Great Expectations expectations. To use the current fine-tuned Falcon 7B model, your system needs to meet one of the following minimum requirements:

Without a GPU: at least 16GB of RAM to load the model into memory. Inference will be very slow.
With a GPU: at least 16GB of VRAM to load the model into memory. Inference will be faster.
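
The 16GB figure is consistent with a rough back-of-the-envelope estimate of the weight size. The sketch below assumes a nominal 7 billion parameters and 16-bit weights; both are assumptions for illustration, since the precision actually used by BirdiDQ is not stated here. `weights_size_gb` is a hypothetical helper, and the estimate ignores activations and other runtime overhead.

```python
def weights_size_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate size of the model weights in gigabytes (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


FALCON_7B_PARAMS = 7e9  # nominal parameter count (assumption, rounded)

fp16_gb = weights_size_gb(FALCON_7B_PARAMS, 2)  # 16-bit weights: 2 bytes each
fp32_gb = weights_size_gb(FALCON_7B_PARAMS, 4)  # 32-bit weights: 4 bytes each

print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f"32-bit weights: ~{fp32_gb:.0f} GB")
```

Under these assumptions the 16-bit weights alone come to about 14 GB, which leaves little headroom on a 16GB device.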

Example Queries

Here are some example queries you can try with BirdiDQ:

Ensure that at least 80% of the values in the country column are not null.
Check that none of the values in the address column match the pattern for an address starting with a digit.
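
To make the two queries concrete, here is a sketch of the Great Expectations expectations they would plausibly map to, written as expectation configurations (`expectation_type` plus `kwargs`). The mapping is an assumption for illustration; the actual output of BirdiDQ's fine-tuned model may differ. The two small checker functions are plain-Python stand-ins that mimic the intended semantics on a list of values; they are not the Great Expectations implementation.

```python
import re

# Assumed mapping of the two example queries to Great Expectations
# expectation configurations (expectation_type + kwargs).
EXAMPLE_EXPECTATIONS = [
    {
        "expectation_type": "expect_column_values_to_not_be_null",
        "kwargs": {"column": "country", "mostly": 0.8},
    },
    {
        "expectation_type": "expect_column_values_to_not_match_regex",
        "kwargs": {"column": "address", "regex": r"^\d"},
    },
]


def not_null_ok(values, mostly=1.0):
    """True if the fraction of non-null values is at least `mostly`."""
    if not values:
        return True
    non_null = sum(v is not None for v in values)
    return non_null / len(values) >= mostly


def not_match_regex_ok(values, regex):
    """True if no non-null value matches `regex` (nulls are skipped)."""
    pattern = re.compile(regex)
    return not any(pattern.search(v) for v in values if v is not None)


# Hypothetical sample data standing in for the two columns.
countries = ["FR", "SN", None, "US", "DE", "ML", "CI", "GH", "NG", "KE"]
addresses = ["Main Street 12", "12 Main Street", None]

print(not_null_ok(countries, mostly=0.8))
print(not_match_regex_ok(addresses, r"^\d"))
```

With this sample data, the first check passes (9 of 10 values are non-null) and the second fails because one address starts with a digit.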

BirdiDQ Integration Stack

BirdiDQ works with a range of tools and services.

Filesystem
- Support Local Filesystem with Pandas
- Support Local Filesystem with Spark
Database
- Support PostgreSQL
- Support BigQuery
- Support Snowflake
- Support Amazon Athena
- Support AWS Redshift
Cloud
- Connect to data on Amazon S3 using Pandas
- Connect to data on Azure Blob storage using Pandas
- Connect to data on GCS using Pandas
