This document lists and describes various GitHub projects and pages which are relevant to my work. This is a living document and will be updated regularly to highlight new ideas and directions.
The document is divided into five main sections: Pre-Release (Private) Projects, Research Ideas (Private), Public Projects, Projects from DeepLearning.AI, etc., and Projects Inspired by Others.
These projects are completed and publicly available for use or contribution.
- Project Delta: Placeholder for a publicly available project.
These are various areas of interest that I am researching.
- Project Delta: Placeholder for a publicly available project.
These are projects currently under development or in a pre-release stage. They are not yet publicly available but are significant to the overall development roadmap.
- FigLang 2024 Euphemisms: Work associated with the FigLang 2024 Shared Task on Euphemisms.
- Project Beta: This project is related to ...
- Gen-AI-for-everyone: Generative AI for Everyone
- Build-Eval-AdvRAG
- xxx: Prompt Engineering / Fine Tuning LLMs.
- Align-LLM-DPO: Align LLMs with Direct Preference Optimization (DPO).
- Knowledge-Graphs-for-RAG: Knowledge Graphs for RAG.
- Microsoft AutoGen: An open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks.
- Google Distillation with Rationale: Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
This section acknowledges projects and ideas that have inspired me. They may be in various states (some may no longer work) and are listed mainly for reference.
- flairNLP - FlairNLP for named entity recognition (NER)
- ollama-voice-mac - offline voice assistant using Mistral 7b via Ollama and Whisper speech recognition models
- Automatic Prompt Engineer - This repo contains code for "Large Language Models Are Human-Level Prompt Engineers"
- Stanford-DSPy - DSPy: Programming with Foundation Models
- Chroma-core - Chroma - the open-source embedding database
- 584-final: Sentence Embeddings using Supervised Contrastive Learning. Danqi Liao.
- ACLPUB: The official tool for creating proceedings for conferences of the Association for Computational Linguistics (ACL).
- annotated-transformer: http://nlp.seas.harvard.edu/2018/04/03/attention.html
- BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics.
- BERT_basic: BERT repository to demonstrate basic functionality
- Contrastive-Tension: State of the art Semantic Sentence Embeddings
- COVID-19: Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
- diseaseBERT: Code and dataset of EMNLP 2020 paper "Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition"
- FastChat: The release repo for "Vicuna: An Open Chatbot Impressing GPT-4"
- Fine-Tuning-BERT: Example BERT fine-tuned to perform spam classification
- huggingface_hub: All the open source things related to the Hugging Face Hub.
- introduction_to_ml_with_python: Notebooks and code for the book "Introduction to Machine Learning with Python"
- ISHate: This repository contains the dataset and implementation details of the paper "An In-depth Analysis of Implicit and Subtle Hate Speech Messages" accepted at EACL 2023.
- KPA_2021_shared_task: Shared task hosted by IBM in the ArgMining workshop in EMNLP
- langchain: ⚡ Building applications with LLMs through composability ⚡
- LeafNATS: Learning Framework for Neural Abstractive Text Summarization
- llama: Inference code for LLaMA models
- medium_articles: Scripts/Notebooks used for articles published regarding Time series and asset allocation as reference for a data science class
- NATS: Neural Abstractive Text Summarization with Sequence-to-Sequence Models
- nlp-with-transformers: Jupyter notebooks for the Natural Language Processing with Transformers book
- PythonClass: Looks to be stale
- Reddit-Data-Mining: How to extract and analyse different parts of reddit threads and comments
- redditDataExtractor: The reddit Data Extractor is a cross-platform GUI tool for downloading almost any content posted to reddit. Downloads from specific users, specific subreddits, users by subreddit, and with filters on the content is supported. Some intelligence is built in to attempt to avoid downloading duplicate external content.
- rogue-dimensions: replication code for EMNLP 2021 paper
- sent-summary: Looks to be stale
- sentence-transformers: Multilingual Sentence & Image Embeddings with BERT
- SimCSE: EMNLP'2021: SimCSE: Simple Contrastive Learning of Sentence Embeddings
- stl-scraper: Scrape short-term listings providers (Airbnb)
- tensor2tensor: Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
- text-summarization-tensorflow: Tensorflow seq2seq Implementation of Text Summarization.
v1.0