BoilerToad/README.md

Project Overview

This document lists and describes various GitHub projects and pages which are relevant to my work. This is a living document and will be updated regularly to highlight new ideas and directions.

The document is divided into five main sections: Public Projects, Research Ideas (Private), Pre-Release (Private) Projects, Projects Based on DeepLearning.AI, OpenAI, Coursera, etc., and Projects Inspired by Others.

Public Projects

These projects are completed and publicly available for use or contribution.

  1. Project Delta: Placeholder for a publicly available project.

Research Ideas (Private)

These are various areas of interest that I am researching.

  1. Project Delta: Placeholder for a publicly available project.

Pre-Release (Private) Projects

These are projects currently under development or in a pre-release stage. They are not yet publicly available but are significant to the overall development roadmap.

  1. FigLang 2024 Euphemisms: Work associated with the FigLang 2024 Shared Task on Euphemisms.

  2. Project Beta: This project is related to ...

Projects Based on DeepLearning.AI, OpenAI, Coursera, etc.

  1. Gen-AI-for-everyone: Generative AI for Everyone
  2. Build-Eval-AdvRAG
  3. xxx: Prompt Engineering / Fine Tuning LLMs.
  4. Align-LLM-DPO: Align LLMs with Direct Preference Optimization (DPO).
  5. Knowledge-Graphs-for-RAG: Knowledge Graphs for RAG.
  6. Microsoft AutoGen: An open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks.
  7. Google Distillation with Rationale: "Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes"

Projects Inspired by Others

This section acknowledges projects and ideas that have inspired me. They are in various states (some may no longer work) and are listed here mainly for reference.

  1. flairNLP: FlairNLP for named entity recognition (NER)
  2. ollama-voice-mac: Offline voice assistant using Mistral 7b via Ollama and Whisper speech recognition models
  3. Automatic Prompt Engineer: This repo contains code for "Large Language Models Are Human-Level Prompt Engineers"
  4. Stanford-DSPy: DSPy: Programming with Foundation Models
  5. Chroma-core: Chroma, the open-source embedding database
  6. 584-final: Sentence Embeddings using Supervised Contrastive Learning. Danqi Liao.
  7. ACLPUB: The official tool for creating proceedings for conferences of the Association for Computational Linguistics (ACL).
  8. annotated-transformer: http://nlp.seas.harvard.edu/2018/04/03/attention.html
  9. BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics.
  10. BERT_basic: BERT repository to demonstrate basic functionality
  11. Contrastive-Tension: State of the art Semantic Sentence Embeddings
  12. COVID-19: Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
  13. diseaseBERT: Code and dataset of EMNLP 2020 paper "Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition"
  14. FastChat: The release repo for "Vicuna: An Open Chatbot Impressing GPT-4"
  15. Fine-Tuning-BERT: Example BERT fine-tuned to perform spam classification
  16. huggingface_hub: All the open source things related to the Hugging Face Hub.
  17. introduction_to_ml_with_python: Notebooks and code for the book "Introduction to Machine Learning with Python"
  18. ISHate: This repository contains the dataset and implementation details of the paper "An In-depth Analysis of Implicit and Subtle Hate Speech Messages" accepted at EACL 2023.
  19. KPA_2021_shared_task: Shared task hosted by IBM in the ArgMining workshop in EMNLP
  20. langchain: ⚡ Building applications with LLMs through composability ⚡
  21. LeafNATS: Learning Framework for Neural Abstractive Text Summarization
  22. llama: Inference code for LLaMA models
  23. medium_articles: Scripts/Notebooks used for articles published regarding Time series and asset allocation as reference for a data science class
  24. NATS: Neural Abstractive Text Summarization with Sequence-to-Sequence Models
  25. nlp-with-transformers: Jupyter notebooks for the Natural Language Processing with Transformers book
  26. PythonClass: Looks to be stale
  27. Reddit-Data-Mining: How to extract and analyse different parts of reddit threads and comments
  28. redditDataExtractor: The reddit Data Extractor is a cross-platform GUI tool for downloading almost any content posted to reddit. Downloads from specific users, specific subreddits, users by subreddit, and with filters on the content is supported. Some intelligence is built in to attempt to avoid downloading duplicate external content.
  29. rogue-dimensions: replication code for EMNLP 2021 paper
  30. sent-summary: Looks to be stale
  31. sentence-transformers: Multilingual Sentence & Image Embeddings with BERT
  32. SimCSE: EMNLP'2021: SimCSE: Simple Contrastive Learning of Sentence Embeddings
  33. stl-scraper: Scrape short-term listings providers (Airbnb)
  34. tensor2tensor: Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
  35. text-summarization-tensorflow: Tensorflow seq2seq Implementation of Text Summarization.

v1.0
