Squeak-SemanticText

ChatGPT, embedding search, and retrieval-augmented generation for Squeak/Smalltalk

Semantics (from Ancient Greek sēmantikós) refers to the significance or meaning of information. While the normal String and Text classes in Squeak take a syntactic view of text as a sequence of characters and formatting instructions, SemanticText focuses on the sense and understanding of text. With the advent of NLP (natural language processing) and LLMs (large language models), computing systems are becoming substantially better at interpreting text. This package aims to make semantic context accessible in Squeak/Smalltalk by providing the following features:

OpenAI API client: Currently supports chat completions and embeddings. Includes tools for managing rate limits, tracking expenses, and estimating prices for queries.
SemanticConversation: Framework for conversational agents like ChatGPT.
ChatGPT: Conversational GUI for Squeak. Supports streaming responses, editing conversations, and defining system messages.
SemanticCorpus: Framework for semantic search, similarity search, and retrieval-augmented generation (RAG, aka "chat with your data") through the power of text embeddings. Implements a simple yet functional vector database.
Experimental tools such as an integration of semantic search and RAG into Squeak's Help Browser or Squeak's mailing list.

For more details, install the package and dive into the class comments, or read below.

ChatGPT

OpenAI API Expense Watcher

Editor Integration: Explain It / Summarize It

Help Browser Integration: Semantic Search and Retrieval Augmented Generation (RAG)

Squeak Inbox Talk Integration: Similar Conversation Search

[Screenshot: for the squeak-dev thread "Some questions and comments regarding notation of floats and scaled decimals", a "Similar conversations (powered by OpenAI embeddings)" pane, marked as experimental and possibly biased or ineffective, lists related threads such as "Numerics question: reading floating point constants", "RE: Float equality? (was: [BUG] Float NaN's)", "Rounding floats", "Decimals as fractions", "Bug in Floats?", "Float differences", and "Float precision".]

This is still a very simple and incomplete prototype. More might follow. Feedback and contributions welcome!

Installation

Get a current Squeak Trunk image (recommended) or a Squeak 6.0 image (only limited support) and do this in a workspace:

Metacello new
	baseline: 'SemanticText';
	repository: 'github://LinqLover/Squeak-SemanticText:main';
	get; "for updates"
	load.

As most functionality is currently based on the OpenAI API, you need to set up an API key in your OpenAI account and paste it into the OpenAI API Key preference. If you register an account at OpenAI, you will receive a free budget of $5 for the first three months. This is enough for chatting more than 1 million words or embedding 50 million words (or 42 times the collected works of Shakespeare). However, if you want to use the API more heavily or more frequently, you will need to provide a credit card. Nonetheless, tokens are really cheap: after playing with the API for a couple of weeks, I have still spent less than $10 in total.

Usage

Conversations and ChatGPT

GUI

From the world main docking bar, go to Apps > ChatGPT.

API

Basic usage is like this:

SemanticConversation new
	addSystemMessage: 'You make a bad pun about everything the user writes to you.';
	addUserMessage: 'Yesterday I met a black cat!';
	getAssistantReply. --> 'Oh no, did you have to cross its "purr-th?"'

You can also improve the prompt by inserting additional pairs of user/assistant messages prior to the interaction. Keep in mind that this reduces the remaining set of tokens for the conversation and increases the expenses (reply time and money) of the query:

SemanticConversation new
	addSystemMessage: 'You answer every question with the opposite of the truth.';
	addUserMessage: 'What is the biggest animal on earth?';
	addAssistantMessage: 'The biggest animal on earth is plankton.';
	addUserMessage: 'What is the smallest country on earth?';
	getAssistantReply. --> 'The largest country on earth is Vatican City.'

Semantic and similarity search

GUI - Experimental Help Browser integration

Open a Help Browser from the world main docking bar and type your query into the search field. Note that at the moment, synonymous search terms work better than questions (e.g., prefer "internet connection" over "how can I access the internet?").

API

Everything starts at the class SemanticCorpus. For example, this is how you could set up a semantic search corpus for Squeak's Help System yourself:

"Set up and populate semantic corpus"
helpTopics := CustomHelp asHelpTopic semanticDeepSubtopicsSkip: [:topic |
	topic title = 'All message categories']. "not relevant"
corpus := SemanticPluggableCorpus titleBlock: #title contentBlock: #contents.
corpus addFragmentDocumentsFromAll: helpTopics.
corpus estimatePriceToInitializeEmbeddings. --> approx ¢1.66
corpus updateEmbeddings.

"Similarity search"
originTopic := helpTopics detect: [:ea | ea key = #firstContribution].
results := corpus findObjects: 10 similarToObject: originTopic.

"Semantic search"
results := corpus findObjects: 10 similarToQuery: 'internet connection'.
"Optionally, display results in a HelpBrowser"
resultsTopic := HelpTopic named: 'Search results'.
results do: [:ea | resultsTopic addSubtopic: ea].
resultsTopic browse.

"RAG"
(corpus newConversationForQuery: 'internet connection') open.
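
To give an intuition for what embedding search does, here is a minimal sketch in plain Squeak that ranks a few documents by cosine similarity to a query vector. It is not SemanticText's actual implementation: the three-dimensional vectors, the document names, and the helper blocks dot and cosine are made up for illustration, and real embeddings from the OpenAI API have far more dimensions.

```smalltalk
"Hypothetical toy data: made-up 3-dimensional 'embeddings' for three documents."
embeddings := Dictionary new.
embeddings at: 'networking' put: #(0.9 0.1 0.0).
embeddings at: 'morphic' put: #(0.1 0.8 0.3).
embeddings at: 'files' put: #(0.4 0.2 0.7).
query := #(0.8 0.2 0.1). "made-up embedding of the search query"

"Dot product and cosine similarity of two equally sized vectors"
dot := [:a :b |
	(a with: b collect: [:x :y | x * y])
		inject: 0 into: [:sum :each | sum + each]].
cosine := [:a :b |
	(dot value: a value: b)
		/ ((dot value: a value: a) sqrt * (dot value: b value: b) sqrt)].

"Rank document names by descending similarity to the query"
ranked := embeddings keys asArray sort: [:k1 :k2 |
	(cosine value: query value: (embeddings at: k1))
		>= (cosine value: query value: (embeddings at: k2))].
ranked first: 2. --> #('networking' 'files')
```

The corpus classes above take care of computing and storing the embeddings for you; the sketch only illustrates the ranking idea behind similarity queries.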

Editor Integration

Yellow-click on any text editor (optionally after selecting a portion of text), click more..., and select one of explain it, summarize it, or ask question about it.... Or use the keyboard shortcut: Esc, 🔼, Enter, q. 🤓

Squeak Inbox Talk Integration

Get Squeak Inbox Talk (world main docking bar > Tools > Squeak Inbox Talk), update it to the latest version through the Settings menu, and turn on the option Semantic search in Squeak Inbox Talk in the preferences browser.

OpenAI API Expense Watcher

Do this:

OpenAIAccount openExpenseWatcher

Users of SemanticText

At the moment, the following projects are known to make use of SemanticText:

Acknowledgments

Thanks to Vincent Eichhorn (@vincenteichhorn) for giving me an overview of indexing techniques for Vector DBs (will implement one soon!). Thanks to Toni Mattis (@amintos) for tips regarding embedding search (in particular for 541ae49). Thanks to r/MachineLearning folks for suggesting alternative embedding models (your suggestions may be implemented one day).

Happy Squeaking!
