Unofficial Clarifai AI search plugin for docusaurus. Adds AI search to your docs.
flowchart LR
index-files.js --"read .md files one by one"--> docusaurus[(docusaurus files)]
index-files.js --"add .md files with metadata\nPOST /inputs"--> clarifai["<a href="https://clarifai.com">🌀 Clarifai</a>"]
index-files.js --"upload base64 images \n with page metadata"--> clarifai
index-files.js --"post external image urls \n with page metadata"--> clarifai
docusaurus-ui -."include".-> SearchBar.js --"find pages matching query \nPOST /search 'bees'"--> proxy-search.js --"find similar inputs with page metadata\nPOST inputs/searches"--> clarifai
- Indexing of text pages. This scans your .md docs and publishes them to Clarifai, which then uses an embedding model (running as part of a Clarifai workflow in the cloud) to index your markdown files and generate embeddings.
- Indexing of images. This uploads base64 images to Clarifai, so that you can search by image content as well.
- Indexing of external images. Similar to local file references, this simply posts external image URLs to be indexed for search.
- Search your documentation (.md files) using specified models. Generated embeddings are matched with a vector similarity search using the Clarifai public search API.
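To illustrate what "matched with a vector similarity search" means, here is a minimal sketch that ranks documents by cosine similarity. In this plugin the matching happens inside Clarifai, and the metric it actually uses is not stated here, so cosine similarity is an assumption. The names cosineSimilarity and rankBySimilarity and the toy vectors are hypothetical.

```javascript
// Illustration only: Clarifai performs the real matching server-side.
// Cosine similarity is an assumed metric, chosen because it is a common default.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the topK documents, most similar first.
// Each doc is a hypothetical { path, vector } record.
function rankBySimilarity(queryVector, docs, topK = 3) {
  return docs
    .map((doc) => ({ path: doc.path, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A query embedding that points in the same direction as a page embedding scores close to 1, which is why pages can match by meaning even when they share no keywords with the query.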
- Compared to traditional text search, this lets you find documents that are relevant by meaning. You can pick which embedding model should be used (multilingual, domain-specific, etc.) and the granularity level (document, paragraph, etc.).
- You control the indexing time and speed of your documents (compared to crawling done by Algolia, for example).
- You can customize the search process yourself, as the search service runs on your side.
This was created as a hackathon project within 24h, so the code is not ideal; feel free to improve it. Wishlist for next steps:
- maintain local cache of indexed files
- remove index on .md or image file deletion
- update index if doc was moved
- add LRU cache to not hit Clarifai API on repeated search
- keyboard shortcuts (up/down/enter to navigate search results)
- more media type support
- add PDFs to have visual search built-in
- audio (mp3) file matching and search
- have CI example to have reindexing working automatically
- have fallback to local text search (for example if search failed or is too slow)
- have RAG, so that search could answer questions, e.g. "given this markdown repo, write code that adds new audio file.."
- needs different UI, large modal for more text, chat history etc
- batching. Currently we post each file separately, but we could batch them to speed up indexing and avoid hitting the rate limiter so quickly
- convert to typescript
- tests
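As a starting point for the LRU cache item on the wishlist, here is a minimal sketch built on a Map, which keeps insertion order. Nothing like this exists in the repo yet; LruCache, cachedSearch, and the searchFn parameter are hypothetical names.

```javascript
// Minimal LRU cache: a Map iterates in insertion order, so the first key
// is always the least recently used one.
class LruCache {
  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }
}

// Wraps any async search function (hypothetical: searchFn would call Clarifai)
// so that repeated queries are served from memory.
async function cachedSearch(cache, query, searchFn) {
  const key = query.trim().toLowerCase();
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const results = await searchFn(query);
  cache.set(key, results);
  return results;
}
```

Normalizing the key (trim and lowercase) is a design choice that trades exactness for a higher hit rate; drop it if your embedding model is case-sensitive.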
After installing docusaurus, copy SearchBar.js to src/theme/SearchBar.js in your repo to render results.
This component should make requests to your locally running search service, proxy-search.js (replace localhost:5000 with the real production URL where you will host it).
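One way to avoid hard-coding localhost:5000 is to build the request from a configurable base URL. This is a sketch, not the code in SearchBar.js: the /search path comes from the diagram above, while the { query } body shape and the buildSearchRequest name are assumptions.

```javascript
// Builds the arguments for fetch() against the search proxy.
// ASSUMPTION: the proxy accepts POST /search with a JSON body { query }.
function buildSearchRequest(baseUrl, query) {
  const url = new URL('/search', baseUrl).toString();
  return {
    url,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query }),
    },
  };
}
```

In the component this could be used as `const { url, options } = buildSearchRequest('http://localhost:5000', 'bees'); fetch(url, options)`, with the base URL swapped for your production host at build time.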
(Optional) To position the search input in the navbar, edit the docusaurus config and add the following under themeConfig.navbar.items:
{
"type": "search",
"position": "right"
}
- Check out this repo somewhere close to your docs.
- Register at Clarifai and create an app with a text workflow.
- Create a .env file that holds the Clarifai credentials that both index-files.js and proxy-search.js will use. You can get the token from the app settings, for example https://clarifai.com/my-user-123/my-docs/settings. The file would look like:
CLARIFAI_PAT=dda3555e476742c8a894857a2c9b5170
CLARIFAI_APP_ID=my-docs
CLARIFAI_USER_ID=my-user-123
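A missing credential is easier to diagnose before any request is sent. The sketch below parses .env-style text and reports which of the three keys are absent. How the scripts in this repo actually load .env is not shown here, so parseEnv and missingKeys are hypothetical helpers, not the repo's code.

```javascript
// Parses simple KEY=value lines; blank lines and # comments are skipped.
// This is a deliberately small parser (no quoting or multiline support).
function parseEnv(text) {
  const values = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    values[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return values;
}

// The three keys listed in the sample .env file.
const REQUIRED_KEYS = ['CLARIFAI_PAT', 'CLARIFAI_APP_ID', 'CLARIFAI_USER_ID'];

// Returns the names of required keys that are absent or empty.
function missingKeys(env) {
  return REQUIRED_KEYS.filter((key) => !env[key]);
}
```

Calling `missingKeys(parseEnv(fileContents))` at startup and exiting when the result is non-empty gives a clear error instead of a failed API call later.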
Run indexing:
node index-files.js ../my-docusaurus/docs/
This should post all of the .md files to your Clarifai app. From the Clarifai app UI you can delete indexed inputs as well. Theoretically you can run indexing in CI 🤔
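For orientation, the file discovery step of an indexing run can be sketched as a recursive directory walk, followed by grouping into batches (the batching item on the wishlist). This is not the code in index-files.js; collectMarkdownFiles and toBatches are hypothetical names, and the batch size is arbitrary.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Recursively collects .md file paths under dir, sorted for stable output.
function collectMarkdownFiles(dir) {
  const found = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...collectMarkdownFiles(fullPath));
    } else if (entry.isFile() && entry.name.endsWith('.md')) {
      found.push(fullPath);
    }
  }
  return found.sort();
}

// Splits a list into consecutive groups of at most batchSize items,
// so several files could be posted per request instead of one.
function toBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

For example, `toBatches(collectMarkdownFiles('../my-docusaurus/docs/'), 16)` would yield groups of up to 16 paths; the limit Clarifai accepts per request is not stated here, so check it before choosing a size.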
- Start your search service with node proxy-search.js. This simply proxies search requests to the Clarifai API, so that your tokens are safe.
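The reason the proxy keeps tokens safe is that the PAT is attached on the server, never in browser code. The sketch below builds the upstream request only, without sending it. The inputs/searches path comes from the diagram above; the base URL, the Key authorization scheme, and the JSON body shape are assumptions written from memory of Clarifai's REST API and should be checked against the official docs. buildClarifaiSearchRequest is a hypothetical name, not a function in proxy-search.js.

```javascript
// ASSUMPTION: base URL, auth header format, and body shape are not taken
// from this repo. Verify them against the Clarifai API reference.
const CLARIFAI_API_BASE = 'https://api.clarifai.com/v2';

// Builds fetch() arguments for a text search. Credentials come from the
// server-side environment, so the browser never sees CLARIFAI_PAT.
function buildClarifaiSearchRequest(queryText, env) {
  const { CLARIFAI_PAT, CLARIFAI_USER_ID, CLARIFAI_APP_ID } = env;
  if (!CLARIFAI_PAT || !CLARIFAI_USER_ID || !CLARIFAI_APP_ID) {
    throw new Error('Missing Clarifai credentials');
  }
  const url =
    `${CLARIFAI_API_BASE}/users/${encodeURIComponent(CLARIFAI_USER_ID)}` +
    `/apps/${encodeURIComponent(CLARIFAI_APP_ID)}/inputs/searches`;
  return {
    url,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Key ${CLARIFAI_PAT}`,
        'Content-Type': 'application/json',
      },
      // Assumed query shape: rank inputs by similarity to the raw query text.
      body: JSON.stringify({
        searches: [
          { query: { ranks: [{ annotation: { data: { text: { raw: queryText } } } }] } },
        ],
      }),
    },
  };
}
```

On Node 18 or newer, the result can be passed straight to the built-in fetch: `const { url, options } = buildClarifaiSearchRequest('bees', process.env); const res = await fetch(url, options);`.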
Happy searching!