Commit

update docs
ZiyiXia committed Jan 9, 2025
1 parent 4efa19d commit 757db32
Showing 3 changed files with 39 additions and 1 deletion.
1 change: 0 additions & 1 deletion docs/source/API/index.rst
@@ -2,7 +2,6 @@ API
===

.. toctree::
   :hidden:
   :maxdepth: 1

   abc
38 changes: 38 additions & 0 deletions docs/source/Introduction/IR.rst
@@ -0,0 +1,38 @@
Information Retrieval
=====================

What is Information Retrieval?
------------------------------

Simply put, Information Retrieval (IR) is the science of searching and retrieving information from a large collection of data based on a user's query.
The goal of an IR system is not just to return a list of documents but to ensure that the most relevant ones appear at the top of the results.

A straightforward example of IR is a library catalog. You want to find the book that best matches your query, but there are thousands or millions of books on the shelves.
The library's catalog system helps you find the best matches based on your search terms.
In the modern digital world, search engines and databases work in a similar way, using sophisticated algorithms and models to retrieve, rank, and return the most relevant results.
The range of searchable resources is also expanding from text to other modalities such as images, videos, 3D objects, and music.

IR and Embedding Model
----------------------

Traditional IR methods, like TF-IDF and BM25, rely on statistical and heuristic techniques to rank documents, using term statistics such as how often a query term occurs in a document and how rare that term is across the collection.
These methods are efficient and effective for keyword-based search but often struggle to capture the deeper context or semantics of the text.
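
As a rough illustration of this kind of term-statistics ranking, the following minimal sketch scores a toy corpus with a simplified TF-IDF formula. The corpus, the query, and the helper names (``tokenize``, ``tfidf_score``) are assumptions made up for this example, and the weighting is one common textbook variant rather than the formula of any particular library.

.. code:: python

   import math
   from collections import Counter

   # Hypothetical toy corpus; real systems index far larger collections.
   docs = [
       "the cat sat on the mat",
       "dogs and cats are popular pets",
       "the stock market fell today",
   ]

   def tokenize(text):
       # Naive whitespace tokenizer, assumed here for simplicity.
       return text.lower().split()

   doc_tokens = [tokenize(d) for d in docs]
   n_docs = len(docs)

   # Document frequency: number of documents that contain each term.
   df = Counter(term for tokens in doc_tokens for term in set(tokens))

   def tfidf_score(query, tokens):
       """Sum of tf * idf over the query terms that occur in the document."""
       tf = Counter(tokens)
       score = 0.0
       for term in tokenize(query):
           if term in tf:
               idf = math.log(n_docs / df[term])
               score += (tf[term] / len(tokens)) * idf
       return score

   query = "cat mat"
   # Document indices ordered from highest to lowest score.
   ranking = sorted(
       range(n_docs),
       key=lambda i: tfidf_score(query, doc_tokens[i]),
       reverse=True,
   )

Only the first document contains the exact tokens ``cat`` and ``mat``, so it ranks first. The second document mentions ``cats`` but scores zero, because exact term matching treats ``cat`` and ``cats`` as unrelated words.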

.. seealso::

   Take a very simple example with two sentences:

   .. code:: python

      sentence_1 = "watch a play"
      sentence_2 = "play with a watch"

   Sentence 1 means going to see a show or performance; here "watch" is a verb and "play" is a noun.

   Sentence 2, however, means someone is playing with a timepiece worn on the wrist; here "play" is a verb and "watch" is a noun.

Traditional IR methods could regard these two sentences as very similar, because they share almost all of their words, even though their meanings are entirely different.
How can we solve this? The best answer so far is embedding models.
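
To make the word-overlap problem concrete, here is a small sketch that compares the two example sentences using cosine similarity over raw word counts (a bag-of-words model). The function name ``bow_cosine`` is hypothetical, and bag-of-words stands in for keyword-based methods in general; it is not the exact scoring used by TF-IDF or BM25.

.. code:: python

   import math
   from collections import Counter

   sentence_1 = "watch a play"
   sentence_2 = "play with a watch"

   def bow_cosine(a, b):
       """Cosine similarity between the raw term-count vectors of two strings."""
       ca, cb = Counter(a.split()), Counter(b.split())
       # Counter returns 0 for missing terms, so unshared words add nothing.
       dot = sum(ca[t] * cb[t] for t in ca)
       norm_a = math.sqrt(sum(v * v for v in ca.values()))
       norm_b = math.sqrt(sum(v * v for v in cb.values()))
       return dot / (norm_a * norm_b)

   similarity = bow_cosine(sentence_1, sentence_2)
   print(round(similarity, 3))  # 0.866

The score is high because word counts ignore word order and part of speech: every word of the first sentence also appears in the second.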

Embedding models have revolutionized IR by representing text as dense vectors in a high-dimensional space, capturing the semantic meaning of words, sentences, or even entire documents.
This allows for more sophisticated search capabilities, such as semantic search, where results are ranked based on meaning rather than simple keyword matching.
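
The ranking step of semantic search can be sketched as follows. The three-dimensional vectors below are hand-written placeholders, not the output of any real model, and the candidate texts are invented for illustration; an actual embedding model would produce vectors with hundreds of dimensions. Only the ranking logic, cosine similarity between a query vector and each document vector, is the point of the example.

.. code:: python

   import math

   # Hypothetical embeddings, hand-written for illustration only.
   query_vec = [0.9, 0.1, 0.0]  # stands in for "watch a play"
   doc_vecs = {
       "see a theatre performance": [0.8, 0.2, 0.1],
       "play with a watch": [0.1, 0.2, 0.9],
   }

   def cosine(u, v):
       """Cosine similarity between two equal-length dense vectors."""
       dot = sum(x * y for x, y in zip(u, v))
       norm_u = math.sqrt(sum(x * x for x in u))
       norm_v = math.sqrt(sum(y * y for y in v))
       return dot / (norm_u * norm_v)

   # Candidate texts ordered from most to least similar to the query.
   ranked = sorted(
       doc_vecs,
       key=lambda text: cosine(query_vec, doc_vecs[text]),
       reverse=True,
   )

With these placeholder vectors, the text that shares no words with the query ranks first, because the ranking depends on vector direction rather than word overlap.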
1 change: 1 addition & 0 deletions docs/source/Introduction/index.rst
@@ -24,5 +24,6 @@ Quickly get started with:
   :maxdepth: 1
   :caption: Concept

   IR
   model
   retrieval_demo
