Text in Semitic languages such as Arabic and Hebrew is written using right-to-left (RTL) character sets, which are not necessarily legible to scholars or religious readers who are often familiar only with a Western character set. There is therefore a need for a search tool that accepts input in a Latin-based character set on a normal QWERTY keyboard and performs the search against a corpus written in a Semitic character set.
This repository contains the code for CorpusSearch, which at its core is a web application for phonetic searching of the sacred texts of Semitic languages, as well as a tool for socially reading sacred Semitic texts (such as the Hebrew Bible or the Arabic Qur'an).
I built this to get hands-on experience with several technologies:
Technology | Purpose |
---|---|
Python | Prototyping / validating search algorithm |
ASP.NET | Backend framework |
C# | Strong typing for the ASP.NET backend |
gRPC | Communication between search service / web server |
GraphQL | Communication between web server and front end |
Angular 2 | Frontend framework |
TypeScript | Strong typing for the Angular frontend |
Redis | Caching of search results |
RabbitMQ | For messaging related to social reading |
There are several parts to this repository:
- A quick search prototype, written in Python
- A search microservice, written in C#, which communicates using gRPC
- An ASP.net microservice (written in C#), handling
The search prototype was written for two reasons:
- Allow downloading of a Semitic corpus (in this case the Arabic Qur'an), and
- Validate the methodology for the search algorithm
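The search methodology validated by the prototype is not spelled out here, so the following Python sketch only illustrates the general idea of a phonetic search and is not the repository's algorithm. It transliterates vocalized Arabic words into Latin letters using a small, hypothetical mapping table, then scores each verse by the best `difflib.SequenceMatcher` similarity between a single-word Latin query and any word in the verse. The mapping table, the `transliterate` and `phonetic_search` names, the `(chapter, verse, text)` corpus shape, and the `0.75` threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple

# Hypothetical, deliberately incomplete Arabic-to-Latin table (an assumption,
# not the mapping used by the prototype).
LETTERS = {
    "\u0621": "",    # hamza (ء)
    "\u0623": "a",   # alef with hamza above (أ)
    "\u0625": "",    # alef with hamza below (إ); its kasra supplies the vowel
    "\u0627": "a",   # alef (ا), usually a long vowel
    "\u0628": "b", "\u062a": "t", "\u062b": "th", "\u062c": "j",
    "\u062d": "h", "\u062e": "kh", "\u062f": "d", "\u0630": "dh",
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "sh",
    "\u0635": "s", "\u0636": "d", "\u0637": "t", "\u0638": "z",
    "\u0639": "",    # ain (ع), no obvious QWERTY equivalent
    "\u063a": "gh", "\u0641": "f", "\u0642": "q", "\u0643": "k",
    "\u0644": "l", "\u0645": "m", "\u0646": "n", "\u0647": "h",
    "\u0648": "w", "\u064a": "y",
    "\u0629": "h",   # ta marbuta (ة)
    "\u0649": "a",   # alef maqsura (ى)
}
# Short vowels and tanween; sukun (U+0652) is absent because it marks "no vowel".
VOWELS = {
    "\u064e": "a", "\u064f": "u", "\u0650": "i",
    "\u064b": "an", "\u064c": "un", "\u064d": "in",
}
SHADDA = "\u0651"  # gemination mark: doubles the consonant it sits on


def clusters(word: str) -> List[Tuple[str, List[str]]]:
    """Group each known base letter with the characters that follow it."""
    groups: List[Tuple[str, List[str]]] = []
    for ch in word:
        if ch in LETTERS:
            groups.append((ch, []))
        elif groups:
            groups[-1][1].append(ch)  # diacritics (and unknown characters) attach here
    return groups


def transliterate(word: str) -> str:
    """Turn a vocalized Arabic word into a rough Latin spelling."""
    parts = []
    for letter, marks in clusters(word):
        consonant = LETTERS[letter]
        parts.append(consonant * 2 if SHADDA in marks else consonant)
        parts.extend(VOWELS[m] for m in marks if m in VOWELS)  # other marks are dropped
    return "".join(parts)


def phonetic_search(
    query: str,
    verses: Iterable[Tuple[int, int, str]],
    threshold: float = 0.75,
) -> List[Tuple[int, int, float, str]]:
    """Return (chapter, verse, score, text) for verses whose best-matching
    word scores at least `threshold`, highest score first."""
    query = query.strip().lower()
    results = []
    for chapter, verse, text in verses:
        score = max(
            (SequenceMatcher(None, query, transliterate(w)).ratio() for w in text.split()),
            default=0.0,
        )
        if score >= threshold:
            results.append((chapter, verse, score, text))
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

Doubling the consonant whenever a shadda appears in its cluster, instead of relying on the order of the marks, gives the same output whether a text stores the shadda before or after the short vowel. A fuller version would also need to handle long vowels, hamza carriers, and alternative Latin spellings of the same sound.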
You can regenerate the corpus from scratch by downloading it from the web (this is not strictly required, since the corpus is provided as part of the source code in this repository). Once the corpus is generated, you can run a phonetic search on the text.
As it is meant to be a quick prototype, the code is written in Python.
To fetch the corpus data (this step is optional, as the data is shipped with the repository):
```sh
python3 fetch_data.py
```
Next, to run the search, first build the Docker image and then run it:
```sh
cd Prototype
docker build -t arabic-search .
# Runs the search term in `search_poc.py`
docker run arabic-search
```
The C# and ASP.NET-based solution contains several projects:
- `SearchMicroservice`: a gRPC-based microservice for performing phonetic searches on corpus texts
- `SearchMicroserviceTests`: tests for `SearchMicroservice`
- `SearchWebServer`: an ASP.NET / Angular / GraphQL-based web app project that provides a UI for performing the search
To run the project, open the solution in Visual Studio 2019 (the solution file is `/Search.sln`). Running the web application only requires the microservice and the web server. Follow these steps to get the project running:
- Restore NuGet packages (Project > Restore NuGet Packages)
- Build the projects (Project > Build All)
- Select the proper run configuration (or define a run configuration that has both the `SearchMicroservice` and `SearchWebServer` projects selected).
- Start the project without debugging.
To use the search GraphQL API, open the graphical interface at http://localhost:59836/graphql/ and perform a phonetic search as follows:
```graphql
query {
  phoneticSearch(term: "innama") {
    __typename
    chapter
    verse
    score
    text
  }
}
```
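The same query can also be sent from a script. The following Python sketch uses only the standard library and assumes that the endpoint accepts a conventional GraphQL-over-HTTP request (a JSON `POST` body with `query` and `variables`) at the same URL as the graphical interface, and that `term` is a `String` argument. The `build_search_request` and `query_phonetic_search` helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: the API accepts POST requests at the same URL as the graphical interface.
GRAPHQL_URL = "http://localhost:59836/graphql/"

# Assumption: `term` is a String argument; the value is passed as a variable.
PHONETIC_SEARCH_QUERY = """
query PhoneticSearch($term: String!) {
  phoneticSearch(term: $term) {
    chapter
    verse
    score
    text
  }
}
"""


def build_search_request(term: str, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Build a POST request carrying the query and its variables as JSON."""
    body = json.dumps({"query": PHONETIC_SEARCH_QUERY, "variables": {"term": term}})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_phonetic_search(term: str, url: str = GRAPHQL_URL) -> List[Dict[str, Any]]:
    """Send the query and return the matches; raise if the server reports errors."""
    with urllib.request.urlopen(build_search_request(term, url)) as response:
        payload = json.load(response)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]["phoneticSearch"]
```

With the web server running, `query_phonetic_search("innama")` should return the matches as dictionaries with `chapter`, `verse`, `score` and `text` keys. Passing the term as a variable rather than splicing it into the query string avoids having to escape quotes in user input.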