MedScan (Medical Scanner AI)

MedScan is a real-time system that ingests, stores, processes and indexes medical reports.

Data storage and processing take place in a distributed manner through the use of Kafka and Spark respectively.

In particular, information such as the status of examined anatomical structures, medical diagnosis and other information related to the patient's health is to be extracted from the documents.

The next step in this project would be to make the extraction customisable;

Ideally, any doctor specialising in any field of medicine could upload their reports and specify which parameters they would like the model to search for.

The power of ChatGPT-4 was used to extract the features.

📝 Requirements

Docker & Docker Compose
In order to to use ChatGPT-4 API, one possible solution is to create an Azure account and enable OpenAI services.
If you are a student, follow my guide to get an Azure account with $100 credit and no credit card required!

⚡ Quickstart

$ git clone https://github.com/WoWS17/MedScan.git

$ cd ./kafka/setup

$ wget https://dlcdn.apache.org/kafka/3.4.0/kafka_2.13-3.4.0.tgz

$ cd ../../

$ echo 'export AZURE_OPENAI_ENDPOINT=<Your OpenAI Endpoint>' >> ~/.bashrc

$ echo 'export AZURE_OPENAI_KEY=<Your OpenAI Secret Key>' >> ~/.bashrc

$ source ~/.bashrc

$ docker compose up --build

📊 Data flow

Data Source

The data analysed were provided by a doctor. They are diagnoses made following ultrasound examinations of the male inguinal-scrotal regions.

LogStash

What is it?

Logstash is a free and open server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite "stash."

Kafka

What is it?

Apache Kafka is an open-source event streaming platform that enables the management and processing of real-time data streams. It is designed for scalability, reliability and speed, enabling the transmission of large volumes of data between applications and systems. Kafka operates on a publish-subscribe model, where producers send data into containers called topics and consumers subscribe to the topics to receive the data.

ElasticSearch

What is it?

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data for lightning fast search, fine‑tuned relevancy, and powerful analytics that scale with ease.

Spark

What is it?

Apache Spark is a lightning-fast unified analytics engine for big data and machine learning. It was originally developed at UC Berkeley in 2009.

Kibana

What is it?

Kibana is an free and open frontend application that sits on top of the Elastic Stack, providing search and data visualization capabilities for data indexed in Elasticsearch. Commonly known as the charting tool for the Elastic Stack (previously referred to as the ELK Stack after Elasticsearch, Logstash, and Kibana), Kibana also acts as the user interface for monitoring, managing, and securing an Elastic Stack cluster — as well as the centralized hub for built-in solutions developed on the Elastic Stack. Developed in 2013 from within the Elasticsearch community, Kibana has grown to become the window into the Elastic Stack itself, offering a portal for users and companies.

Useful links

Container	URL	Description
kafkaserver	http://localhost:8080	Open kafka UI to monitor kafka server
elasticsearch	http://localhost:9200/	ElasticSearch base URL
elasticsearch	http://localhost:9200/ner_idx/_search	ElasticSearch index content
kibana	http://localhost:5601	Kibana base URL

Authors

Giuseppe Coco

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
elasticsearch		elasticsearch
images		images
kafka		kafka
kibana		kibana
logstash		logstash
spark		spark
.gitattributes		.gitattributes
README.md		README.md
docker-compose.yml		docker-compose.yml
export.ndjson		export.ndjson
presentation.ipynb		presentation.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MedScan (Medical Scanner AI)

📝 Requirements

⚡ Quickstart

📊 Data flow

Data Source

LogStash

What is it?

Kafka

What is it?

ElasticSearch

What is it?

Spark

What is it?

Kibana

What is it?

Useful links

Authors

About

Releases

Packages

Languages

giuseppe-coco/MedScan

Folders and files

Latest commit

History

Repository files navigation

MedScan (Medical Scanner AI)

📝 Requirements

⚡ Quickstart

📊 Data flow

Data Source

LogStash

What is it?

Kafka

What is it?

ElasticSearch

What is it?

Spark

What is it?

Kibana

What is it?

Useful links

Authors

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages