This is a course project for the postgraduate-level course Computer Vision and Cognitive Systems, taught at DIEF, UniMoRe.
- Project Presentation: https://docs.google.com/presentation/d/1oa5Y8bHkKgFodULnyd5vInzbxH9hIJZVeqW5iYaaxxA/edit?usp=sharing
- Project Report: https://drive.google.com/file/d/1YyxH2Q-KSBpQzCXXfHmBY9ajPjs3Un1R/view
- Model Weights: https://drive.google.com/drive/folders/1QY-fAs8u-BdupzJeAMYeGz4cKXeUWnnR?usp=drive_link
- For the Object Detection task, we use the SKU110K dataset.
- For the Product Classification task and for the embeddings used in the Product Retrieval task, we use the GroceryStoreDataset.
For training the Faster RCNN model for Object Detection:
sbatch frcnn.slurm
For training the DenseNet 121 model for Product Classification and for the embeddings used in Product Retrieval:
sbatch clf.slurm
- For the implementation of the complete pipeline:
- Classical Scene Image Preprocessing (Histogram Equalization)
- Inference of both models: Faster RCNN and DenseNet 121 (commented out)
- Shelf numbering: K Means with Silhouette Analysis
- Dominant colour recognition (commented out)
- Zero-Shot Product Detection using CLIP (Contrastive Language-Image Pre-training) model
- Spatial Description through geometrical templating
- Concise Scene Description using GPT-3.5 Turbo through the OpenAI API
- Set up an OpenAI API account and export the generated API key:
export OPENAI_API_KEY=entergeneratedAPIKey
sbatch inference.slurm
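The shelf numbering step listed above (K Means with Silhouette Analysis) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the project's actual code: it assumes that shelves are found by clustering the vertical centres of the detected bounding boxes, that the number of shelves is the `k` with the highest mean silhouette score, and that shelves are numbered from top to bottom. The function names (`kmeans_1d`, `silhouette`, `number_shelves`) and the `k_max` parameter are hypothetical; the project itself may well rely on a library such as scikit-learn instead.

```python
def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalars; returns one cluster label per value."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialisation: spread the centres over the sorted values.
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(values, labels):
    """Mean silhouette score; singleton clusters contribute 0."""
    clusters = {}
    for v, l in zip(values, labels):
        clusters.setdefault(l, []).append(v)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for v, l in zip(values, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(abs(v - u) for u in other) / len(other)
                for m, other in clusters.items() if m != l)
        total += (b - a) / max(a, b)
    return total / len(values)


def number_shelves(y_centers, k_max=6):
    """Return a 1-based shelf index (top to bottom) for each box centre."""
    best_score, best_labels = -2.0, None
    # Silhouette analysis needs at least 2 clusters and fewer clusters than points.
    for k in range(2, min(k_max, len(set(y_centers)) - 1) + 1):
        labels = kmeans_1d(y_centers, k)
        score = silhouette(y_centers, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    if best_labels is None:
        return [1] * len(y_centers)  # too few distinct rows to cluster
    # Order clusters by mean y (image origin at the top) and renumber from 1.
    groups = {}
    for y, l in zip(y_centers, best_labels):
        groups.setdefault(l, []).append(y)
    order = sorted(groups, key=lambda l: sum(groups[l]) / len(groups[l]))
    rank = {l: i + 1 for i, l in enumerate(order)}
    return [rank[l] for l in best_labels]


if __name__ == "__main__":
    # Hypothetical vertical centres (in pixels) of twelve detected products.
    ys = [50, 52, 48, 51, 150, 149, 152, 148, 250, 251, 249, 252]
    print(number_shelves(ys))
```

Clustering only the vertical coordinate keeps the sketch short; it implicitly assumes a roughly frontal view in which shelves appear as horizontal bands.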
Retrieval was initially prototyped in Google Colab: https://colab.research.google.com/drive/1HXn3XRod3_6CHOes7aB0bJltz-IJagRP?usp=sharing
sbatch retrival.slurm
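As an illustration of what embedding-based product retrieval amounts to, here is a minimal sketch that ranks gallery products by similarity to a query embedding. It is not taken from the project scripts: the use of cosine similarity, the function names (`cosine_similarity`, `retrieve`), the `top_k` parameter, and the toy three-dimensional vectors are all assumptions made for the example. In the project, the embeddings would come from the trained DenseNet 121 model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, gallery, top_k=3):
    """Return the names of the top_k gallery items most similar to the query.

    gallery maps a product name to its embedding vector.
    """
    scored = [(cosine_similarity(query, emb), name) for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


if __name__ == "__main__":
    # Toy embeddings; real ones would be much higher-dimensional.
    gallery = {
        "apple": [1.0, 0.0, 0.0],
        "pear": [0.9, 0.1, 0.0],
        "milk": [0.0, 0.0, 1.0],
    }
    print(retrieve([1.0, 0.0, 0.0], gallery, top_k=2))
```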
(Additional modifications can be made by editing the Python scripts referenced in the corresponding Slurm files.)