We participated in the Amazon ML Challenge 2024 with a solution that extracts product attributes from images.
- Aman Prakash (Lead), NIAMT, Ranchi
- Sagnik Pramanik, Heritage Institute of Technology, Kolkata
- Ankit Rai, NIAMT, Ranchi
- Abhinav Sinha, BIT Mesra, Ranchi
Our approach uses the Moondream Vision Language Model (VLM) to process the images referenced in the test.csv
file and extract specific attributes such as weight and dimensions.
- Moondream VLM (1.6B parameters) was used for lightweight image-to-text processing.
- Key product attributes were extracted using targeted prompts.
- Outputs were cleaned and standardized with regex for consistency.
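
To make the prompting and cleaning steps more concrete, here is a minimal sketch in plain Python. It is not the code from the notebook: the prompt wording, the `UNIT_ALIASES` table, the helper names `build_prompt` and `clean_output`, and the "number unit" output shape are all assumptions chosen for illustration. The model call itself is left out, so the sketch only covers what happens before and after inference.

```python
import re

# Hypothetical alias table mapping unit spellings a VLM might emit to one
# canonical name. The real mapping used in the notebook is not shown here.
UNIT_ALIASES = {
    "g": "gram", "gm": "gram", "gram": "gram", "grams": "gram",
    "kg": "kilogram", "kgs": "kilogram", "kilogram": "kilogram",
    "kilograms": "kilogram",
    "cm": "centimetre", "centimetre": "centimetre", "centimeter": "centimetre",
    "mm": "millimetre", "millimetre": "millimetre", "millimeter": "millimetre",
    "in": "inch", "inch": "inch", "inches": "inch",
}

# A number (integer or decimal) followed by an alphabetic token.
VALUE_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*([a-zA-Z]+)")


def build_prompt(entity_name: str) -> str:
    """Build a targeted question for one attribute (assumed wording)."""
    readable = entity_name.replace("_", " ")
    return (
        f"What is the {readable} of the product in this image? "
        "Answer with a number and a unit."
    )


def clean_output(raw: str) -> str:
    """Return the first recognised 'value unit' pair, or '' if none is found."""
    for match in VALUE_UNIT.finditer(raw):
        value, unit = match.group(1), match.group(2).lower()
        if unit in UNIT_ALIASES:
            return f"{float(value)} {UNIT_ALIASES[unit]}"
    return ""


if __name__ == "__main__":
    print(build_prompt("item_weight"))
    print(clean_output("The weight is 500 g."))
    print(clean_output("approx 12.5cm wide"))
```

Skipping tokens that are not in the alias table means a response such as "2 x 500 g" falls through to the "500 g" pair instead of failing on the "x". Returning an empty string for unparseable responses keeps every row of the output file populated.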
For more details, refer to the main script: main_team_qstart_amazonml.ipynb.