This project is registered as an open-source project in the Android Club Winter of Code (AcWoc) '24 event conducted by the Android Club of VIT Bhopal University.
The Caption Generator project is an open-source tool that uses image recognition and caption generation to provide meaningful descriptions for images. It features a user-friendly interface built with Gradio, allowing users to upload an image and receive an auto-generated caption.
This project is written in Python and is open for contributions.
- Image Content Recognition: Detects objects and content within an image.
- Caption Generation: Provides descriptive captions for uploaded images.
- Gradio Interface: An intuitive web-based UI for easy interaction.
- Customizable: Modular architecture to extend and adapt functionality.
- Python
- Gradio for the user interface
- TensorFlow/PyTorch for the backend models
- OpenCV for image processing (if applicable)
- The image is uploaded via the Gradio interface.
- The backend processes the image and generates a caption using a pretrained model.
- The caption is displayed in the Gradio interface.
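The three steps above could be wired together roughly as follows. This is a minimal sketch, not the project's actual `app.py`: `generate_caption` and `build_demo` are hypothetical names, and the body of `generate_caption` is a placeholder standing in for the pretrained model.

```python
def generate_caption(image):
    """Placeholder for the backend step.

    A real implementation would run the pretrained captioning model on
    `image` (a PIL image); this stand-in only reports the image size.
    """
    if image is None:
        return "No image uploaded."
    width, height = image.size  # PIL images expose size as (width, height)
    return f"An image of {width}x{height} pixels."


def build_demo():
    """Build the Gradio UI: image upload in, caption text out."""
    import gradio as gr  # imported lazily so this module loads without Gradio

    return gr.Interface(
        fn=generate_caption,
        inputs=gr.Image(type="pil", label="Upload an image"),
        outputs=gr.Textbox(label="Generated caption"),
        title="Caption Generator",
    )
```

Calling `build_demo().launch()` starts a local server and prints the URL to open in a browser.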
- Uses pretrained models for image captioning (e.g., ResNet + Transformer architecture).
- Open for customization with other models.
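One way to keep the model swappable is a small registry that maps a name to a captioning callable. The sketch below is an assumption about how such a plug-in point might look, not the project's actual structure; `register_model`, `caption_with`, and `dummy_captioner` are hypothetical names.

```python
from typing import Callable, Dict

# A captioner is any callable that maps an image object to a caption string.
Captioner = Callable[[object], str]

_MODELS: Dict[str, Captioner] = {}


def register_model(name: str) -> Callable[[Captioner], Captioner]:
    """Decorator that registers a captioner under `name`."""
    def decorator(fn: Captioner) -> Captioner:
        _MODELS[name] = fn
        return fn
    return decorator


def caption_with(name: str, image: object) -> str:
    """Caption `image` using the model registered under `name`."""
    try:
        model = _MODELS[name]
    except KeyError:
        raise ValueError(
            f"Unknown model {name!r}; available: {sorted(_MODELS)}"
        ) from None
    return model(image)


@register_model("dummy")
def dummy_captioner(image: object) -> str:
    # Stand-in for a real model, e.g. a ResNet encoder with a Transformer decoder.
    return "A placeholder caption."
```

With this layout, adding another model means writing one function and decorating it; the rest of the application only needs the model's registered name.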
- Clone the repository:
git clone https://github.com/Auth0r-C0dez/Caption_generator.git
cd Caption_generator
- Install the dependencies listed in requirements.txt:
pip install -r requirements.txt
- Start the application:
python app.py
- Open the URL provided by Gradio in your web browser.
- Upload an image and view the generated caption.
"A group of people sitting around a table with laptops."
The project allows customization to improve its functionality:
- Dataset and Keywords: The dataset and the manually added list of keywords for image orientation and recognition can be modified or expanded to enhance the system's performance and accuracy.
- Caption Orientation: The logic for caption generation can be optimized further to provide more coherent and contextually appropriate captions.
- Model Improvement: Improve the model so that captions cover more of the image's content or arrange their words more naturally.
- Personal Improvement: If you have an idea for improving the model that does not fit the categories above, please open a PR. We would love to see it.
Contributors are encouraged to experiment with these aspects and propose improvements.
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix:
git checkout -b feature-name
- Make your changes and commit them:
git commit -m "Add a detailed description of the change"
- Push your changes:
git push origin feature-name
- Create a pull request on the main repository.
- Follow the PEP 8 coding standards.
- Document your code where necessary.
- Include tests for new features or bug fixes.
- Report issues via the GitHub Issue Tracker.
- For further assistance, contact the maintainer at [[email protected]].
Q: What types of images are supported?
A: Standard image formats like PNG, JPEG, and BMP are supported.
Q: Can I use my own model?
A: Yes, the project is modular and allows integration of custom models.
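For the first question, a quick pre-upload check on the file extension might look like the sketch below. It is only an illustration: `SUPPORTED_EXTENSIONS` and `is_supported_image` are hypothetical names, the check looks at the extension rather than the file contents, and the application itself may simply rely on its image library to decode uploads.

```python
from pathlib import Path

# Extensions for the formats named in the answer above (PNG, JPEG, BMP).
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def is_supported_image(filename: str) -> bool:
    """Return True if the file extension matches a supported format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

The comparison is case-insensitive, so `photo.JPG` and `photo.jpg` are treated the same.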
Created by Auth0r-C0dez.
This project is licensed under the MIT License.