This repository contains code to train and evaluate a Vision Transformer (ViT) model using PyTorch on the CIFAR-10 dataset. CIFAR-10 is a popular benchmark dataset for image classification tasks.
This project demonstrates how to use a Vision Transformer (ViT) for image classification. ViT adapts the Transformer architecture, which was originally designed for natural language processing, by treating an image as a sequence of patches. The model is trained on the CIFAR-10 dataset, which consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class.
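To illustrate how a ViT turns an image into a sequence of tokens, here is a minimal sketch of a patch embedding layer. It is not taken from this repository's code: the class name `PatchEmbedding`, the patch size of 4, and the embedding dimension of 128 are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Hypothetical patch embedding: split an image into patches and project each one."""

    def __init__(self, image_size=32, patch_size=4, in_channels=3, embed_dim=128):
        super().__init__()
        if image_size % patch_size != 0:
            raise ValueError("image_size must be divisible by patch_size")
        self.num_patches = (image_size // patch_size) ** 2
        # A convolution with kernel_size == stride == patch_size applies one
        # linear projection to each non-overlapping patch.
        self.proj = nn.Conv2d(
            in_channels, embed_dim, kernel_size=patch_size, stride=patch_size
        )

    def forward(self, x):
        x = self.proj(x)  # (batch, embed_dim, H / patch, W / patch)
        # Flatten the spatial grid into a sequence: (batch, num_patches, embed_dim)
        return x.flatten(2).transpose(1, 2)


embed = PatchEmbedding()
images = torch.randn(8, 3, 32, 32)  # a dummy batch shaped like CIFAR-10 images
tokens = embed(images)
print(tokens.shape)  # torch.Size([8, 64, 128])
```

With 32x32 inputs and 4x4 patches, each image becomes 64 tokens. A full ViT would typically add a class token and position embeddings before the Transformer encoder layers.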
- Python 3
- PyTorch
- torchvision
- CUDA-enabled GPU (optional but recommended for faster training)
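Because the GPU is optional, a training script usually picks its device at runtime. The snippet below is a small sketch of that pattern, not this repository's actual code; the helper name `pick_device` is hypothetical.

```python
import torch


def pick_device() -> torch.device:
    """Return the CUDA device if one is available, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


device = pick_device()
# Tensors (and models, via model.to(device)) must live on the chosen device.
batch = torch.zeros(2, 3, 32, 32, device=device)
print(device.type, tuple(batch.shape))
```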
Clone the repository:

    git clone https://github.com/your_username/your_repository.git
    cd your_repository