This is House EXP's Super AI SS3 UWB classification code, built with PyTorch, timm, and Hugging Face!
The hackathon task was to classify Ultra Wide Band (UWB) sensor readings of people's actions into the correct action type.
The key was that we decided to treat this not only as a signal problem but also as an image classification problem. That allowed us to turn the signals into images that could actually be visualized and were easier to classify than raw numbers.
We experimented with a variety of data processing steps and transformations, including wavelet transforms and Fourier transforms. However, it turned out that the best dataset we could get came from the teacher's original transformation code without any modifications.
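As a rough illustration of the signal-to-image idea, here is a minimal sketch of one Fourier-based transformation: a short-time Fourier transform that turns a 1-D signal into a 2-D spectrogram image. This is not the teacher's transformation code or our exact preprocessing. The function names, `frame_len`, `hop`, and the synthetic sine wave are all assumptions made for the example.

```python
import numpy as np

def signal_to_spectrogram(signal, frame_len=64, hop=16):
    """Turn a 1-D signal into a 2-D log-magnitude spectrogram (freq x time)."""
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of each frame, compressed with log1p.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T  # shape: (frame_len // 2 + 1, n_frames)

def to_uint8_image(spec):
    """Rescale a 2-D array to 0..255 so it can be saved or fed to an image model."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo + 1e-12)
    return np.round(scaled * 255).astype(np.uint8)

# Synthetic stand-in for a sensor reading: a 50 Hz tone sampled at 1 kHz.
t = np.arange(1024) / 1000.0
demo_signal = np.sin(2 * np.pi * 50 * t)
image = to_uint8_image(signal_to_spectrogram(demo_signal))
print(image.shape, image.dtype)
```

The resulting array can be stacked to three channels and resized to whatever input size the image model expects.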
After originally trying CNN models, which did not perform very well, I had the idea of switching to Vision Transformers, which yielded very good results (boosting our score from 0.3 to 0.8). Our team then explored other SOTA Vision Transformer models.
We ended up using the MaxViT model because of its special properties:
- Multi Axis Attention
- Both block + grid attention

This meant that the model would be able to capture the sequential nature of the data, leading to better performance than standard ViT models. This hypothesis was validated in our testing.
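To make "block + grid attention" more concrete, the sketch below shows only the two token-partitioning schemes, using NumPy on a tiny 4x4 feature map. It is a simplified illustration, not the timm implementation: the attention computation itself is omitted, and the function names and sizes are assumptions for the example. Block partitioning groups neighbouring tokens (local attention), while grid partitioning groups tokens that are spread evenly across the whole map (sparse global attention).

```python
import numpy as np

def block_partition(x, p):
    """Split an (H, W, C) map into non-overlapping p x p windows of neighbours."""
    h, w, c = x.shape
    x = x.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)      # (H/p, W/p, p, p, C)
    return x.reshape(-1, p * p, c)      # (num_windows, p*p, C)

def grid_partition(x, g):
    """Split an (H, W, C) map into groups of g x g tokens strided across the map."""
    h, w, c = x.shape
    x = x.reshape(g, h // g, g, w // g, c)
    x = x.transpose(1, 3, 0, 2, 4)      # (H/g, W/g, g, g, C)
    return x.reshape(-1, g * g, c)      # (num_groups, g*g, C)

# Tokens are numbered 0..15 row by row so the grouping is easy to read.
feature_map = np.arange(16).reshape(4, 4, 1)
blocks = block_partition(feature_map, 2)
grids = grid_partition(feature_map, 2)

print(blocks[0].ravel())  # first block: adjacent tokens 0, 1, 4, 5
print(grids[0].ravel())   # first grid group: distant tokens 0, 2, 8, 10
```

Attention within each block mixes nearby tokens, and attention within each grid group mixes tokens from far-apart positions, so alternating the two gives both local and global context.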
Implementing the model was no easy task, though, as it required me to write a separate training loop to handle all the additional logic. However, this custom training loop let us incorporate more techniques than we ever had before.
To boost our score even further, we leveraged the following techniques:
- Gradient Accumulation: Simulating larger batch sizes without the extra memory overhead, allowing the model to converge better.
- Lookahead Optimization: This relatively new technique keeps two sets of weights (fast and slow), with the slow weights periodically pulled toward the fast ones, which in theory should have made the model better.
- Cross Validation: This was crucial because our dataset was very small; with cross-validation, every sample is used for training in some fold, so the models collectively see all the data.
- Pseudo Labeling: Using the model to predict the test data and reusing its confident predictions as additional training data.
- Weighted Ensembling: We used this in order to combine the predictions from each model within the cross validation.
- Voting Ensembling: We then used this in order to combine the predictions from multiple different models.
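The last three techniques in the list above can be sketched with plain NumPy on predicted class probabilities. This is a minimal illustration rather than our notebook code: the function names, the fold weights, the 0.8 confidence threshold, and the toy probabilities are all assumptions chosen for the example.

```python
import numpy as np

def weighted_ensemble(fold_probs, fold_weights):
    """Weighted average of per-fold probabilities.

    fold_probs: (n_folds, n_samples, n_classes), fold_weights: (n_folds,)
    """
    w = np.asarray(fold_weights, dtype=np.float64)
    w = w / w.sum()  # normalise so the result is still a probability
    return np.tensordot(w, np.asarray(fold_probs), axes=1)

def voting_ensemble(model_probs):
    """Hard majority vote over the predicted classes of several models."""
    n_samples, n_classes = model_probs[0].shape
    counts = np.zeros((n_samples, n_classes), dtype=int)
    for probs in model_probs:
        counts[np.arange(n_samples), probs.argmax(axis=1)] += 1
    return counts.argmax(axis=1)  # ties go to the lowest class index

def select_pseudo_labels(probs, threshold=0.8):
    """Keep only test samples whose top probability reaches the threshold."""
    keep = np.where(probs.max(axis=1) >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)

# Toy data: 2 folds of one model, 2 test samples, 2 classes.
fold_probs = np.array([
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.7, 0.3], [0.2, 0.8]],
])
fold_weights = [0.75, 0.25]  # hypothetical, e.g. derived from validation scores
model_a = weighted_ensemble(fold_probs, fold_weights)

# Two further hypothetical models, already averaged over their own folds.
model_b = np.array([[0.6, 0.4], [0.55, 0.45]])
model_c = np.array([[0.2, 0.8], [0.1, 0.9]])
final_pred = voting_ensemble([model_a, model_b, model_c])

# Confident predictions that could be added back as extra training data.
keep_idx, pseudo_labels = select_pseudo_labels(model_a, threshold=0.8)
print(final_pred, keep_idx, pseudo_labels)
```

In this toy run only the first test sample is confident enough to be reused as a pseudo-labelled training example.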
- Parin: ViT and Timm Training Scripts + Techniques
- Prae: Hybrid Model Research + Experiments
- Champ: Consulting + Experiments
- Film: Initial Cross Validation Script + Experiments
- Ten: Model Tuning + Experiments
- Senmee: Data Preprocessing Scripts + Code Refactoring
- Pond: Data Preprocessing Scripts
- Boss: Data Preprocessing Scripts
- Tae: Data Preprocessing Scripts
- EXP House: Relentless support, encouragement, and determination
Our example code is available in the Jupyter notebooks in this repository:
- SHARING_SignalPreparation.ipynb for Data Preprocessing Script
- SHARING_Timm_Training_For_UWB_Classification.ipynb for TIMM Training and Interesting Techniques